
数据挖掘中属性约简及分类算法研究

Research of Algorithms of Attribute Reduction and Classification in Data Mining

【Author】 李永华 (Li Yonghua)

【Supervisor】 蒋芸 (Jiang Yun)

【Author Information】 Northwest Normal University (西北师范大学), Computer Application Technology, 2009, Master's thesis

【Abstract (Chinese)】 Data mining is the process of extracting implicit, potentially valuable information from databases. It is a new data analysis technology that has been widely applied in fields such as finance, insurance, government, education, transportation, and national defense. Rough set theory, proposed by the Polish mathematician Z. Pawlak in 1982, is a mathematical tool for handling vague and uncertain knowledge. Building on rough set theory, this thesis focuses on two core problems in data mining: attribute reduction and classification. Taking the information system as its object of study, it examines the theory and methods of attribute reduction algorithms for the classical rough set model on complete information systems, points out their shortcomings, and proposes an improved attribute reduction algorithm based on rough sets. Through case analysis of the traditional decision tree algorithm, it identifies the problems in that algorithm and proposes an improved version, WMAS, a decision tree construction algorithm based on weighted mean attribute significance. The main work and contributions of this thesis are as follows:

1. Based on a study of attribute significance in various heuristic attribute reduction algorithms, the concept of weighted mean attribute significance is proposed; it considers both an attribute's significance for decision classification and its significance among the condition attributes.

2. How to perform attribute reduction efficiently has always been an important topic in rough set research. It has been proved that searching for the optimal attribute reduct is NP-hard, so current research concentrates on obtaining suboptimal reducts. This thesis first discusses the classical rough set reduction algorithms and then proposes an improved attribute reduction algorithm based on rough sets. During reduction, the algorithm considers not only attribute significance but also the information content of attributes; it obtains a reduct of the information system without computing the core, which reduces the amount of computation and increases speed.

3. A study of decision tree construction algorithms based on information entropy shows that their main problems are the repetition of subtrees within a decision tree and the repeated testing of some attributes along a single path of the tree. This thesis uses weighted mean attribute significance to select the splitting attribute when constructing a decision tree and implements the corresponding construction algorithm, WMAS. The method overcomes the above shortcomings, reduces complexity, and improves classification accuracy.

The proposed algorithms are verified through examples and experiments.
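The abstract does not give the exact formula for weighted mean attribute significance, so the following Python sketch only illustrates the classical rough-set quantities such a measure could combine: the dependency-based significance of an attribute for the decision classification, and a simple measure of its significance relative to the other condition attributes. The weight alpha, the second measure, and all helper names (partition, positive_region, dependency) are illustrative assumptions, not the thesis's definitions.

    # Hypothetical sketch: a decision table is a list of rows, each row a dict
    # mapping attribute names to values; dec_attr is the decision attribute.

    def partition(rows, attrs):
        """Group row indices into equivalence classes induced by the given attributes."""
        blocks = {}
        for i, row in enumerate(rows):
            key = tuple(row[a] for a in attrs)
            blocks.setdefault(key, set()).add(i)
        return list(blocks.values())

    def positive_region(rows, cond_attrs, dec_attr):
        """Rows whose condition class lies entirely inside one decision class."""
        dec_blocks = partition(rows, [dec_attr])
        pos = set()
        for block in partition(rows, cond_attrs):
            if any(block <= d for d in dec_blocks):
                pos |= block
        return pos

    def dependency(rows, cond_attrs, dec_attr):
        """Degree of dependency gamma = |POS(cond_attrs, dec_attr)| / |U|."""
        return len(positive_region(rows, cond_attrs, dec_attr)) / len(rows)

    def significance_for_decision(rows, a, cond_attrs, dec_attr):
        """Classical significance of a: drop in dependency when a is removed."""
        rest = [c for c in cond_attrs if c != a]
        return dependency(rows, cond_attrs, dec_attr) - dependency(rows, rest, dec_attr)

    def weighted_mean_significance(rows, a, cond_attrs, dec_attr, alpha=0.5):
        """Assumed combination of the two viewpoints named in the abstract:
        significance for the decision classification, and significance among the
        condition attributes (here: how much a refines their partition)."""
        rest = [c for c in cond_attrs if c != a]
        sig_dec = significance_for_decision(rows, a, cond_attrs, dec_attr)
        sig_cond = (len(partition(rows, cond_attrs)) - len(partition(rows, rest))) / len(rows)
        return alpha * sig_dec + (1 - alpha) * sig_cond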

【Abstract】 Data mining is the process of extracting implicit, potentially useful information from large volumes of data. It is a new data analysis technology that is widely used in fields such as banking and finance, insurance, government, education, transportation, and national defense. Rough set theory, introduced by the Polish mathematician Z. Pawlak, is a powerful mathematical tool for analyzing uncertain and vague knowledge. Based on rough sets, this dissertation focuses on two core issues in data mining: attribute reduction and classification. It studies the theory and methods of attribute reduction algorithms for complete information systems, points out their shortcomings, and proposes an improved attribute reduction algorithm based on rough sets. By analyzing the traditional decision tree algorithm with examples, it identifies that algorithm's problems and puts forward an improved algorithm, the decision tree construction algorithm based on weighted mean attribute significance (WMAS). The main research results are as follows:

1. The concept of weighted mean attribute significance, which considers both an attribute's importance within the attribute set and its contribution to decision classification, is proposed on the basis of a study of attribute significance in various attribute reduction algorithms.

2. How to achieve efficient attribute reduction in rough sets has always been an important research topic. Because it has been proved that finding the optimal attribute reduct is NP-hard, current research focuses on obtaining suboptimal reducts. This dissertation first discusses the classical reduction algorithms and then presents an improved attribute reduction algorithm based on rough sets, which considers not only attribute significance but also the information content of attributes. It obtains a reduct of the information system without computing the core, thereby reducing computation and increasing speed.

3. A study of classical decision trees based on information entropy shows that they suffer from subtrees appearing repeatedly within a tree and from some attributes being tested many times along a single path. To overcome these defects, an attribute selection criterion based on weighted mean attribute significance is proposed, and the decision tree construction algorithm WMAS is implemented. It reduces complexity and improves classification accuracy. The proposed algorithm is verified to be advantageous through examples and experiments.

【Keywords】 Significance; Attribute Reduction; Decision Tree
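The abstract likewise does not spell out the improved reduction procedure or the WMAS splitting rule, so the sketch below (reusing partition, dependency, and weighted_mean_significance from the sketch above) shows only one plausible greedy reading: attributes are added according to a score that mixes the gain in dependency with the gain in information, the loop stops once the selected subset classifies the decision as well as the full attribute set does, and the core is never computed. Conditional entropy is an assumed stand-in for the thesis's "information content", and alpha is again an assumed weight.

    import math

    def conditional_entropy(rows, attrs, dec_attr):
        """H(dec_attr | attrs): assumed proxy for the 'information content' of a subset."""
        n = len(rows)
        h = 0.0
        for block in partition(rows, attrs):
            p_block = len(block) / n
            counts = {}
            for i in block:
                v = rows[i][dec_attr]
                counts[v] = counts.get(v, 0) + 1
            for c in counts.values():
                p = c / len(block)
                h -= p_block * p * math.log2(p)
        return h

    def greedy_reduct(rows, cond_attrs, dec_attr, alpha=0.5):
        """Greedy suboptimal reduct: grow the subset by the best-scoring attribute
        until it separates the decision as well as the full attribute set does.
        The core is never computed; the stopping test uses the dependency degree."""
        target = dependency(rows, cond_attrs, dec_attr)
        reduct, remaining = [], list(cond_attrs)
        while remaining and dependency(rows, reduct, dec_attr) < target:
            def score(a):
                gain_dep = dependency(rows, reduct + [a], dec_attr) - dependency(rows, reduct, dec_attr)
                gain_inf = conditional_entropy(rows, reduct, dec_attr) - conditional_entropy(rows, reduct + [a], dec_attr)
                return alpha * gain_dep + (1 - alpha) * gain_inf
            best = max(remaining, key=score)
            reduct.append(best)
            remaining.remove(best)
        return reduct

    def choose_split(rows, cond_attrs, dec_attr, alpha=0.5):
        """WMAS-style split selection (sketch): pick the attribute with the largest
        weighted mean significance instead of the largest information gain."""
        return max(cond_attrs,
                   key=lambda a: weighted_mean_significance(rows, a, cond_attrs, dec_attr, alpha))

In a full tree-building loop, choose_split would be called at each node on the rows reaching that node, with the chosen attribute removed from cond_attrs for the child nodes, so that no attribute is tested twice along one path.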
