基于粗糙集理论的不确定性度量和属性约简方法研究

Study on Methods for Uncertainty Measure and Attribute Reduction Based on Rough Set Theory

【Author】 滕书华 (Teng Shuhua)

【Advisor】 孙即祥 (Sun Jixiang)

【Author Information】 国防科学技术大学 (National University of Defense Technology), Electronic Science and Technology, 2010, Ph.D.

【Abstract (translated from Chinese)】 With the rapid development of data-acquisition technology, the number and size of databases are growing far beyond the human capacity to analyze and exploit them. How to mine potential, novel, correct, and valuable knowledge from massive, disordered, heavily noisy data, and so change the situation of "rich data, poor knowledge", has become an important research topic in intelligent information processing. As a new knowledge-discovery method, rough set theory has been widely applied in many fields, and attribute reduction is one of its most important applications. After nearly 30 years of development, rough-set-based attribute reduction theory and methods have been rapidly developed and refined, but problems remain. For example, uncertainty measures play an important role in attribute reduction, yet existing measures cannot precisely describe the uncertainty of a set, so seeking more reasonable measures is a fundamental problem; in addition, the lack of generally applicable, efficient reduction algorithms is a major obstacle to the practical use of rough set theory. Accordingly, this dissertation systematically studies uncertainty measures and attribute reduction in rough set theory. The main work and contributions are as follows:

(1) Several knowledge uncertainty measures are proposed under general binary relations from the viewpoint of the discernibility capability of knowledge. Intuitive Venn-diagram representations give the new measures an explicit rough-set-theoretic meaning, which makes the essence of uncertainty measures in rough set theory easy to understand, enriches the theory, and lays a theoretical foundation for the subsequent attribute reduction algorithms.

(2) Considering that data objects may differ in importance, new weighted uncertainty measures, the α-entropy, α-conditional entropy, and α-mutual information, are proposed under general binary relations. By adjusting the parameter α, the similarities and differences among existing uncertainty measures are analyzed, and those measures are unified within a rough set model over general binary relations. The new weighted measures conveniently incorporate subjective preferences, prior knowledge, and similar factors, and therefore fit practice better.

(3) A more widely applicable and more effective weighted integrated uncertainty measure is proposed under general binary relations. Theoretical analysis and examples show that the new integrated measure remedies the defects of existing measures, better matches human cognition, and more precisely reflects the two kinds of uncertainty in rough sets. (The classical definitions that these measures generalize are recalled in the background sketches after this abstract.)

(4) To improve efficiency, the discernibility capability of attributes is taken as the heuristic function. First, using the indiscernibility degree, an efficient, noise-tolerant complete reduction algorithm is proposed for general information systems; second, using the relative discernibility degree, an efficient heuristic reduction algorithm is proposed for decision information systems, and its relationships to reduction algorithms under the algebra and information viewpoints are established. Experiments on simulated data and UCI datasets show that the two discernibility-based algorithms not only handle massive data effectively but also yield compact reducts in most cases (a generic sketch of this greedy pattern also follows this abstract).

(5) For inconsistent decision systems, the relationships between the discernibility-based definition of reduction and the various existing reduct definitions for inconsistent decision tables are first discussed, several notions of simplified consistent decision tables are introduced, and an efficient discernibility-based reduction algorithm for inconsistent decision tables is then proposed. Experiments show that the new algorithm can obtain the reducts of several existing methods with good reduct quality and high efficiency, and is well suited to inconsistent datasets containing many redundant attributes.

(6) Considering noise in decision information systems, two noise-suppressing approximate attribute reduction algorithms, AAR-DV and AAR-WαA, are proposed under general binary relations. Both apply to many extended rough set models, freeing reduction from the dependence of existing algorithms on a specific binary relation; AAR-WαA additionally brings prior knowledge about the data into the reduction procedure. Experiments show that the two approximate algorithms effectively strengthen noise robustness and, while shrinking the reduct, also improve the classification performance of the result.

(7) Considering that combining multiple reducts yields complementary information for classification, a multi-reduct ensemble classification algorithm based on the weighted α-accuracy is proposed under general binary relations. Experiments on UCI data show that the algorithm is feasible for datasets with many reducts: without increasing time complexity, it markedly improves classification accuracy while effectively reducing the number of features (a minimal voting sketch follows the English abstract below).

In summary, the uncertainty measures and efficient attribute reduction algorithms proposed here have an explicit rough-set-theoretic meaning, are simple and easy to understand, apply broadly, and have significant theoretical and potential practical value.
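For context, the "two kinds of uncertainty" the abstract refers to are conventionally the set-based and knowledge-based components of a rough set. Below is a minimal recap of the standard Pawlak constructs; these are classical facts, not the dissertation's generalized forms over arbitrary binary relations:

```latex
% Lower and upper approximations of X \subseteq U under an
% equivalence relation R, where [x]_R is the class of x:
\underline{R}X = \{\, x \in U : [x]_R \subseteq X \,\}, \qquad
\overline{R}X  = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}

% Accuracy and roughness: the set-based uncertainty that an
% "integrated" measure must balance against knowledge granularity.
\alpha_R(X) = \frac{|\underline{R}X|}{|\overline{R}X|}, \qquad
\rho_R(X) = 1 - \alpha_R(X)
```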
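The information-view quantities that the α-entropy family generalizes can likewise be recalled in their classical partition form. The weighted variant shown last is the textbook Belis–Guiaşu form, included only to illustrate how object weights enter an entropy; it is not claimed to be the dissertation's α-entropy definition:

```latex
% Shannon entropy of the partition U/P = {P_1, ..., P_m}:
H(P) = -\sum_{i=1}^{m} \frac{|P_i|}{|U|}\,\log_2 \frac{|P_i|}{|U|}

% Conditional entropy and mutual information of partitions U/P, U/Q:
H(Q \mid P) = -\sum_{i=1}^{m}\sum_{j=1}^{n}
    \frac{|P_i \cap Q_j|}{|U|}\,\log_2 \frac{|P_i \cap Q_j|}{|P_i|},
\qquad
I(P;Q) = H(Q) - H(Q \mid P)

% Belis–Guiaşu weighted entropy: w_i \ge 0 encodes the subjective
% importance of block P_i (illustrative form only).
H_w(P) = -\sum_{i=1}^{m} w_i\,\frac{|P_i|}{|U|}\,\log_2 \frac{|P_i|}{|U|}
```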
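The heuristic algorithms of item (4) follow the forward greedy pattern that is standard in the rough set literature: repeatedly add the attribute that most improves a significance function until the candidate reduct discerns the decision as well as the full attribute set. The sketch below is a generic illustration under two assumptions, namely that rows are dicts of attribute values and that Shannon conditional entropy serves as the significance function (the dissertation's own algorithms use its indiscernibility and relative-discernibility degrees instead); all names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs): residual uncertainty about the decision class."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * math.log2(p)
    return h

def greedy_reduct(rows, cond_attrs, decision, eps=0.0):
    """Forward greedy selection: repeatedly add the attribute that most
    lowers H(decision | reduct) until within eps of the full-set value.
    eps = 0 demands the full set's discerning power; eps > 0 yields an
    approximate reduct, one simple way to tolerate noisy records."""
    target = conditional_entropy(rows, cond_attrs, decision)
    reduct, remaining = [], list(cond_attrs)
    while remaining and conditional_entropy(rows, reduct, decision) > target + eps:
        best = min(remaining,
                   key=lambda a: conditional_entropy(rows, reduct + [a], decision))
        reduct.append(best)
        remaining.remove(best)
    return reduct
```

Setting eps > 0 stops the search early, which is one simple reading of the "approximate reduction" idea in item (6): a slightly less discerning but smaller and more noise-tolerant attribute set.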

【Abstract】 With the rapid development of data-acquisition technology, it has become difficult for humans to cope with the rapidly expanding volume of data. To solve the problem of "data rich, information poor", how to acquire new, potential, correct, and valuable knowledge from very large, messy, and noisy databases has become one of the key research topics in intelligent information processing. As a new method of knowledge discovery, rough set theory (RST) has been widely used in many areas, and one of its essential applications is attribute reduction. After almost 30 years of development, the theory and methods of attribute reduction have advanced rapidly, but some problems remain. First, the uncertainty measure of a rough set is very important in attribute reduction, but the existing measures cannot adequately evaluate attribute importance, so finding a more reasonable uncertainty measure is a fundamental issue. Second, there is no universal and efficient algorithm for attribute reduction, which limits the application of rough sets. Based on these considerations, this dissertation performs a systematic study of uncertainty measure and attribute reduction in information systems. The main work and innovations are as follows:

(1) Some well-justified measures of uncertainty based on the discernibility capability of attributes are put forward under general binary relations, and an explicit rough-set-theoretic meaning is given to the new measures by intuitive Venn-diagram representations. These results are very helpful for understanding the essence of uncertainty measures in RST, enrich the theory, and provide a theoretical basis for further attribute reduction algorithms.

(2) New kinds of weighted uncertainty measures, called the α-entropy, α-conditional entropy, and α-mutual information, are presented by considering the subjective weights of data under general binary relations, and the differences between various uncertainty measures are analyzed by varying the parameter α. The existing uncertainty measures become special forms of the weighted measures, so the proposed measures unify them under general binary relations. In particular, the weighted measures provide a convenient tool for incorporating factors such as the decision maker's preferences and prior knowledge, and therefore accord better with reality.

(3) A well-justified weighted integrated uncertainty measure, more accurate and more widely applicable, is proposed under general binary relations. Theoretical analysis and examples demonstrate that it completely reflects the two factors of uncertainty and is consistent with human cognition, thereby overcoming the limitations of the existing measures.

(4) To accelerate attribute reduction, the discernibility capability of attributes is chosen as the heuristic function. First, an efficient heuristic reduction algorithm based on the indiscernibility degree is proposed for general information systems, which is robust to noise. Second, a discernibility-based attribute reduction algorithm is constructed for decision information systems, together with a comparative study of the quantitative relationships among attribute reduction under the algebra, information, and discernibility viewpoints. Tests against other algorithms on simulated and UCI datasets show that the proposed algorithms select the fewest attributes in the shortest time in most cases and are feasible for large datasets.

(5) The relationship between discernibility-based attribute reduction and the existing reduction algorithms for inconsistent decision information systems is presented. To simplify the decision table, several simplified consistent decision tables are defined, based on which an efficient attribute reduction algorithm for inconsistent decision information systems is designed. Experimental results show the effectiveness and practicability of this algorithm on large inconsistent datasets.

(6) Considering noise in decision information systems, two approximate attribute reduction algorithms, AAR-DV and AAR-WαA, are proposed; they tolerate noise and apply to many extended rough set models. In particular, AAR-WαA takes the prior knowledge of the data into account. Experiments demonstrate that the proposed algorithms effectively suppress the influence of noise, obtain more compact reducts, and simultaneously improve classification performance.

(7) Considering that a combination of multiple reducts produces complementary information, a new classification algorithm combining multiple reducts based on the weighted α-accuracy is proposed under general binary relations (a minimal voting sketch follows this abstract). Experiments show that the proposed algorithm does not increase time complexity, yet improves classification accuracy with fewer features.

In summary, the proposed uncertainty measures and reduction algorithms have an explicit rough-set-theoretic meaning and broad adaptability, and they are simple and easily understood; they therefore have considerable theoretical and practical worth.
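Item (7) can be illustrated independently of the dissertation's weighted α-accuracy: train one simple classifier per reduct and combine their predictions by weighted voting. In this sketch the weight is plain training accuracy, a hypothetical stand-in for the weighted α-accuracy; rows are dicts of attribute values as in the earlier sketch, and all names are illustrative.

```python
from collections import Counter, defaultdict

class ReductVoter:
    """A majority-class lookup table over one reduct's equivalence classes."""
    def __init__(self, rows, reduct, decision):
        self.reduct = reduct
        table = defaultdict(Counter)
        for row in rows:
            table[self._key(row)][row[decision]] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in table.items()}
        # Training accuracy doubles as this voter's weight (a hypothetical
        # stand-in for the dissertation's weighted α-accuracy).
        hits = sum(self.table[self._key(r)] == r[decision] for r in rows)
        self.weight = hits / len(rows)

    def _key(self, row):
        return tuple(row[a] for a in self.reduct)

    def predict(self, row):
        return self.table.get(self._key(row))  # None for unseen value combos

def ensemble_predict(voters, row):
    """Weighted vote across all reducts; abstaining voters are skipped."""
    scores = Counter()
    for v in voters:
        label = v.predict(row)
        if label is not None:
            scores[label] += v.weight
    return scores.most_common(1)[0][0] if scores else None
```

For example, with reducts = [['a', 'b'], ['b', 'c'], ['a', 'd']], building voters = [ReductVoter(rows, r, 'class') for r in reducts] and calling ensemble_predict(voters, new_row) performs the combination; since each voter is a lookup table built in one pass over the data, the ensemble's cost grows only linearly in the number of reducts, consistent with the abstract's claim that time complexity is not increased.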
