The Application Research of Rough Set in Data Mining of Incomplete Information System

【Author】 Shen Aihua (申爱华)

【Supervisor】 Chen Yan (陈燕)

【Author Information】 Dalian Maritime University, Management Science and Engineering, 2004, Master's

【Abstract】 In 1982, the Polish scholar Z. Pawlak proposed rough set theory, a mathematical tool for handling imprecise and incomplete information that requires no additional information beyond the data set itself. After nearly 20 years of development, it has produced substantial results in both theory and application.

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. Traditional data mining techniques cannot cope with some data that contain incomplete information, whereas rough sets can process this kind of information. Rough set theory is an extension of set theory, and data mining under incomplete information is one of its main research areas.

This thesis studies the application of rough set theory to incomplete information systems. It proposes two data reduction models, one based on the discernibility matrix and one based on data analysis, improves the discernibility matrix algorithm, and compares the two models.

Based on rough set theory, the thesis implements a small, practical data mining model in Java. The model uses a B/S (browser/server) architecture and targets Internet applications. It is tested on a real case of power system network fault diagnosis: even when protection action signals in the grid fault are missing or erroneous, the system still produces the correct diagnosis. This indicates that rough set theory is highly fault-tolerant for incomplete and imprecise information systems.
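To make the idea of discernibility-matrix reduction on incomplete data more concrete, here is a minimal Python sketch. It is not the thesis's algorithm or its improved variant, which this record does not describe. It assumes a common tolerance-style convention in which a missing value (marked `*` here) is compatible with any value, so two objects are distinguished by an attribute only when both values are known and different. The function names, the `*` marker, and the toy relay-signal table are all hypothetical.

```python
from itertools import combinations

MISSING = "*"  # assumed marker for an unknown attribute value


def discernibility_matrix(rows, decisions):
    """Map each pair of objects with different decisions to the set of
    condition attributes that definitely distinguish them.

    An attribute distinguishes a pair only if both values are known and
    unequal; a missing value is treated as compatible with anything.
    """
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # same decision class: nothing to discern
        matrix[(i, j)] = frozenset(
            k for k, (a, b) in enumerate(zip(rows[i], rows[j]))
            if a != MISSING and b != MISSING and a != b
        )
    return matrix


def smallest_reducts(rows, decisions):
    """Brute-force search for the smallest attribute subsets that intersect
    every non-empty matrix entry. Exponential, so for tiny tables only."""
    entries = [e for e in discernibility_matrix(rows, decisions).values() if e]
    n_attrs = len(rows[0])
    for size in range(n_attrs + 1):
        found = [
            set(subset)
            for subset in combinations(range(n_attrs), size)
            if all(e & set(subset) for e in entries)
        ]
        if found:
            return found
    return []


# Hypothetical toy table: three relay signals per case, some unknown.
rows = [
    (1, 0, MISSING),
    (1, 0, 0),
    (0, 1, 0),
    (MISSING, 1, 1),
]
decisions = ["A", "A", "B", "B"]  # hypothetical fault labels

reducts = smallest_reducts(rows, decisions)  # -> [{1}]
```

In this toy table, attribute 1 alone separates the two fault classes, even though two of the four cases contain an unknown signal. An empty matrix entry would mean the pair cannot be told apart from the known values, and the sketch simply skips such entries.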

  • 【Classification Number】TP311.13
  • 【Citation Count】3
  • 【Download Count】305