Research on Data Mining Based on Rough Set Theory (基于粗糙集理论的数据挖掘研究)

Research on Rough Sets Theory Based Data Mining

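【Author】 Wang Shuqing (王书青)

【Advisor】 Jiang Wenke (蒋文科)

【Author information】 Hebei Agricultural University, Agricultural Mechanization Engineering, 2004, Master's degree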

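【Abstract (Chinese version, translated)】 With the rapid development of information technologies such as computing, networking, and communications, the volume of information is growing faster than exponentially. As a result, the retrieval and query mechanisms of traditional databases and conventional statistical analysis can no longer meet practical needs: much data becomes outdated before it can be analyzed, and many data sets are so large that the relationships within them are hard to analyze. How to mine deep knowledge and information from large-scale data, rather than only surface-level information, has become a research focus in many fields. Against this background, a new data-processing technology, knowledge discovery, emerged.

Knowledge discovery is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in a data set. Data mining is the core step of the knowledge discovery process and is currently a very active research area.

Rough set theory, proposed by the Polish mathematician Z. Pawlak in 1982, is a powerful mathematical tool for analyzing vague and uncertain knowledge. As a new research focus in artificial intelligence, it can effectively handle the representation of, and reasoning over, incomplete and uncertain knowledge. This makes rough set theory well suited to data mining, and rough-set-based methods have become one of the main families of data mining methods. Research on data mining based on rough set theory therefore has considerable theoretical and practical significance.

The thesis introduces the relevant theory of rough sets and data mining. After a close study of some shortcomings of classical rough set theory, we propose an extended rough set model: a rough set model with membership degrees and weights. Within this model we define an information system with membership degrees and weights, and we study noise handling, the partition of the approximation space, the computation of the degree of dependency of the decision attributes on the condition attributes, attribute reduction, and the steps for mining association rules. A worked example verifies that the model is feasible. The extended model overcomes several defects of classical rough sets: classification that is too strict, excessive sensitivity to noise, and the loss of some rules hidden in the boundary region. It fully inherits the properties of rough sets and retains all of their advantages. The model provides a method, common in mathematical statistics, for classifying as many objects as possible under a given error rate. It is expected to find good application in information system analysis, artificial intelligence and its applications, decision support systems, knowledge discovery, pattern recognition, classification, and fault diagnosis.

Future work is to develop practical software based on this rough set model and to deepen the theoretical study.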

【Abstract】 With the rapid development of information technologies such as computers, networks, and communications, the amount of information is growing at a faster-than-exponential rate. Because of this sharp increase, the search and query mechanisms of traditional databases and the methods of statistical analysis fall far short of practical demand. Much data is outdated before it can be analyzed, and many data sets are too large for the relationships among the data to be analyzed. How to mine deep knowledge and information from large volumes of data, rather than only superficial information, has become a research hotspot in many fields. Against this background, a new data-processing technology, Knowledge Discovery in Databases, has emerged.

Knowledge discovery in databases is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in databases. Data mining is the core step of the knowledge discovery process and is currently a very active research field.

The theory of rough sets, presented in 1982 by the Polish mathematician Z. Pawlak, is a powerful mathematical tool for analyzing uncertain and vague knowledge. As a new hotspot in the field of artificial intelligence, rough sets can effectively handle the representation of, and reasoning over, incomplete and uncertain knowledge. These features make rough set theory especially suitable for data mining, and rough-set-based data mining has become one of the main data mining methods. The study of rough-set-based data mining has great theoretical and practical significance.

This dissertation introduces the relevant theory of rough sets and data mining. After studying the deficiencies of classical rough set theory in depth, we present an extended rough set model, namely a rough set model with membership grades and weights. In this model, we define an information system with membership grades and weights and investigate noise handling, the partition of the approximation space, the calculation of the degree of dependency of the decision attributes on the condition attributes, attribute reduction, and the construction of the steps for mining association rules. A worked example verifies that the model is feasible. The extended model overcomes deficiencies of classical rough sets: classification that is too strict, excessive sensitivity to noise, and the loss of some rules hidden in the boundary region. The model fully inherits the properties of rough sets and retains all of their strengths. It provides a method, commonly used in statistics, for classifying as many objects as possible under a given error rate. It is expected to find good application in areas such as information system analysis, artificial intelligence and its applications, decision support systems, knowledge discovery in databases, pattern recognition, classification, and fault diagnosis.

Future work is to develop a practical software system based on this rough set model and to study the model further in theory.
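The abstract does not give the definitions of the model with membership grades and weights, so the following is only a minimal sketch of the classical notions it builds on: the partition of the universe by condition attributes, the lower and upper approximations of a decision class, and the degree of dependency of the decision attribute on the condition attributes. The `max_error` parameter is a hypothetical stand-in for the idea of classifying as many objects as possible under a given error rate; it is not the thesis's own formulation. The function names and the toy table are likewise assumptions made for illustration.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object ids into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def approximations(table, attrs, target, max_error=0.0):
    """Return (lower, upper) approximations of the object set `target`.

    max_error=0.0 gives the classical definitions. A positive max_error
    (hypothetical, keep it below 0.5) admits a block into the lower
    approximation when at most that fraction of it lies outside target.
    The upper approximation is kept classical in this sketch.
    """
    lower, upper = set(), set()
    for block in partition(table, attrs):
        overlap = len(block & target)
        if overlap == 0:
            continue
        upper |= block
        if len(block) - overlap <= max_error * len(block):
            lower |= block
    return lower, upper


def dependency(table, cond_attrs, dec_attr, max_error=0.0):
    """Degree of dependency of dec_attr on cond_attrs: |POS| / |U|."""
    positive = set()
    for decision_class in partition(table, [dec_attr]):
        lower, _ = approximations(table, cond_attrs, decision_class, max_error)
        positive |= lower
    return len(positive) / len(table)


# Toy decision table (illustrative data only); object 4 acts as noise.
table = {
    1: {"headache": "yes", "temp": "high", "flu": "yes"},
    2: {"headache": "yes", "temp": "high", "flu": "yes"},
    3: {"headache": "yes", "temp": "high", "flu": "yes"},
    4: {"headache": "yes", "temp": "high", "flu": "no"},
    5: {"headache": "no", "temp": "normal", "flu": "no"},
    6: {"headache": "no", "temp": "high", "flu": "yes"},
}
cond = ["headache", "temp"]
flu_yes = {obj for obj, row in table.items() if row["flu"] == "yes"}

strict_lower, strict_upper = approximations(table, cond, flu_yes)
tolerant_lower, _ = approximations(table, cond, flu_yes, max_error=0.25)

print(sorted(strict_lower), sorted(strict_upper))  # [6] [1, 2, 3, 4, 6]
print(sorted(tolerant_lower))                      # [1, 2, 3, 4, 6]
print(dependency(table, cond, "flu"))              # 2 of 6 objects classified
print(dependency(table, cond, "flu", 0.25))        # 1.0
```

In the toy table a single noisy object pushes a whole block of four into the boundary region under the classical definitions, which illustrates the strictness and noise sensitivity the abstract mentions. Allowing a 25% error rate moves that block into the lower approximation.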

  • 【Classification number】TP311.13
  • 【Citations】6
  • 【Downloads】389