
Research on Data Mining Methods and Algorithms Based on Rough Set Theory and Support Vector Machines

Data Mining Methods Research Based on Rough Sets Theory and Support Vector Machines

【Author】 Wang Zhilong (王志龙)

【Supervisor】 Wang Jianzhou (王建州)

【Author information】 Lanzhou University, Applied Mathematics, 2007, Master's thesis

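【Summary】 The thesis first studies several key technical problems in applying Rough Sets (RS) theory to data mining. Large knowledge bases often contain a great deal of redundant data, which not only wastes storage space but also hinders people from making correct and concise decisions. The thesis studies knowledge reduction of information systems from five perspectives: equivalence of knowledge attribute systems, degree of attribute dependency, attribute significance, the discernibility matrix, and information theory. It derives a reduction procedure for each of these five perspectives and identifies regularities such as the information entropy of an information system being monotonically non-decreasing as attributes are added. Classical rough set theory draws very strict boundaries between classes; this raises the precision with which attributes identify and classify objects, but it gives poor fault tolerance and therefore weak practical applicability. To remedy this defect, the thesis then studies variable precision rough set theory and its reduction problem. Next, it examines the modeling principles, scope of application, and solution methods of support vector machine (SVM) pattern classification and SVM regression. It also finds that the strength of SVM in data mining is at the same time a source of risk: if a small sample set contains noise or contradictory information, predictions based on it can be strongly affected. Detecting and preprocessing such problems before SVM classification or prediction is exactly where rough set theory excels. Building on the respective strengths of the two methods, the thesis analyzes how to combine them organically, derives a procedure that combines rough set theory with multi-class SVM learning machines, gives a method for constructing multi-class classifiers from rough sets and SVM, and illustrates with examples how to construct various types of multi-class SVM classifiers.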
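The reduction measures listed above can be illustrated with a minimal Python sketch on a hypothetical toy decision table. It follows the standard textbook definitions (indiscernibility partition, dependency degree, a Ziarko-style β-positive region for variable precision rough sets, partition entropy, and the core read off the discernibility matrix); the table and all function names are hypothetical, and the greedy forward selection is a common heuristic, not necessarily the thesis's own algorithm.

```python
from collections import defaultdict
from itertools import combinations
from math import log2

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
]


def partition(rows, attrs):
    """Indiscernibility classes U/IND(attrs), as lists of row indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond, dec, beta=1.0):
    """Degree of dependency of dec on cond.

    A condition class counts toward the positive region when at least a
    fraction beta of its objects share one decision value. beta = 1.0 is
    Pawlak's classical model; 0.5 < beta < 1 gives the variable precision
    (Ziarko-style) beta-positive region.
    """
    dec_value = [tuple(r[a] for a in dec) for r in rows]
    positive = 0
    for block in partition(rows, cond):
        counts = defaultdict(int)
        for i in block:
            counts[dec_value[i]] += 1
        if max(counts.values()) / len(block) >= beta:
            positive += len(block)
    return positive / len(rows)


def entropy(rows, attrs):
    """Shannon entropy H(P), in bits, of the partition induced by attrs."""
    n = len(rows)
    return sum(len(b) / n * log2(n / len(b)) for b in partition(rows, attrs))


def core(rows, cond, dec):
    """Core: attributes that appear alone in some discernibility-matrix entry."""
    result = set()
    for r, s in combinations(rows, 2):
        if any(r[d] != s[d] for d in dec):
            differing = {a for a in cond if r[a] != s[a]}
            if len(differing) == 1:
                result |= differing
    return result


def greedy_reduct(rows, cond, dec):
    """Greedy forward selection by dependency (a heuristic; the result is
    not guaranteed to be a minimal reduct)."""
    target = dependency(rows, cond, dec)
    chosen = []
    while dependency(rows, chosen, dec) < target:
        best = max((a for a in cond if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], dec))
        chosen.append(best)
    return chosen


cond, dec = ["a", "b", "c"], ["d"]
print(greedy_reduct(TABLE, cond, dec))           # ['c']
print(core(TABLE, cond, dec))                    # {'c'}
print(dependency(TABLE, ["a"], dec))             # 0.0
print(dependency(TABLE, ["a"], dec, beta=0.6))   # 1.0
print([round(entropy(TABLE, cond[:k]), 3) for k in range(4)])
# [0.0, 1.0, 1.918, 2.585]
```

In this toy table attribute c alone determines d, so the greedy search returns ['c'], which is also the core. Attribute a has dependency 0 in the classical model but 1.0 at β = 0.6, which shows how the variable precision model tolerates a bounded share of misclassified objects. The entropies of the nested attribute sets {}, {a}, {a, b}, {a, b, c} never decrease, because each added attribute can only refine the partition.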

【Abstract】 This thesis first studies several key technical problems in data mining based on rough set theory. Large knowledge repositories usually contain much redundant data, which wastes storage space and interferes with decision making. The thesis studies knowledge reduction from the viewpoints of the equivalence relations of attribute systems, the degree of attribute dependency and attribute significance, the discernibility matrix, and information theory, and obtains five attribute-reduction methods. It also finds that the information entropy is monotonically non-decreasing as the number of attributes increases. Classification based on classical rough set theory draws very strict boundaries between classes. Although this greatly improves classification accuracy, the resulting model has poor fault tolerance and limited practical use. To remove this defect, the thesis studies variable precision rough set theory and its reduction. It then studies the modeling principles of SVM classification and SVM regression, together with their scope of application and solution methods. It finds that the strength of SVM in data mining is also a hidden weakness: if a small sample set contains noise or contradictory information, predictions based on it are greatly affected. Detecting and preprocessing such problems before SVM classification or prediction is precisely where rough set theory has an advantage. On this basis, the thesis studies how to combine the two methods and obtains a procedure that organically combines RST (Rough Sets Theory) with multi-class SVM learning machines. It also gives a method for constructing multi-class classifiers from rough sets and SVM, and illustrates it with examples.
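The combination described above (rough-set preprocessing followed by a multi-class SVM) can be sketched as a small pipeline. This is a minimal Python sketch under stated assumptions: contradictory objects, i.e. identical condition values with different decisions, are simply dropped (one possible policy; the attribute-reduction step is omitted), and the multi-class machine uses one-vs-one decomposition with majority voting, which is only one of the multi-class constructions the abstract alludes to. `fit_centroid` is a hypothetical nearest-centroid stand-in, not an SVM, so that the sketch runs with the standard library alone; a real binary SVM would be plugged in through the hypothetical `fit_binary` parameter. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def drop_inconsistent(X, y):
    """Rough-set style cleaning: drop every object whose condition vector
    also occurs with a different decision (i.e. lies in a boundary region)."""
    labels = defaultdict(set)
    for xi, yi in zip(X, y):
        labels[tuple(xi)].add(yi)
    keep = [i for i, xi in enumerate(X) if len(labels[tuple(xi)]) == 1]
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_centroid(X, s):
    """Hypothetical stand-in binary learner (NOT an SVM). Returns a decision
    function that is positive when x is nearer the +1 centroid."""
    def centroid(label):
        pts = [x for x, t in zip(X, s) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    pos, neg = centroid(1), centroid(-1)

    def decide(x):
        d_pos = sum((u - v) ** 2 for u, v in zip(x, pos))
        d_neg = sum((u - v) ** 2 for u, v in zip(x, neg))
        return d_neg - d_pos

    return decide


def train_one_vs_one(X, y, fit_binary):
    """Train one binary classifier per unordered pair of classes."""
    models = {}
    for c1, c2 in combinations(sorted(set(y)), 2):
        idx = [i for i, yi in enumerate(y) if yi in (c1, c2)]
        models[(c1, c2)] = fit_binary([X[i] for i in idx],
                                      [1 if y[i] == c1 else -1 for i in idx])
    return models


def predict_one_vs_one(models, x):
    """Majority vote over the pairwise classifiers."""
    votes = Counter(c1 if f(x) > 0 else c2 for (c1, c2), f in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [5, 1] is recorded as both "B" and "C".
X = [[0, 0], [0, 1], [5, 0], [5, 1], [0, 5], [1, 5], [5, 1]]
y = ["A", "A", "B", "B", "C", "C", "C"]
X_clean, y_clean = drop_inconsistent(X, y)
models = train_one_vs_one(X_clean, y_clean, fit_centroid)
print([predict_one_vs_one(models, p) for p in ([0.2, 0.3], [4.8, 0.2], [0.5, 4.5])])
# ['A', 'B', 'C']
```

Dropping every contradictory object is the bluntest choice; a variable precision variant would instead keep such objects under their majority label when that label's share reaches β. For reference, scikit-learn's `SVC` already applies one-vs-one decomposition internally for multi-class problems, so with that library only the rough-set cleaning step would need to be written by hand.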

  • 【Online publication submitter】 Lanzhou University
  • 【Online publication year/issue】 2008, issue 04
  • 【Classification number】 O159
  • 【Times cited】 7
  • 【Downloads】 567