粗糙集与支持向量机结合的方法在连续属性离散化中的应用

Application of Rough Set and SVM in Discretization of Continuous Attribute

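【Author】 洪成昱 (Hong Chengyu)

【Advisor】 张雪峰 (Zhang Xuefeng)

【Author information】 Northeastern University (东北大学), Operations Research and Cybernetics, 2008, Master's thesis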

【Abstract】 Rough sets and support vector machines (SVMs) are both data mining methods proposed for extracting fixed patterns from data. Rough set theory is well suited to massive data sets and is simple to use. The SVM is a classification method built on statistical learning theory; its structural risk minimization principle and kernel function theory avoid shortcomings of traditional methods such as the "curse of dimensionality" and overfitting. This thesis combines the two to exploit the strengths of each, proposing a method that first pre-processes the data with rough sets and then classifies it precisely with an SVM. The thesis first introduces the basic theory of rough sets and SVMs, briefly reviewing lower approximation, upper approximation, and decision rules for rough sets, and the structural risk minimization principle and kernel functions for SVMs, and analyzes the advantages and limitations of both methods in data mining. Then, because earlier methods for discretizing continuous attributes yield classification rules that are complex and hard to understand and can lose a large amount of information, Chapter 5 proposes a discretization method based on the rough set lower approximation. This method can pre-process massive data: it extracts the samples that, according to rough set theory, definitely belong to a given class, deletes possible noise from the sample data, and yields a partial set of decision rules. It does not destroy the indiscernibility relation of the original data set, and the resulting classification rules are concise. Next, exploiting the fact that an SVM depends only on its support vectors and can classify precisely, the data pre-processed by rough sets is classified with an SVM. Finally, simulation experiments show that the method shortens training time while retaining the classification information the SVM needs, removes noise from the sample data, improves classification accuracy, and overcomes an application bottleneck of the SVM algorithm.
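The lower-approximation pre-processing step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes the condition attributes have already been discretized, and the function names (`indiscernibility_classes`, `lower_approximations`) and the toy data are hypothetical. Samples whose indiscernibility class carries a single decision label fall in that label's lower approximation; the remaining samples form the boundary region, which in a scheme like the one described might then be handed to an SVM.

```python
from collections import defaultdict


def indiscernibility_classes(samples):
    """Group sample indices by identical condition-attribute tuples."""
    classes = defaultdict(list)
    for idx, (attrs, _label) in enumerate(samples):
        classes[tuple(attrs)].append(idx)
    return list(classes.values())


def lower_approximations(samples):
    """Return ({label: indices in its lower approximation}, boundary indices)."""
    lower = defaultdict(set)
    boundary = set()
    for block in indiscernibility_classes(samples):
        labels = {samples[i][1] for i in block}
        if len(labels) == 1:
            # Every indiscernible sample agrees: certainly in this class.
            lower[next(iter(labels))].update(block)
        else:
            # Conflicting labels: membership cannot be decided here.
            boundary.update(block)
    return dict(lower), boundary


# Hypothetical toy data: (discretized condition attributes, decision label).
samples = [
    ((0, 1), "A"),
    ((0, 1), "A"),
    ((1, 0), "B"),
    ((1, 1), "A"),
    ((1, 1), "B"),
]
lower, boundary = lower_approximations(samples)
```

Each block that lands in a lower approximation corresponds to one certain decision rule (for example, attributes `(0, 1)` imply class `"A"`), while the two conflicting `(1, 1)` samples stay in the boundary region.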

  • 【Online publication contributor】 Northeastern University (东北大学)
  • 【Online publication year and issue】 2012, Issue 03