
Research on Naive Bayesian Classification Algorithm Based on Rough Set Theory

【Author】 Hu Yin'e

【Supervisors】 Luo Ke; Zhou Youchang

【Degree Information】 Changsha University of Science and Technology, Computer Technology, 2012, Master's

【Abstract】 Data mining is the result of the natural evolution of information technology: a complex process of extracting, or "mining," hidden and potentially valuable knowledge from large amounts of data. Within data mining, classification is an important research topic. Bayesian classification is a reasoning method with a solid mathematical foundation and the ability to combine prior information with sample data; its simplest form, the naive Bayesian classification model, has been widely studied and applied because it is simple and efficient. This thesis analyzes the classification principle of the naive Bayesian algorithm together with its strengths and weaknesses, and studies the model from two directions: first, relaxing the limitation imposed by its conditional independence assumption through attribute selection, and second, building on that result with ensemble learning to further improve the model. The main contributions are as follows:

1. After analyzing two shortcomings of the CEBARKNC attribute reduction algorithm proposed by Wang Guoyin et al., an improved conditional-entropy-based attribute reduction algorithm, ASBCE, is proposed. The algorithm introduces the cosine measure from association-rule mining to identify inconsistent instances, and removes redundant attributes based on the idea that if an attribute is strongly relevant, then to some degree it is also strongly correlated with the other attributes. Experiments show that the algorithm obtains an attribute subset that is as nearly independent as possible, thereby relaxing the conditional independence assumption of naive Bayes.

2. The naive Bayesian classification model rests on Bayes' theorem and the conditional independence assumption, which gives it a simple structure and efficient computation. Real-world data, however, rarely satisfies the conditional independence assumption; this is the model's main limitation. To overcome it and improve classification performance, selecting a nearly independent attribute subset through attribute selection is an effective remedy. The focus of this work is finding an attribute subset with maximum relevance and minimum redundancy, so on the basis of the ASBCE attribute reduction algorithm, a rough-set-based selective naive Bayesian classification model, RSSNBC, is proposed. Experimental results show that RSSNBC achieves better classification accuracy than the classic naive Bayesian model.

3. To further improve on the performance of a single classifier, classifier ensemble learning is introduced: multiple classifiers are combined by some method into one composite classifier. The naive Bayesian model is a simple and efficient probabilistic classification method, and such simple, accurate classifiers are well suited as base classifiers for ensemble learning. Because naive Bayes is a stable model, feature selection is embedded into the Bagging ensemble algorithm to increase the diversity among the individual classifiers and improve their generalization ability. On the basis of the ASBCE algorithm, a selective naive Bayesian ensemble classification algorithm, SNBCE, is proposed. Experiments show that by combining ensemble learning with feature selection, the algorithm improves classification performance more effectively.
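The combination described in point 3 can be illustrated with a minimal sketch: a categorical naive Bayes classifier (making the conditional independence assumption explicit in its log-probability sum) wrapped in a Bagging ensemble whose members each see only a subset of the attributes. This is not the thesis's implementation: the attribute subsets here are chosen at random purely for illustration, whereas ASBCE/SNBCE select them via conditional entropy and the cosine measure; the class names and parameters are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_totals = Counter(y)
        self.n = len(y)
        # counts[c][j][v]: number of class-c rows with value v in attribute j
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def predict(self, row):
        best, best_score = None, float("-inf")
        for c in self.classes:
            # log P(c) + sum_j log P(x_j | c): the conditional independence
            # assumption that the thesis sets out to relax
            score = math.log(self.class_totals[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + 1                    # Laplace smoothing
                den = self.class_totals[c] + len(self.counts[c][j]) + 1
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best

class BaggedSelectiveNB:
    """Bagging of naive Bayes base learners, each restricted to an attribute
    subset (random here; SNBCE would select it with ASBCE instead)."""

    def __init__(self, n_estimators=5, subset_ratio=1.0, seed=0):
        self.n_estimators = n_estimators
        self.subset_ratio = subset_ratio
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        k = max(1, round(self.subset_ratio * d))
        self.members = []
        for _ in range(self.n_estimators):
            rows = [self.rng.randrange(n) for _ in range(n)]      # bootstrap sample
            attrs = sorted(self.rng.sample(range(d), k))          # attribute subset
            Xb = [[X[i][j] for j in attrs] for i in rows]
            yb = [y[i] for i in rows]
            self.members.append((attrs, NaiveBayes().fit(Xb, yb)))
        return self

    def predict(self, row):
        # majority vote over the individual selective classifiers
        votes = Counter(nb.predict([row[j] for j in attrs])
                        for attrs, nb in self.members)
        return votes.most_common(1)[0][0]
```

With a toy dataset such as `X = [("a","x"), ("b","y"), ...]` and string labels, `BaggedSelectiveNB().fit(X, y).predict(row)` returns the majority vote of the bootstrapped base learners; lowering `subset_ratio` below 1.0 is what injects the extra diversity between members that plain Bagging of a stable model like naive Bayes lacks.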
