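Research on Feature Selection Methods and Their Application to the Diagnosis of Erythemato-Squamous Diseases

【Author】 王春霞 (Wang Chunxia)

【Advisor】 谢娟英 (Xie Juanying)

【Author Information】 Shaanxi Normal University, Computer Software and Theory, 2010, Master's thesis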

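【Abstract (Chinese)】 Feature selection, an important data preprocessing method, is a key component of supervised learning algorithms and plays an important role in research and applications in data mining, machine learning, pattern recognition, and related fields. In recent years, with the steady emergence of large-scale problems in image processing, text recognition, and gene expression, feature selection algorithms have attracted growing attention and face serious challenges; there is an urgent need for feature selection methods that scale to large data sets while offering good overall accuracy and running efficiency. This thesis studies feature selection algorithms for high-dimensional data, proposes a feature-importance measure suited to feature selection in multi-class pattern recognition problems, and applies the proposed feature selection algorithms to the diagnosis of erythemato-squamous diseases. The main work comprises the following parts. First, the thesis examines in detail the current state of research and the open problems in feature selection: it analyzes the definition of feature selection, the relationship between feature selection and feature extraction, the four aspects of feature selection, and its two models; summarizes several common search algorithms; and offers guidance on choosing a feature selection algorithm. Second, it proposes an improved F-score feature selection method. The traditional F-score measures how well a feature discriminates between two classes; the thesis generalizes it so that the improved F-score can assess a feature's discriminative power not only between two classes but also among multiple classes. In addition, weighing the respective strengths and weaknesses of the Filter and Wrapper models, it proposes a feature selection method based on IFSFS (Improved F-score and Sequential Forward Search) and SVM (Support Vector Machines). The method uses the improved F-score as the feature selection criterion, Sequential Forward Search (SFS) as the search strategy, and an SVM classifier to evaluate the effectiveness of each feature subset; it is applied to the diagnosis of erythemato-squamous diseases, and the experimental results demonstrate its effectiveness. Finally, to address the main drawback of SFS, namely that a feature, once selected, can never be removed even if later additions make it redundant, the thesis proposes a feature selection method that combines IFSFFS (Improved F-score and Sequential Forward Floating Search) with SVM. Experiments applying the IFSFFS+SVM method to the diagnosis of erythemato-squamous diseases show that it achieves very good diagnostic performance.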
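The abstract does not state the formula for the improved F-score, so the following is only a minimal sketch of one natural multi-class generalization of the two-class F-score: the sum over classes of the squared distance between the class mean and the overall mean, divided by the sum of the unbiased within-class variances. The function names `multiclass_f_score` and `rank_features` are hypothetical, and the thesis's exact definition may differ.

```python
from collections import defaultdict


def multiclass_f_score(column, labels):
    """F-score of one feature column across any number of classes.

    Assumed form (not taken from the thesis):
      numerator   = sum over classes of (class mean - overall mean)^2
      denominator = sum over classes of the unbiased within-class variance
    """
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[label].append(float(value))

    overall_mean = sum(column) / len(column)
    between = 0.0
    within = 0.0
    for values in groups.values():
        if len(values) < 2:
            raise ValueError("each class needs at least two samples")
        mean = sum(values) / len(values)
        between += (mean - overall_mean) ** 2
        within += sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    if within == 0:
        # Constant within every class: perfectly separating or entirely constant.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return (feature indices sorted by descending score, per-feature scores)."""
    n_features = len(rows[0])
    scores = [
        multiclass_f_score([row[i] for row in rows], labels)
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores
```

With two classes this reduces to the familiar two-class F-score, which is why it is a plausible reading of "generalizes it to multiple classes". A larger score suggests a more discriminative feature, but the score treats each feature independently and ignores interactions between features.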

【Abstract】 In data mining, machine learning, and pattern recognition, feature selection is an important data preprocessing step and an essential part of supervised learning algorithms. In recent years, with the emergence of large-scale datasets, especially in image processing and gene expression, feature selection has become a very active area and faces greater challenges. It is now necessary to develop feature selection algorithms that reduce high-dimensional datasets with high accuracy and efficiency. This thesis focuses on feature selection for high-dimensional datasets and proposes new feature selection algorithms for diagnosing erythemato-squamous diseases. Its contributions are as follows. First, the thesis gives a specific and in-depth analysis of the problems currently in focus in the feature selection area. It explains the definition of feature selection, describes the difference between feature selection and feature extraction, and introduces the four aspects of feature selection methods as well as the Filter and Wrapper models. It then reviews some conventional search strategies for feature selection and offers guidance on using them. Second, the thesis proposes an improved F-score feature selection method. The original F-score is a simple technique that measures the discrimination between two sets of real numbers; the improved F-score can measure the discrimination among more than two sets of real numbers. Third, based on the merits and demerits of the Filter and Wrapper models, the thesis proposes a coupled model for feature selection. This model combines IFSFS (Improved F-score and Sequential Forward Search) with SVM (Support Vector Machines): the improved F-score serves as the evaluation criterion, SFS as the search method, and SVM evaluates the features selected via the improved F-score. The dermatology dataset for erythemato-squamous diseases from the UCI repository was used to test the proposed model. The experimental results demonstrate that the model based on IFSFS and SVM is effective in diagnosing erythemato-squamous diseases and achieves high classification accuracy. Finally, because SFS cannot remove a feature once it has been selected, the thesis proposes another feature selection method based on IFSFFS (Improved F-score and Sequential Forward Floating Search) and SVM. The experimental results on diagnosing erythemato-squamous diseases demonstrate that the method combining IFSFFS and SVM is more efficient and achieves higher classification accuracy.
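To illustrate the SFS drawback mentioned above, here is a minimal sketch of generic sequential forward floating search. It is not the thesis's IFSFFS: how the improved F-score enters the search is not described in the abstract. The scoring callback `evaluate` is a hypothetical placeholder; in the setting described it would presumably return the accuracy of an SVM trained on the given feature subset.

```python
def sffs(n_features, evaluate, max_size=None):
    """Sequential forward floating search over feature indices 0..n_features-1.

    `evaluate(subset)` is a caller-supplied scoring function (higher is better).
    Returns (best subset found, its score).
    """
    max_size = max_size or n_features
    selected = []
    best_at_size = {}  # subset size -> (score, subset): best seen at that size

    while len(selected) < max_size:
        # Forward step: add the single feature that yields the highest score.
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = max((evaluate(selected + [f]), f) for f in candidates)
        selected = selected + [feature]
        size = len(selected)
        if size not in best_at_size or score > best_at_size[size][0]:
            best_at_size[size] = (score, list(selected))
        else:
            # An earlier subset of this size was at least as good: return to it.
            selected = list(best_at_size[size][1])

        # Floating step: drop a feature whenever the reduced subset strictly
        # beats the best subset previously recorded at that smaller size.
        while len(selected) > 2:
            score, feature = max(
                (evaluate([g for g in selected if g != f]), f) for f in selected
            )
            if score > best_at_size[len(selected) - 1][0]:
                selected = [g for g in selected if g != feature]
                best_at_size[len(selected)] = (score, list(selected))
            else:
                break

    # Prefer the highest score; break ties in favour of the smaller subset.
    best_score, best_subset = max(
        best_at_size.values(), key=lambda pair: (pair[0], -len(pair[1]))
    )
    return best_subset, best_score
```

Plain SFS corresponds to the forward step alone. The floating step is what lets a feature chosen early be discarded once later additions make it redundant, at the cost of more calls to `evaluate`, which can be expensive when each call trains a classifier.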

  • 【Classification Code】 R758.6; TP181
  • 【Times Cited】 1
  • 【Downloads】 124