
Research on the Algorithm and Application of the Least Squares Support Vector Machine (LS-SVM)

【Author】 Xiong Yang

【Supervisor】 Xiao Huaitie

【Author Information】 National University of Defense Technology, Information and Communication Engineering, 2010, Master's degree

【Abstract】 The Support Vector Machine (SVM) is a new tool for machine learning that builds on statistical learning theory and solves learning problems by means of optimization. Proposed by Vapnik and Cortes in 1995, it has become one of the major achievements of recent machine learning research. SVM has unique advantages for small-sample and high-dimensional pattern recognition problems, effectively overcomes many drawbacks of traditional classifiers such as neural networks, and offers high generalization performance. The Least Squares Support Vector Machine (LS-SVM) is an extension of the standard SVM that retains its advantages while reducing model training to the solution of a system of linear equations; this avoids the constrained convex quadratic program of the standard SVM, simplifies the computation, greatly lowers computational complexity, and speeds up the solution. As an emerging technique, SVM still needs refinement in both the depth and breadth of its theory and applications. This thesis carries out in-depth research and practice on LS-SVM model optimization and the classification of special datasets, in order to improve the classification performance and generalization ability of the classifier. The main work is as follows:
1. Building on a detailed analysis of the classification mechanisms of SVM and LS-SVM, the influence of LS-SVM model parameter selection on classification is studied. It is shown that the LS-SVM classifier attains optimal classification performance and generalization ability only for an appropriate combination of the regularization coefficient and the kernel parameter.
2. A method for optimal selection of LS-SVM model parameters based on an estimation of distribution algorithm with diversity preservation (EDA-DP) is proposed; its basic idea and main steps are described, and the algorithm is applied to UCI benchmark datasets and to the recognition of radar-target high-resolution range profiles (HRRP). Experiments show that the LS-SVM model optimized with EDA-DP has good classification performance and generalization ability.
3. After a review of under-sampling techniques for imbalanced data, the feasibility of under-sampling imbalanced data in the kernel feature space is studied. The conclusion: for SVM-based classification of imbalanced data with a radial basis function (RBF) kernel, when the distances between samples in the input space are small, nearest-neighbor under-sampling of the majority class in the feature space works well; when the input-space distances are large, nearest-neighbor techniques for under-sampling the majority class in the feature space may lose their intended effect.
4. Based on a detailed analysis of the classification characteristics of imbalanced data, an LS-SVM algorithm for imbalanced-data classification based on clustering and neighborhood cleaning (SBC-NCL) is proposed and applied to UCI benchmark datasets and radar-target HRRP recognition experiments, which demonstrate its high generalization ability.
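The reduction of LS-SVM training to a linear system, described in the abstract above, can be sketched as follows. This is a minimal illustration assuming Suykens' standard classification formulation with an RBF kernel; the function and variable names are illustrative, not from the thesis:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma_k=1.0):
    """RBF (Gaussian) kernel matrix between row-sample matrices X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma_k * d2)

def lssvm_train(X, y, C=10.0, gamma_k=1.0):
    """Train an LS-SVM classifier by solving one linear system:
        [ 0   y^T         ] [b]   [0]
        [ y   Omega + I/C ] [a] = [1]
    where Omega_ij = y_i * y_j * K(x_i, x_j).
    No quadratic programming is required, unlike the standard SVM."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, gamma_k)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)       # the whole "training" step
    return sol[1:], sol[0]              # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_new, gamma_k=1.0):
    """Decision function f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    K = rbf_kernel(X_new, X_train, gamma_k)
    return np.sign(K @ (alpha * y_train) + b)
```

Here `C` plays the role of the regularization (penalty) coefficient and `gamma_k` the kernel parameter; the abstract's point 1 is that both must be chosen jointly for the classifier to perform well.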

【Abstract】 The Support Vector Machine (SVM) is a machine learning method based on statistical learning theory (SLT) and optimization theory. First proposed by Vapnik and Cortes in 1995, it has become an important research direction in machine learning. SVM has many advantages in pattern recognition, notably for small-sample and high-dimensional problems, and its good performance and high generalization ability effectively overcome the shortcomings of neural networks and other traditional classification methods. The Least Squares Support Vector Machine (LS-SVM) is developed from SVM and has all of its advantages; it trains the model by solving a system of linear equations rather than the quadratic programming problem of SVM, which reduces computational complexity and increases solving speed. As a new technique, however, SVM still needs further work in both theory and application. This thesis studies LS-SVM model selection and the classification of special datasets in order to improve the classification ability and generalization capacity of the classifier. The main work is as follows:
1. The effect of LS-SVM hyperparameter selection on the classifier is discussed on the basis of the classification principles of SVM and LS-SVM. It is shown that the classifier with minimum structural risk is obtained only when both the penalty coefficient and the kernel parameter are appropriate.
2. A method using an estimation of distribution algorithm with diversity preservation (EDA-DP) to optimally select LS-SVM model parameters is proposed and applied to recognition on UCI benchmark datasets and radar-target high-resolution range profile (HRRP) datasets. Experimental results show that the resulting classification model performs well.
3. Under-sampling approaches for imbalanced data in the kernel feature space are discussed; with an RBF kernel, good classification results on imbalanced data are obtained by SVM only when the distances between samples are small.
4. A cluster-based under-sampling and neighborhood cleaning approach (SBC-NCL) for imbalanced-data classification with LS-SVM is proposed. Applied to recognition on UCI benchmark datasets and radar-target HRRP datasets, the resulting classification model shows good generalization capacity.
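The feature-space geometry behind the under-sampling conclusion can be made concrete. For the RBF kernel, K(x, x) = 1, so the distance between two mapped samples follows the standard identity ||φ(x) − φ(z)||² = 2(1 − K(x, z)) and depends only on their input-space distance. The sketch below (function names are illustrative, not from the thesis) shows that this distance saturates as the input-space distance grows:

```python
import numpy as np

def rbf_feature_dist(x, z, gamma_k=1.0):
    """Distance between phi(x) and phi(z) in the RBF kernel's feature space.
    Since K(x, x) = 1 for the RBF kernel:
        ||phi(x) - phi(z)||^2 = 2 - 2*K(x, z)
                              = 2*(1 - exp(-gamma_k * ||x - z||^2))
    """
    d2 = np.sum((np.asarray(x, float) - np.asarray(z, float)) ** 2)
    return np.sqrt(2.0 * (1.0 - np.exp(-gamma_k * d2)))
```

For small input-space distances the mapping is nearly isometric, so nearest-neighbor relations among majority-class samples carry over to the feature space; for large input-space distances every feature-space distance crowds toward the ceiling √2, so nearest-neighbor rankings become uninformative, which is consistent with the behavior described above.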
