
Multispectral Data Classification Based on Support Vector Machines

【Author】 Lu Shuxia (鲁淑霞)

【Advisor】 Wang Xizhao (王熙照)

【Author information】 Hebei University, Optical Engineering, 2007, Ph.D.

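【Summary】 Supported by projects of the National Natural Science Foundation of China and the Natural Science Foundation of Hebei Province, this thesis studies multispectral data classification based on support vector machines. The extraction and use of remote sensing information currently lag far behind the development of remote sensing technology, so new theories and methods that improve information extraction are of great importance. In multispectral data classification, training samples are very limited and the data dimension is high, which easily leads to a severe Hughes phenomenon, and traditional pattern recognition classifiers struggle to give good results. Statistical learning theory was the first to study machine learning with finite samples systematically. It proposed the structural risk minimization principle, a model selection principle that controls the generalization ability of a classifier according to the number of samples available. The support vector machine is a learning method developed within this theoretical framework. Building on statistical learning theory (SLT) and the support vector machine (SVM), the thesis carries out the following work.

First, it analyzes the characteristics of multispectral data and the difficulties that traditional pattern classification methods face on such data. Applying statistical learning theory and support vector machines to multispectral data classification effectively overcomes the Hughes phenomenon and yields higher classification accuracy than conventional methods.

Second, it summarizes several representative multiclass SVM methods: one-against-all, one-against-one, directed acyclic graph SVMs (DAG-SVMs), decision-tree classification, and global optimization classification (MSVM). It also introduces two fuzzy SVM methods. Two improved fuzzy multiclass SVM methods are then proposed. They build on MSVM by introducing a fuzzy membership function, and when applied to multispectral data classification they improve classification accuracy and generalize well.

Third, to address the sensitivity of traditional SVMs to noise and outliers, it proposes two fuzzy multiclass SVM methods based on support vector data description. The methods differ mainly in how the membership is chosen: when determining a sample's membership, they consider not only the relation between the sample and its class center but also the relations among the samples within the class. One method derives the fuzzy membership function from a compact description of the data. The other derives it from support vector data description and uses a nearest-neighbor method to extract the local density of each data point. Numerical experiments show that, compared with several SVM methods, both proposed methods resist noise well and classify well.

Fourth, to reduce computational complexity, it proposes a clustering-based method for solving the SVM inverse problem. Experiments show that solving the SVM inverse problem through clustering effectively reduces algorithmic complexity and improves computational efficiency. The thesis also studies the quantitative relationship between the maximum margin and the distance between the two closest points of the two clusters. For the linearly separable case, it shows that the dual problem of the linear hard-margin classifier is equivalent to the convex hull problem (the method of bisecting the closest points), and that the maximum margin of the linear hard-margin classifier equals the distance between the two closest points in the convex hull problem. For the linearly inseparable case, it shows that the dual problem of the linear soft-margin classifier is equivalent to the reduced convex hull problem (the generalized method of bisecting the closest points), and that the maximum margin of the linear soft-margin classifier equals the distance between the two closest points in the reduced convex hull problem.

Finally, it summarizes training algorithms suited to large-scale problems: chunking, decomposition, and sequential minimal optimization (SMO), all of them fast algorithms designed specifically for support vector machines. An improved SMO algorithm is then used to solve the fuzzy multiclass SVM. Experiments show that the running time is reduced and that the method is feasible and effective.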
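The convex hull equivalence stated in the fourth point can be written out for the linearly separable case. The block below is a sketch in standard textbook notation, not the thesis's own derivation: the symbols $w$, $b$, $\alpha_i$, $c$, $d$ and $\mu$ are assumptions introduced here, and the margin is taken to be $2/\|w\|$, the distance between the two supporting hyperplanes.

```latex
% Hard-margin SVM (primal), labels y_i \in \{+1,-1\}, i = 1,\dots,l
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 .

% Closest points of the two convex hulls
\min_{\alpha}\ \tfrac{1}{2}\Bigl\| \sum_{i:\,y_i=+1} \alpha_i x_i - \sum_{i:\,y_i=-1} \alpha_i x_i \Bigr\|^2
\quad \text{s.t.} \quad \sum_{i:\,y_i=+1} \alpha_i = 1,\ \ \sum_{i:\,y_i=-1} \alpha_i = 1,\ \ \alpha_i \ge 0 .

% With c and d the closest points of the positive and negative hulls,
% the separating hyperplane bisects the segment [c, d]:
w^* = \frac{2\,(c-d)}{\|c-d\|^2}, \qquad
b^* = \frac{\|d\|^2-\|c\|^2}{\|c-d\|^2}, \qquad
\frac{2}{\|w^*\|} = \|c-d\| .

% Check: w^{*\top} c + b^* = +1 and w^{*\top} d + b^* = -1.
% Example: c = (1,0), d = (-1,0) gives w^* = (1,0), b^* = 0, margin 2 = \|c-d\|.

% Reduced convex hulls (inseparable case): add the upper bound
0 \le \alpha_i \le \mu, \qquad \mu < 1 .
```

The upper bound $\mu$ shrinks each hull toward its centroid so that overlapping classes become separable. It plays a role analogous to the soft-margin penalty parameter, although the exact correspondence depends on the SVM formulation used.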

【Abstract】 This thesis studies theories and methods for classifying high-dimensional multispectral data with support vector machines, as part of research supported by the National Natural Science Foundation of China and the Natural Science Foundation of Hebei Province. The ability of existing methods to extract information from spectral remote sensing images still lags well behind technical developments, so it is important to study new theories and methods that improve this ability. Because training samples are limited and the data dimension is high, the Hughes phenomenon arises and traditional pattern classification algorithms often perform unsatisfactorily. Statistical learning theory (SLT), the first theory to study machine learning with small samples systematically, introduces a new inductive principle, structural risk minimization (SRM), which guides the selection of a suitable classification model according to the sample size so as to obtain high generalization ability. The support vector machine (SVM) is a general machine learning method based on SRM. Building on SLT and SVM, this thesis addresses several issues in the classification of high-dimensional multispectral data. The main work and results are outlined as follows.

First, the characteristics of high-dimensional multispectral data are studied, and the weaknesses that degrade the performance of traditional pattern classification algorithms are carefully analyzed. Applying statistical learning theory and support vector machines to high-dimensional multispectral data classification reduces the Hughes phenomenon and yields higher classification accuracy.

Second, five major types of multicategory support vector machine methods are systematically summarized and analyzed: One-against-All, One-against-One, Directed Acyclic Graph SVMs (DAG-SVMs), Decision-Tree-Based Multiclass Support Vector Machines, and Multiclass Support Vector Machines. Two types of Fuzzy Support Vector Machines are also introduced and analyzed. Two improved fuzzy multicategory support vector machines are then proposed and applied to the classification of high-dimensional multispectral data. They are based on the Multiclass Support Vector Machines method and introduce a fuzzy membership for the data samples of a given class, so as to improve classification performance while retaining high generalization capability.

Third, two types of Fuzzy Multicategory Support Vector Machines (FMSVM) based on Support Vector Data Description (SVDD) are presented in order to reduce the effects of noise and outliers. The fuzzy membership is defined not only by the relation between a sample and its cluster center but also by the relations among samples. Two methods of defining the fuzzy membership are developed: one is based on the affinity among samples, and the other is based on the improved Support Vector Data Description. The experimental results show that the two proposed fuzzy multicategory support vector machine methods are more robust than the traditional support vector machine.

Fourth, in order to reduce the computational complexity, a clustering-based method for solving SVM inverse problems is proposed. With clustering, the computational complexity of SVM inverse problems is greatly reduced and the margin is enlarged. Based on the clustering, the relationship between the margin and the closest points in the convex hulls is also analyzed. For the linearly separable case, it is demonstrated that the maximum margin between the two subsets equals the distance between the two closest points in the convex hulls. For the inseparable case, the maximum margin between the two subsets equals the distance between the two closest points in the reduced convex hulls.

Finally, SVM training algorithms for large-scale training sets are summarized and analyzed. These methods include chunking, decomposition, and Sequential Minimal Optimization (SMO). An improved Sequential Minimal Optimization algorithm is used to solve the fuzzy multicategory support vector machine. The experimental results show that the computational load is greatly reduced and the generalization capability is improved.
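The role of the class center in the fuzzy membership can be illustrated with a minimal Python sketch. It implements only the simplest class-center rule (membership falls linearly with distance from the class mean), which is an assumption chosen for illustration; it is not the affinity-based or SVDD-based membership that the thesis proposes. The function names `class_center` and `fuzzy_memberships` and the parameter `delta` are hypothetical.

```python
import math


def class_center(points):
    """Return the mean vector of a non-empty list of equal-length points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(dim)]


def fuzzy_memberships(points, delta=1e-3):
    """Assign each sample of one class a membership in (0, 1].

    Assumed rule: s_i = 1 - d_i / (r + delta), where d_i is the distance
    of sample i to the class center, r is the largest such distance, and
    delta > 0 keeps the farthest sample's membership above zero.
    """
    center = class_center(points)
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


if __name__ == "__main__":
    # Four clustered samples plus one outlier far from the rest.
    samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
    for point, s in zip(samples, fuzzy_memberships(samples)):
        print(point, round(s, 3))
```

In a fuzzy SVM, memberships of this kind typically scale each sample's penalty term, so a sample far from its class center (here the last one) contributes little to the decision boundary. Because the outlier also pulls the class mean toward itself, this simple rule can be distorted by the very points it is meant to discount, which is one motivation for also using the relations among samples.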

  • 【Online publication contributor】 Hebei University
  • 【Online publication issue】 2011, No. 03