The Study of Kernel Methods in Classification, Regression and Clustering with Applications

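【Author】 Chen Xiaofeng (陈晓峰)

【Supervisor】 Wang Shitong (王士同)

【Author information】 Jiangnan University, Light Industry Information Technology and Engineering, 2009, Ph.D.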

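【Chinese Abstract】 In recent years, kernel methods have developed rapidly in pattern recognition and machine learning. The essence of a kernel method is to use a kernel function to map data from a low-dimensional input space into a high-dimensional feature space. In classification, for example, a kernel method can make data that are linearly inseparable in the input space linearly separable in the feature space. This dissertation studies four topics in kernel methods: robust support vector regression, semi-supervised multi-label support vector learning, sparse support vector learning, and kernel clustering. The work is summarized as follows.

For robust support vector regression, an adaptive error penalization support vector regression method, AEPSVR, is proposed; it reduces the adverse influence of outliers on support vector regression. Building on this, the properties of the cost functions of robust support vector regression are studied, a family of robust cost functions is introduced, and a fuzzy robust support vector regression method, FRSVR, is implemented. FRSVR is not only robust but can also identify outliers.

For semi-supervised multi-label support vector learning, a semi-supervised multi-label support vector algorithm, SSML_SVM, is developed. SSML_SVM converts the semi-supervised multi-label learning problem into semi-supervised single-label learning problems, classifies the unlabeled samples according to the MAP (Maximum a Posteriori) principle, and solves the semi-supervised single-label problems iteratively. By exploiting the information in unlabeled samples, SSML_SVM improves multi-label learning performance.

For sparse support vector learning, a direct sparse kernel regression method, DSKR, is presented. DSKR adds a non-convex constraint to the ε-SVR support vector regression machine to limit the number of support vectors, and then solves the resulting optimization problem by gradient descent. DSKR significantly reduces the number of support vectors, achieving good fitting results with fewer support vectors.

For kernel clustering, two improved affinity propagation clustering algorithms, SSKAPC and AFAPC, are studied. SSKAPC maps the samples into a high-dimensional space with a kernel function and uses prior information to guide clustering, which improves clustering accuracy. AFAPC is an affinity propagation clustering algorithm based on universal gravitation; it uses information between neighboring samples to speed up clustering, achieving performance comparable to that of affinity propagation clustering in a shorter running time.

During the doctoral studies, the author also worked on fake image identification and developed a fake image identification algorithm, BERFS. Approaching the problem from a semantic perspective, BERFS identifies fake images from relative frequency-domain features and semantic features; it can not only detect fake images but also estimate the blurred region fairly well.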
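The claim that a kernel mapping can make linearly inseparable data linearly separable can be checked on the textbook XOR example. The sketch below is a generic illustration, not code from the dissertation: it assumes the degree-2 polynomial kernel k(x, z) = (x · z)^2, and the names `phi`, `poly2_kernel` and the weight vector `(0, 1, 0)` are chosen here purely for illustration.

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 for the degree-2 homogeneous polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2, computed directly in the 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# XOR-style data: no straight line in the input plane separates the two classes.
POINTS = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
LABELS = [+1, +1, -1, -1]

def kernel_matches_feature_map(points):
    """Check k(x, z) == phi(x) . phi(z) for every pair (the kernel trick)."""
    return all(
        math.isclose(poly2_kernel(x, z), dot(phi(x), phi(z)), abs_tol=1e-12)
        for x in points for z in points
    )

def separates(w, points, labels):
    """True if the hyperplane w . phi(x) = 0 puts every point on its label's side."""
    return all(y * dot(w, phi(x)) > 0 for x, y in zip(points, labels))

if __name__ == "__main__":
    print(kernel_matches_feature_map(POINTS))          # True
    print(separates((0.0, 1.0, 0.0), POINTS, LABELS))  # True
```

The point of the kernel trick is visible in `kernel_matches_feature_map`: a learner that only needs inner products can call `poly2_kernel` and never construct `phi(x)`, yet it behaves as if it were working in the 3-D space where the middle coordinate alone separates the classes.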

【English Abstract】 In recent years, kernel methods have developed rapidly in the pattern recognition and machine learning community. The essence of a kernel method is to map data from a low-dimensional input space to a high-dimensional feature space, which can improve the performance of a learning algorithm. For example, a dataset that is not linearly separable in the input space may become linearly separable in the feature space after the mapping. Several important problems in kernel methods remain to be solved; among them, robust support vector regression, semi-supervised multi-label learning, sparse support vector learning and kernel clustering call for solutions. This dissertation investigates these four problems. Its contributions are as follows.

Firstly, we propose an adaptive error penalization support vector regression method named AEPSVR. AEPSVR reduces the effect of outliers and achieves improved generalization capability. Furthermore, we investigate the properties of the cost functions used to construct robust support vector regression and introduce a family of robust cost functions. Based on these cost functions, we implement a fuzzy robust support vector regression method called FRSVR, which is robust and can also be used to identify outliers.

Secondly, for the semi-supervised multi-label support vector learning problem, we present a semi-supervised multi-label learning method named SSML_SVM, aiming at an effective multi-label method for processing gene expression data. SSML_SVM first transforms semi-supervised multi-label learning into semi-supervised single-label learning by the PT4 method, then labels the unlabeled examples using the MAP (Maximum a Posteriori) principle together with the K-nearest neighbor method, and finally solves the single-label learning problem using SVM.
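The PT4 step can be sketched as follows, assuming that PT4 refers to the transformation of that name in the usual PT1 to PT6 naming of multi-label problem transformations, which builds one binary (single-label) dataset per label and is often called binary relevance. The function names `pt4_transform` and `pt4_combine` and the convention of marking unlabeled examples with `None` are hypothetical; the MAP/K-nearest-neighbor labeling and the SVM training that follow in SSML_SVM are not shown.

```python
def pt4_transform(examples, label_set):
    """Split a multi-label dataset into one binary dataset per label.

    examples:  list of (features, labels) pairs, where labels is a set of
               label names, or None for an unlabeled example
    label_set: every label that can occur
    Returns a dict mapping each label to a list of (features, target) pairs,
    with target +1 (label present), -1 (label absent) or None (unlabeled).
    """
    binary = {}
    for label in label_set:
        binary[label] = [
            (x, None if labels is None else (1 if label in labels else -1))
            for x, labels in examples
        ]
    return binary

def pt4_combine(per_label_predictions):
    """Turn per-label binary predictions (+1 / -1) back into a predicted label set."""
    return {label for label, y in per_label_predictions.items() if y == 1}

if __name__ == "__main__":
    data = [
        ((0.1, 0.2), {"A", "B"}),
        ((0.5, 0.4), {"B"}),
        ((0.9, 0.8), None),  # unlabeled example, kept for the semi-supervised step
    ]
    tasks = pt4_transform(data, ["A", "B", "C"])
    for label, rows in tasks.items():
        print(label, [t for _, t in rows])
    # A [1, -1, None]
    # B [1, 1, None]
    # C [-1, -1, None]
    print(pt4_combine({"A": -1, "B": 1, "C": -1}))  # {'B'}
```

Each resulting binary task is an ordinary semi-supervised single-label problem, which is what allows a single-label learner such as an SVM to be reused once the `None` targets have been filled in.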
The distinctive feature of SSML_SVM is its efficient integration of SVM-based single-label learning with the MAP principle and the K-nearest neighbor method.

Thirdly, we extend the direct sparse kernel learning framework to support vector regression and propose a direct sparse kernel regression method called DSKR. By adding a non-convex constraint to ε-SVR, DSKR obtains a sparse kernel regressor with an arbitrary, user-defined number of support vectors. It achieves promising regression performance with fewer support vectors than ε-SVR.

Finally, we propose two improved kernel affinity propagation clustering methods called SSKAPC and AFAPC. The kernel trick is adopted to handle non-linear problems. In SSKAPC, affinity propagation clustering is extended to the semi-supervised setting, in which background knowledge is provided in the form of pairwise constraints to improve clustering performance. In AFAPC, clusters and their centers are obtained by passing affinity messages over the data network, where the affinity messages are derived from the gravitational forces between data points. Experimental results demonstrate that the clustering accuracy of AFAPC is comparable with that of affinity propagation clustering, while its running time is much shorter.

The author also worked on image forensics and proposed a fake image detection method named BERFS. BERFS identifies fake images with high accuracy using relative frequency features and semantic features, and can estimate the blurred region precisely.
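For readers unfamiliar with the affinity propagation clustering that SSKAPC and AFAPC extend, the sketch below implements the standard responsibility/availability message passing of Frey and Dueck in plain Python. It is the baseline algorithm only: the kernel mapping, the pairwise constraints of SSKAPC and the gravity-based messages of AFAPC are not reproduced, and the negative squared Euclidean similarity, the median preference and the damping factor 0.5 are common defaults assumed here, not choices taken from the dissertation.

```python
import statistics

def similarity_matrix(points, preference=None):
    """Negative squared Euclidean similarities; the diagonal holds the preference
    (by default the median of the off-diagonal similarities, a common choice)."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    if preference is None:
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    return S

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Standard affinity propagation; returns (exemplar indices, label per point)."""
    n = len(S)
    if n < 2:
        return list(range(n)), list(range(n))
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                competitor = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos_sum = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    # Each point joins its most similar exemplar; exemplars label themselves.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels
```

On six one-dimensional points forming two well-separated groups, such as `similarity_matrix([(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)])`, the middle point of each group should emerge as its exemplar. The O(n^2) messages per sweep are what make plain affinity propagation slow on large datasets, which is the cost AFAPC is reported to reduce.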

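  • 【Online publication submitted by】 Jiangnan University
  • 【Online publication year/issue】 2010, Issue 04
  • 【Classification number (CLC)】 TP181; TP391.4
  • 【Times cited】 1
  • 【Downloads】 825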