支持向量机若干问题及应用研究

On Issues and Applications for Support Vector Machine

【Author】 Peng Xinjun (彭新俊)

【Advisor】 Wang Yifei (王翼飞)

【Author information】 Shanghai University, Computational Mathematics, 2008, Ph.D.

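【Abstract (Chinese-language version)】 As a concrete implementation of the structural risk minimization principle, the support vector machine (SVM) offers global optimality, a simple structure, and strong generalization performance. It has become an active research topic in machine learning and has been applied successfully in many fields. This dissertation studies the following aspects of SVM:

(1) Sparsification algorithms for LS-SVM that rest on a greedy strategy are shown to fall easily into local minima, so the resulting hyperplane is not sparse. An Invfitting rule is proposed that examines all support vectors during the iteration and deletes the one with the least influence on the decision function. Combining the Invfitting rule with the Backfitting rule, which adds support vectors one at a time, yields the HBILS-SVM algorithm, which is closer to global optimality, reduces the number of support vectors, and makes the hyperplane sparser.

(2) The shortcomings of the RCH used in existing geometric SVM algorithms are analyzed: the RCH changes the geometric shape of the convex hull of the training samples, and only a necessary, not a sufficient, condition is available for representing its extreme points. The concept of the CCH is introduced, which leaves the shape of the hull unchanged and makes the extreme points easy to determine. A CCH-based geometric SVM algorithm is then discussed. In addition, based on the properties of the extreme points of the CCH, a probabilistic accelerated geometric algorithm is proposed to reduce the computation in each iteration.

(3) TM-ν-SVM is proposed to overcome the inability of TM-SVM to determine its regularization parameter. Upper and lower bounds are established for the margin errors and the sub-support vectors of TM-ν-SVM, and the analysis shows that TM-ν-SVM can achieve better results than ν-SVM. The geometric meaning of TM-ν-SVM is also analyzed in detail: the optimization process is equivalent to finding the closest pair of points between two SCCHs in feature space. The geometric properties of the SCCH are then discussed, and a corresponding geometric algorithm is given.

(4) The sample shift (SS) algorithm, which converts SVR into SVC, is discussed, and a sample shift algorithm based on the empirical normal vector (OSS) is given. Further, to reduce the influence of noise on the empirical normal vector, an online sample shift algorithm based on the normal vector in feature space (OFGSS) is proposed by combining it with the geometric algorithm for support vectors. This method reduces the influence of the shift size on the regression function, lowers the influence of noise, and generalizes well.

(5) The advantages and disadvantages of incremental SVM and geometric SVM algorithms are compared. A method for determining SVM kernel parameters based on the geometric algorithm is explored. It combines the strengths of the geometric algorithm with an approximate gradient computed with respect to the parameters, and so reaches the optimal kernel parameters faster, offering an effective route to SVM model selection.

(6) After reviewing the various learning algorithms for TSVM, an improved algorithm, SMTSVM, is proposed. SMTSVM uses the idea of sequential minimization to estimate the Lagrange coefficients after the temporary label of a test sample is adjusted, and thereby obtains the new decision function and the adjusted estimate of the empirical error. This addresses the loss of classification accuracy caused by overly simple estimates of the empirical error.

(7) The application of SVM to protein-protein interaction prediction is discussed. Domain information and residue sequence information of proteins are each used to construct new SVM input vectors, which effectively predict protein-protein interactions and identify the predicted sites. Numerical experiments show that SVM predictors built on the discussed feature representations perform far better than the other results.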

【Abstract】 As a tool that implements the structural risk minimization principle, the support vector machine (SVM) has several advantages over other approaches, including uniqueness, globality, a simple structure, and good generalization properties. Because of its excellent learning power, SVM has become a hot topic in machine learning and has been used successfully in many applications. In this dissertation, several aspects of SVM are studied:

(1) Sparse algorithms for the least squares SVM (LS-SVM) are greedy, which leaves the hyperplane insufficiently sparse. A novel Invfitting approach to analyzing support vectors is presented, in which the support vector with the smallest impact on the decision function is deleted in each iteration. By combining it with Backfitting, a more globally optimal HBILS-SVM algorithm is developed, which effectively avoids local solutions, reduces the number of support vectors, and thus yields a sparser decision function.

(2) The reduced convex hull (RCH) changes the shape of the convex hull of the samples and provides only a necessary, not a sufficient, condition for the representation of its extreme points. The compressed convex hull (CCH) is introduced to avoid these deficiencies. A CCH-based geometric SVM is presented after a discussion of the theory of the CCH. Further, based on the characteristics of the extreme points of the CCH, a probabilistic accelerated CCH-based geometric SVM is presented to improve the computational speed.

(3) A TM-ν-SVM is proposed to determine the regularization factor in TMSVM. The bounds on the fractions of margin errors and sub-support vectors are both lower than those in ν-SVM, i.e., it achieves better performance than ν-SVM. The geometric framework for TM-ν-SVM shows that it is equivalent to finding the pair of closest points of two soft compressed convex hulls (SCCH). Based on a discussion of the geometric properties of the SCCH, the corresponding geometric algorithm is developed.

(4) An overview is given of the sample shift (SS) method for transforming support vector regression (SVR) into classification, and a novel empirical gradient-based sample shift (GSS) approach is presented to remedy the shortcomings of SS. Further, to reduce the impact of noise on the empirical gradient, an online feature gradient-based sample shift (OFGSS) method is developed by combining it with the geometric SVM. Compared with the former, it reduces the impact of the shift value and of noise, i.e., it generalizes well.

(5) After a comparison of the merits and shortcomings of the incremental SVM and the geometric SVM, a model selection method based on the latter is discussed. It combines the merits of the geometric SVM with an approximate gradient computation for the kernel parameters, which provides a rapid method for determining the kernel parameters in SVM.

(6) After a review of a variety of algorithms for the transductive support vector machine (TSVM), a sequential minimization algorithm for TSVM is introduced. It uses the sequential minimization idea to estimate the Lagrange coefficients after the temporary label of a test sample is adjusted, and then derives the new decision function and the estimate of the empirical risk. This algorithm addresses the deficiency of an overly simple estimate of the empirical risk.

(7) An application of SVM to predicting protein-protein interactions is discussed. In this application, the structural domain information and the sequence information of proteins are each used to construct novel vector representations, which can effectively predict protein-protein interactions and interface sites. Numerical simulations show that the proposed vector representations yield better prediction performance than others.
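
Items (2) and (3) rest on the geometric view of SVM training as a search for the closest pair of points between two convex bodies. As a minimal sketch of that idea only, the Python below runs a classical Gilbert-type (Frank-Wolfe) nearest-point iteration on the plain convex hulls of two linearly separable classes. It is not the dissertation's RCH, CCH, or SCCH algorithm, and it does not handle overlapping classes. The names `nearest_points` and `decision_function`, the tolerance, and the toy data are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_points(
    pos: Sequence[Sequence[float]],
    neg: Sequence[Sequence[float]],
    tol: float = 1e-9,
    max_iter: int = 10000,
) -> Tuple[Vector, Vector]:
    """Approximate the closest points p in conv(pos) and q in conv(neg).

    Gilbert-type iteration: minimize ||p - q||^2 by repeatedly moving
    towards the best vertex pair with an exact line search.
    Assumes the two hulls are disjoint (linearly separable classes).
    """
    p = [float(v) for v in pos[0]]
    q = [float(v) for v in neg[0]]
    for _ in range(max_iter):
        w = [pi - qi for pi, qi in zip(p, q)]
        # Vertices that most reduce <w, p - q>.
        xp = min(pos, key=lambda x: dot(w, x))
        xn = max(neg, key=lambda x: dot(w, x))
        z = [a - b for a, b in zip(xp, xn)]
        d = [wi - zi for wi, zi in zip(w, z)]
        gap = dot(w, d)  # Frank-Wolfe optimality gap, always >= 0
        if gap <= tol:
            break
        # Exact line search on the segment from w to z, clipped to [0, 1].
        t = min(1.0, gap / dot(d, d))
        p = [(1 - t) * pi + t * a for pi, a in zip(p, xp)]
        q = [(1 - t) * qi + t * b for qi, b in zip(q, xn)]
    return p, q


def decision_function(p: Vector, q: Vector) -> Callable[[Sequence[float]], float]:
    """Hyperplane that perpendicularly bisects the segment from q to p."""
    w = [pi - qi for pi, qi in zip(p, q)]
    b = -0.5 * (dot(w, p) + dot(w, q))
    return lambda x: dot(w, x) + b


# Toy data (illustrative only): two separable point sets in the plane.
positives = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0)]
negatives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
p_star, q_star = nearest_points(positives, negatives)
f = decision_function(p_star, q_star)
```

On this toy data the closest points are a vertex of the positive hull and the midpoint of the facing edge of the negative hull, and the sign of `f(x)` classifies a point `x`. Hull variants such as the RCH or CCH change the sets over which the closest points are sought, which is what lets the geometric approach cope with classes that are not separable.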

  • 【Online publication contributor】 Shanghai University
  • 【Online publication year and issue】 2009, Issue 01