支撑向量回归算法及其应用研究
Research on Support Vector Regression Algorithms and Their Applications
【Author】 郑逢德
【Supervisor】 张鸿宾
【Author information】 Beijing University of Technology, Computer Application Technology, 2012, Ph.D.
【Abstract】 Over the past few decades, many algorithms in pattern recognition and machine learning have been plagued by overfitting, local minima, and excessively large training sets. The support vector machine (SVM), built on statistical learning theory, overcomes these problems to a certain extent, and the introduction of the ε-insensitive loss function extended SVM successfully to regression. Most existing SVM research targets classification, and many classification algorithms cannot be applied directly to regression. This dissertation studies support vector regression (SVR) algorithms in depth. The main contents are as follows.

1. Improved algorithms based on twin support vector regression (Twin SVR, TSVR). Improvements to regression algorithms aim either at higher efficiency or at better performance. TSVR finds two non-parallel planes by solving two SVM-type problems, converting one large optimization problem with two groups of constraints into two small problems, each with one group of constraints, and thus shortens training time. Based on TSVR, this dissertation proposes three improved algorithms. ① Since the primal problem of TSVR contains no regularization term, one is added to implement the structural risk minimization principle; tuning the regularization parameter lets the regression function adapt to the data better. The resulting dual problem can be solved with the SOR algorithm, which is much faster than TSVR, and the primal problem can also be solved quickly with a gradient algorithm without forming the dual. ② Following the proximal support vector machine, the inequality constraints are replaced with equality constraints, and the original linear penalty on the slack variables in each small quadratic program of TSVR is replaced by a combination of linear and quadratic terms, which fits the data better; the resulting optimization problems are solved with a penalty function method. ③ The linear penalty on the slack variables in the TSVR problems is replaced with a quadratic penalty, so the dual problems have only non-negativity constraints and can be solved by an iterative algorithm that converges quickly from any starting point; avoiding the solution of quadratic programs markedly speeds up training. Numerical experiments on artificial and benchmark datasets show the effectiveness of the proposed algorithms.

2. Incremental support vector regression. When training samples arrive in batches or the training set is too large, traditional learning algorithms no longer apply and incremental learning algorithms are needed. Two incremental regression algorithms are proposed based on Lagrangian support vector regression (LSVR): an online incremental algorithm that adds one new sample at a time and a batch incremental algorithm that adds several new samples at a time. The unconstrained optimization problem of LSVR can be solved by a fast iterative algorithm whose only matrix operation is an inversion. In the two proposed incremental algorithms, this inversion reuses the inverse obtained in the previous round, making full use of earlier learning results; this removes much repeated computation, greatly simplifies the inverse update after each increment, and reduces running time. Comparisons on several datasets show that, relative to earlier algorithms, the proposed ones are faster while maintaining good fitting accuracy.

3. A Newton algorithm for Lagrangian support vector regression. LSVR is a fast regression algorithm that avoids solving a quadratic program, but it needs many iterations to terminate. A finite Newton algorithm with Armijo step size (NLSVR) is used to solve the LSVR optimization problem; it only requires solving a system of linear equations a finite number of times and has global convergence and finite termination properties. Experiments on several synthetic and benchmark datasets show that the proposed algorithm is effective and fast.

4. A robust weighted support vector regression algorithm in the primal space, with an application to stock price prediction. Outliers in the data severely degrade regression performance and need to be removed. This dissertation proposes a robust regression algorithm that removes samples deviating from the model through a soft-rejection weighting scheme: the further a sample deviates from the model, the smaller the weight of its loss and the smaller its influence on the parameter estimates. Support vector regression is usually solved in the dual space, but it can also be solved efficiently in the primal; the weighted support vector regression derived here is solved in the primal space with a recursive finite Newton method. Numerical experiments and experiments on stock price prediction confirm the effectiveness of the algorithm.
【Abstract】 In the past few decades, many algorithms in pattern recognition and machine learning have been troubled by overfitting, local minima, and excessively large training sets. Based on statistical learning theory, the support vector machine (SVM) partly overcomes these problems, and the introduction of the ε-insensitive loss function extended SVM successfully to regression. Most SVM research addresses classification, and classification algorithms cannot always be used directly to solve regression problems. This paper focuses on SVR; the main contents are as follows.

1. Improved twin support vector regression algorithms. Twin SVR (TSVR) converts the classical quadratic programming problem (QPP) with two groups of constraints into two small QPPs, each with one group of constraints. This paper proposes three modified algorithms based on TSVR. ① A regularization term is added to the QPPs of TSVR, implementing the structural risk minimization principle; tuning the regularization parameter lets the regression function fit the data better. The SOR algorithm is used to solve the dual problem of the regularized TSVR, and a gradient algorithm can solve the regularized primal QPPs directly. ② Using a technique from the proximal support vector machine, the inequality constraints of the TSVR QPPs are changed to equality constraints, and the penalty on the slack variables combines linear and quadratic terms; the resulting problems are solved with a penalty function approach. ③ The penalty on the slack variables is changed to a quadratic penalty, so the small QPPs of TSVR can be solved by an iterative algorithm that converges from any starting point and needs no quadratic optimization package, which makes the algorithm very fast. Experiments on artificial and benchmark datasets show that the proposed methods are competitive with previously published methods.

2. Incremental support vector regression. Incremental learning algorithms are needed when the training set is too large or samples arrive gradually. This paper proposes two incremental regression algorithms based on Lagrangian support vector regression (LSVR), covering two cases: online and batch incremental learning. LSVR leads to the minimization of an unconstrained differentiable convex program, solved by an iterative algorithm with simple linear convergence that converges from any starting point and needs no quadratic optimization package; its solution is obtained by inverting a matrix whose order equals the number of input samples at the beginning of the iteration. The proposed algorithms build on the previously computed inverse, so the computation does not have to be repeated after each increment. Experiments on several UCI data sets show that the proposed methods are competitive with previously published methods.

3. Finite Newton method for Lagrangian support vector regression. LSVR is an effective algorithm but needs many iterations to converge from a starting point. A finite Armijo-Newton algorithm is used to solve the LSVR optimization problem: the solution is obtained by solving a system of linear equations a finite number of times rather than solving a quadratic optimization problem, and the resulting method has global convergence and finite termination properties. Experimental results on several synthetic and benchmark datasets indicate that the proposed NLSVR is fast and shows good generalization performance.

4. Primal weighted support vector regression. A robust weighted regression algorithm is proposed and solved in the primal space using an iterative Newton algorithm. The algorithm eliminates outliers through a weighting approach: the further a sample deviates from the model, the smaller the weight of its loss function and the smaller its effect on the estimate of the model parameters. Experiments on artificial and benchmark datasets and on stock price prediction show that the proposed method is competitive with previously published methods.

Code sketches illustrating these four algorithmic ideas follow.
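Contribution 1① solves the dual of the regularized TSVR with the SOR algorithm. As a hedged illustration only, the sketch below applies projected SOR sweeps to a generic box-constrained quadratic program of the form min ½αᵀQα − cᵀα subject to 0 ≤ α ≤ C, the shape such SVR duals typically take; the exact dual matrices, bounds, and all function and parameter names here (projected_sor, omega, C) are assumptions, not the thesis's formulation.

```python
import numpy as np

def projected_sor(Q, c, C, omega=1.3, tol=1e-6, max_sweeps=1000):
    """Projected SOR sweeps for min 0.5*a'Qa - c'a  s.t. 0 <= a <= C.

    Q is assumed symmetric positive semi-definite with positive diagonal.
    This is a generic solver of the kind used for SVM/SVR duals, not the
    exact dual derived in the thesis.
    """
    n = len(c)
    a = np.zeros(n)
    for _ in range(max_sweeps):
        a_old = a.copy()
        for i in range(n):
            # coordinate-wise gradient using the freshest values of a
            g = Q[i] @ a - c[i]
            # over-relaxed step, projected back onto the box [0, C]
            a[i] = min(max(a[i] - omega * g / Q[i, i], 0.0), C)
        if np.max(np.abs(a - a_old)) < tol:
            break
    return a
```

For a positive definite Q, sweeps of this kind converge for relaxation factors between 0 and 2; omega slightly above 1 usually speeds convergence over plain Gauss-Seidel.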
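Contribution 2 reuses the previously computed matrix inverse when a new sample arrives, so the incremental step avoids a full re-inversion. The sketch below shows only that reuse idea, via the standard block-bordering (Schur complement) update for growing a symmetric positive-definite matrix by one row and column; the specific matrix that LSVR inverts is not reproduced, and the names append_inverse, b, d are illustrative.

```python
import numpy as np

def append_inverse(M_inv, b, d):
    """Given the inverse of a symmetric positive-definite matrix M,
    return the inverse of the bordered matrix [[M, b], [b.T, d]].

    Only M_inv (not M) is needed, so adding one sample costs O(m^2)
    instead of the O(m^3) cost of inverting from scratch.
    """
    b = b.reshape(-1, 1)
    Minv_b = M_inv @ b
    # Schur complement of M in the bordered matrix (a scalar here)
    s = float(d - b.T @ Minv_b)
    top_left = M_inv + (Minv_b @ Minv_b.T) / s
    top_right = -Minv_b / s
    return np.block([[top_left, top_right],
                     [top_right.T, np.array([[1.0 / s]])]])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 3))
    M = X @ X.T + np.eye(5)              # SPD matrix of order m = 5
    b = rng.standard_normal(5)
    d = 1.0 + float(b @ b)               # keeps the bordered matrix SPD
    grown = np.block([[M, b[:, None]], [b[None, :], np.array([[d]])]])
    assert np.allclose(append_inverse(np.linalg.inv(M), b, d),
                       np.linalg.inv(grown))
```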
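Contribution 3 solves the LSVR problem with a finite Newton method under an Armijo step-size rule. The sketch below illustrates the same Newton/Armijo scheme on a simpler stand-in objective, the primal L2-loss ε-insensitive SVR with the bias term omitted; the actual NLSVR objective and its generalized Hessian differ in detail, and the names primal_svr_newton, sigma, eps are assumptions.

```python
import numpy as np

def primal_svr_newton(X, y, C=10.0, eps=0.1, sigma=1e-4, tol=1e-6, max_iter=50):
    """Generalized Newton with Armijo backtracking for
        f(w) = 0.5*||w||^2 + (C/2) * sum_i max(|x_i.w - y_i| - eps, 0)^2,
    a piecewise-quadratic stand-in used only to illustrate the scheme."""
    n, d = X.shape
    w = np.zeros(d)

    def obj_grad_hess(w):
        r = X @ w - y
        excess = np.abs(r) - eps
        active = excess > 0                       # samples outside the eps-tube
        f = 0.5 * w @ w + 0.5 * C * np.sum(excess[active] ** 2)
        g = w + C * X[active].T @ (np.sign(r[active]) * excess[active])
        H = np.eye(d) + C * X[active].T @ X[active]   # generalized Hessian
        return f, g, H

    for _ in range(max_iter):
        f, g, H = obj_grad_hess(w)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(H, -g)             # Newton direction
        t = 1.0
        while True:                               # Armijo backtracking
            f_new, _, _ = obj_grad_hess(w + t * step)
            if f_new <= f + sigma * t * (g @ step) or t < 1e-12:
                break
            t *= 0.5
        w = w + t * step
    return w
```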
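Contribution 4 down-weights samples that deviate far from the model so that outliers barely influence the fit. The sketch below shows this soft-rejection weighting idea with an illustrative Huber-style weight rule; for brevity the inner solver is a regularized weighted least-squares refit rather than the thesis's recursive finite Newton solver for weighted SVR, and the weight function, c_cut, and rounds are assumptions.

```python
import numpy as np

def robust_weighted_svr(X, y, C=10.0, c_cut=1.0, rounds=5):
    """Iteratively reweighted fit: large residuals get small loss weights,
    so outliers have little effect on the parameter estimate. The weight
    rule and inner solver are illustrative choices, not the thesis's."""
    n, d = X.shape
    v = np.ones(n)                               # per-sample weights
    for _ in range(rounds):
        # weighted ridge refit:  (I/C + X'VX) w = X'Vy
        V = np.diag(v)
        w = np.linalg.solve(np.eye(d) / C + X.T @ V @ X, X.T @ V @ y)
        r = np.abs(X @ w - y)
        # soft rejection: weight 1 for small residuals, decaying as 1/r beyond c_cut
        v = np.minimum(1.0, c_cut / np.maximum(r, 1e-12))
    return w, v
```

The weight update is the only essential part of the sketch: replacing the weighted ridge refit with a weighted ε-insensitive SVR solved by a primal Newton method would recover the structure described in the abstract.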