Partial Least-Squares Algorithm and Its Application to Machine Learning Based on Structural Risk Minimization

PLS Algorithm and Its Applications to SRM-Based Machine Learning

【Author】 Bai Yifeng (白裔峰)

【Advisor】 Xiao Jian (肖建)

【Author information】 Southwest Jiaotong University, Power Electronics and Electric Drives, 2007, PhD

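【Abstract, translated from the Chinese】 Machine learning is one of the main topics of nonlinear science research. Most machine learning methods for building nonlinear system models take minimization of the training error as the optimization objective, that is, they follow the empirical risk minimization principle. In recent years the structural risk minimization principle, which is grounded in statistical learning theory and accounts for both the empirical risk and the confidence interval of a model, has become an active topic in machine learning research. The partial least-squares algorithm, which originated in process control, extracts the most explanatory composite information in the data in order to reduce the dimension of high-dimensional data spaces and to overcome multicollinearity among variables. Kernel partial least squares, support vector machines and fuzzy system modeling are all effective machine learning methods, but each still has shortcomings when building nonlinear models. This thesis combines these three methods with partial least squares, with the aim of realizing the structural risk minimization principle in the learning process. Based on Mercer's theorem, the thesis proposes a simplified kernel partial least-squares algorithm together with a risk index that satisfies the structural risk minimization principle; simulation results demonstrate the effectiveness of the index. To address the growth of the kernel matrix dimension with the number of identification samples, the proposed block-wise kernel partial least-squares algorithm partitions the kernel matrix and thereby reduces the computational burden of kernel partial least squares. For the "rule explosion" problem of fuzzy system models, the thesis proposes a fuzzy system model based on subspace partition and gives an adaptive model identification method based on a genetic algorithm. The method partitions the universe of discourse according to the principles of consistency and completeness and partially solves the rule explosion problem. In the improved algorithm, partial least squares is used to preprocess the data and build an initial model, and the subspace-partition-based fuzzy system model is then used to model the residuals. The ε-insensitive loss function and the subspace partition provide a trade-off between the confidence interval and the empirical risk of the model, which realizes structural risk minimization. Because partial least squares generalizes poorly, the thesis combines support vector machines with partial least squares and proposes a weighted partial least-squares algorithm based on structural risk minimization. The support vector machine training algorithm is used to compute the linear regression model of the outer model in weighted partial least squares, which realizes the structural risk minimization principle. The thesis then applies support vector machines to the construction of T-S fuzzy system models and proposes a T-S fuzzy modeling method based on support vector machines. The algorithm performs fuzzy clustering in the universe of discourse with the support vectors as centers and forms fuzzy rules from the clustering result: the antecedent of each rule is a cluster center, and the consequent is the linear partial least-squares regression model of that cluster. The method builds the T-S fuzzy system model adaptively and also realizes the structural risk minimization principle. To make outlier detection feasible when modeling time-varying systems or large data sets, the thesis proposes a robust recursive partial least-squares algorithm, which addresses the heavy computation such detection normally requires. Combining recursive partial least squares with robust principal component regression reduces the computational burden and also avoids the masking and swamping effects that arise when several outliers are present. Inspection of the overhead contact system is of great importance for the safe operation of high-speed electrified railways. Based on the proposed robust algorithm, the thesis studies a nonlinear model of the pantograph-catenary relationship. After data standardization and outlier removal, partial least squares is used to construct effective input-output data, and a support vector machine is finally used to build the nonlinear model. Simulation results show that the model accuracy meets practical requirements.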

【Abstract】 Machine learning is one of the major fields of nonlinear science. Most machine learning methods for building nonlinear system models use empirical risk minimization, that is, minimization of the training error, as the optimization criterion. Structural risk minimization, which is grounded in statistical learning theory and balances the empirical risk against the confidence interval, has attracted wide attention in machine learning. The partial least-squares algorithm, which originated in process control, reduces the dimension of high-dimensional data spaces and circumvents the collinearity caused by highly correlated variables by capturing the most explanatory variance in the original data. Kernel partial least squares, support vector machines and fuzzy system modeling are all effective machine learning approaches, but each still has deficiencies in constructing nonlinear models. This thesis integrates kernel partial least squares, support vector machines and fuzzy system modeling with partial least squares, with the goal of realizing structural risk minimization in the learning process. Based on the Mercer theorem, a simplified kernel partial least-squares algorithm is presented together with a risk index, whose validity is demonstrated in simulation. To address the growth of the kernel matrix with the number of identification samples, a block-wise kernel partial least-squares algorithm is described, which partitions the kernel matrix and reduces the computational burden. A subspace-partition-based fuzzy system model (SPFS) and an adaptive model identification algorithm are proposed to address the rule explosion problem. The algorithm partitions the universe of discourse according to the principles of consistency and completeness, which helps relieve the rule explosion problem.
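The dimension-reduction idea behind partial least squares described above can be illustrated with a minimal sketch. This is a single-response (PLS1) version with NIPALS-style deflation in plain Python, not the thesis's implementation; the function names `pls1_fit` and `pls1_predict` and the toy data are assumptions made for illustration.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by extracting latent components one at a time."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X residual
    f = [v - y_mean for v in y]                                # centered y residual
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict by replaying the same projection and deflation on a new sample."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat

# Toy data: the second column is exactly twice the first, and y = 3*x1 + 1.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [4.0, 7.0, 10.0, 13.0]
model = pls1_fit(X, y, n_components=2)
print(len(model[2]), pls1_predict(model, [5.0, 10.0]))  # one component, prediction close to 16
```

Ordinary least squares has no unique solution on this toy data because the two columns are exactly collinear, whereas a single latent component is enough to recover the relation.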
In the improved algorithm, partial least squares is used to preprocess the identification samples and build an initial model, and an SPFS is then built on the residuals. The ε-insensitive loss function and the subspace partition balance the confidence interval against the empirical risk, thereby realizing structural risk minimization. Because partial least squares generalizes poorly, a weighted partial least-squares algorithm is presented that integrates support vector machines with partial least squares. The support vector machine training algorithm is used to compute the linear regression model in the outer model of weighted partial least squares, which realizes structural risk minimization. A T-S fuzzy system model based on support vector machines is then proposed by applying the support vector machine training algorithm to T-S fuzzy modeling. The algorithm clusters the universe of discourse with the support vectors as centers and then forms the fuzzy rules: the cluster center is the antecedent, and the linear partial least-squares regression model of the cluster is the consequent. The method both constructs the T-S fuzzy system model adaptively and realizes structural risk minimization. A robust recursive partial least-squares algorithm is proposed to reduce the heavy computational burden of outlier detection in regression for time-varying systems or massive data sets. Combining recursive partial least squares with robust principal component regression based on principal sensitivity vectors resolves the computational burden and effectively avoids masking and swamping when multiple outliers are present. Overhead contact system inspection is important for the safe operation of high-speed electrified railways. The nonlinear model of the relation between pantograph and contact wire is studied in this thesis using the robust algorithm.
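For reference, the two quantities behind the trade-off between confidence interval and empirical risk can be written in their standard textbook form. These expressions come from statistical learning theory in general rather than from the thesis itself; the symbols $f$ (the model), $h$ (VC dimension), $n$ (sample size) and $\eta$ (confidence parameter) are notation assumed here, and the bound is stated in its classical form for indicator (0/1) losses.

$$ L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr) $$

$$ R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}} \quad \text{with probability at least } 1 - \eta $$

The first expression ignores residuals smaller than $\varepsilon$. In the second, the square-root term plays the role of the confidence interval: it grows with the model capacity $h$ and shrinks with the sample size $n$, so minimizing the right-hand side as a whole, rather than $R_{\mathrm{emp}}$ alone, is what structural risk minimization refers to.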
After data standardization and outlier removal, the input-output data are constructed with the partial least-squares algorithm, and the nonlinear model is built with the support vector machine algorithm. Simulation results show that the prediction accuracy of the model satisfies practical requirements.
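The first two steps of this pipeline, standardization and outlier removal, can be sketched as follows. The thesis performs outlier detection with its robust recursive partial least-squares algorithm; this sketch swaps in a much simpler rule, the median/MAD modified z-score with the commonly used cutoff of 3.5, purely for illustration. The function names and the toy signal are assumptions.

```python
import statistics

def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def inlier_indices(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) is within threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(range(len(values)))  # no spread: nothing can be flagged
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) <= threshold]

raw = [1.0, 1.1, 0.9, 1.2, 1.0, 9.0]  # toy signal; the last value is a gross outlier
z = standardize(raw)                  # step 1: standardization
keep = inlier_indices(z)              # step 2: flag outliers on the standardized data
clean = [raw[i] for i in keep]
print(keep)   # [0, 1, 2, 3, 4]
print(clean)  # [1.0, 1.1, 0.9, 1.2, 1.0]
```

The median and MAD are used instead of the mean and standard deviation because they are barely affected by the outliers themselves, which limits the masking effect mentioned in the abstract. The remaining steps (PLS feature construction and support vector regression) are not shown here.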
