
Research on Nonlinear Modeling with Partial Least Squares Regression and Its Recursive Algorithm

【Author】 孙凤林 (Sun Fenglin)

【Supervisor】 郝志峰 (Hao Zhifeng)

【Author Information】 South China University of Technology, Probability Theory and Mathematical Statistics, 2010, Master's thesis

【Abstract】 Partial least squares (PLS) regression builds a regression model by extracting orthogonal components from the data matrices. It handles problems that defeat ordinary least squares regression, such as multicollinearity among the predictors, far fewer observations than variables, and moderate amounts of missing data. After several decades of development, PLS is widely applied in chemistry, chemical engineering, economics, environmental science, food science, and other fields. This thesis comprises two parts.

The first part studies recent nonlinear PLS modeling algorithms, focusing on INLR (implicit nonlinear latent variable regression). INLR is easy to implement and predicts well, especially for systems with polynomial relationships: the data matrix X is extended with nonlinear terms of the original variables (but not their cross terms), so that the quadratic and cross terms of the latent components are included implicitly, and PLS is then applied to the extended matrix. However, the added nonlinear terms may also introduce variation that is uncorrelated with the response, forcing the algorithm to extract more components, and the more components a model uses, the poorer its interpretability becomes. To address this, the thesis corrects INLR with OPLS (orthogonal projections to latent structures), which analyzes the variation explained by the PLS components and removes the response-uncorrelated variation from X; the improved algorithm is called OPLS-INLR. Simulation experiments show that OPLS-INLR retains the predictive ability of INLR while greatly reducing the number of components and improving the interpretability of the model.

The second part studies recursive PLS algorithms. Classical recursive algorithms handle continuous responses; this thesis focuses on categorical responses and proposes a nonlinear recursive algorithm based on KL-PLS (kernel logistic partial least squares). On a red-wine quality discrimination data set, the KL-PLS nonlinear recursive algorithm outperforms ordinary recursive logistic regression, recursive PLS logistic regression, and the plain KL-PLS algorithm.
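To make the INLR step described above concrete, the following is a minimal Python sketch, not the thesis's implementation: the data matrix X is augmented with element-wise powers of its own columns (no explicit cross terms of the original variables), and ordinary PLS is then fitted to the augmented matrix. The toy data, the inlr_expand helper, the chosen degree and component count, and the use of scikit-learn's PLSRegression are all illustrative assumptions; the OPLS filtering step that turns INLR into OPLS-INLR is not shown.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def inlr_expand(X, degree=2):
    """Augment X with element-wise powers of its columns (no explicit cross terms)."""
    blocks = [X] + [X ** d for d in range(2, degree + 1)]
    return np.hstack(blocks)

# Toy data: a response with a quadratic dependence on one predictor (placeholder values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=100)

X_aug = inlr_expand(X, degree=2)        # shape (100, 10): original columns plus their squares
pls = PLSRegression(n_components=3)     # component count chosen only for illustration
pls.fit(X_aug, y)
y_hat = pls.predict(X_aug).ravel()
print("first five fitted values:", np.round(y_hat[:5], 3))
```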
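The KL-PLS pipeline of the second part can be sketched in a similar spirit: a kernel matrix supplies the nonlinearity, PLS-style scores reduce its dimension, and a logistic model on those scores performs the classification. The sketch below is only an assumed reading of that structure, not the thesis's algorithm; in particular the recursive (online updating) step and the exact KL-PLS component extraction are omitted, and the kernel choice, gamma, and component count are placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

# Toy binary data with a nonlinear class boundary (all names and values are placeholders).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] ** 2 + X[:, 1] > 1.0).astype(int)

K = rbf_kernel(X, X, gamma=0.5)         # kernel matrix supplies the nonlinearity
pls = PLSRegression(n_components=3)     # PLS-style score extraction from K against y
pls.fit(K, y)
T = pls.transform(K)                    # latent scores used as classifier inputs

clf = LogisticRegression(max_iter=1000).fit(T, y)   # logistic model on the scores
print("training accuracy:", clf.score(T, y))
```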
