改进的高维非线性PLS回归方法及应用研究

Study on Improved High-Dimension and Nonlinear Partial Least-Squares Regression Method and Applications

【Author】 Guo Jianxiao (郭建校)

【Advisor】 Wang Hongli (王洪礼)

【Author information】 Tianjin University, Management Science and Engineering, 2010, PhD

【Abstract】 Partial least-squares (PLS) regression is a new non-parametric regression method based on the idea of high-dimensional projection. It combines the functions of multiple regression, principal component analysis and canonical correlation analysis, and has therefore been called a second-generation multivariate statistical analysis method. Identifying outliers (called "specific sample points" in the dissertation) and reducing the dimension of the variable set are two important data preprocessing steps before regression modeling. Building on the PLS regression model and combining it with nonlinear kernel principal component analysis, binary trees and other methods, the dissertation proposes an improved nonlinear PLS regression model, a binary-tree dimension reduction method and an evaluation method based on the dimension-reduction binary tree, and extends existing outlier identification methods. The main contributions are as follows.

An improved nonlinear PLS regression model is proposed. Traditional linear and nonlinear PLS regression models fit a linear regression of the dependent variable set on the extracted components, without considering that the relationship between the dependent variables and the components may be nonlinear. In the improved model, the regression of the dependent variable set on each component can be chosen as linear or nonlinear according to the case at hand, while each component is still expressed as a linear regression equation in the original independent variables. The dissertation uses this model to build a nonlinear regression relating automobile fuel consumption to ten other design and performance indicators.

A binary-tree dimension reduction method for high-dimensional spaces and an evaluation method based on the dimension-reduction binary tree are proposed. Traditional dimension reduction treats all variables at once; the dissertation replaces this with a stepwise procedure that starts with local reduction and extends to global reduction. If the number of sample variables n is too large, principal component analysis or kernel principal component analysis is applied to the two most strongly correlated variables: the first component replaces the original two variables, which reduces the number of variables to n − 1. This step is repeated until the required precision is reached. The whole reduction process forms a binary tree or an incomplete binary tree. Using this evaluation method and the 2008 economic development indicators of the districts and counties of Tianjin, the dissertation evaluates the economic development level of Tianjin's 18 districts and counties.

Outlier identification in high-dimensional spaces is analyzed and extended. Building on PLS-based outlier identification techniques, the two-dimensional T² ellipse plot is extended to the T² ellipsoid in three-dimensional space and the T² hyper-ellipsoid in higher-dimensional space. In addition, based on hierarchical clustering, an outlier identification method using the dendrogram of principal components in high-dimensional space is proposed and applied to gasoline and diesel prices in major provinces and cities of China.
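
The pairwise reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it uses plain PCA only (the abstract also allows kernel PCA), and because the abstract does not state the precision criterion, the stopping rule here (a hypothetical `target_dim` and `min_abs_corr`) is an assumption made for illustration. The function names and the synthetic data are likewise invented for this example.

```python
import numpy as np


def first_component(a, b):
    """Scores on the first principal component of two standardized columns."""
    Z = np.column_stack([a, b])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    v = eigvecs[:, np.argmax(eigvals)]        # direction of largest variance
    share = eigvals.max() / eigvals.sum()     # variance kept by this merge
    return Z @ v, share


def binary_tree_reduce(X, names, target_dim=2, min_abs_corr=0.0):
    """Repeatedly merge the two most correlated columns into one component.

    Assumed stopping rule (not from the source): stop when `target_dim`
    columns remain or the strongest |correlation| drops below `min_abs_corr`.
    Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    cols = [X[:, k] for k in range(X.shape[1])]
    nodes = list(names)   # a leaf name, or a nested (left, right) tuple
    merges = []           # (left, right, |corr|, variance share) per merge
    while len(cols) > target_dim:
        R = np.abs(np.corrcoef(np.column_stack(cols), rowvar=False))
        np.fill_diagonal(R, -1.0)             # ignore self-correlation
        i, j = (int(k) for k in np.unravel_index(np.argmax(R), R.shape))
        if R[i, j] < min_abs_corr:
            break
        i, j = min(i, j), max(i, j)
        scores, share = first_component(cols[i], cols[j])
        merges.append((nodes[i], nodes[j], float(R[i, j]), float(share)))
        parent = (nodes[i], nodes[j])
        for seq in (cols, nodes):             # delete the larger index first
            del seq[j]
            del seq[i]
        cols.append(scores)                   # n variables become n - 1
        nodes.append(parent)
    return np.column_stack(cols), nodes, merges


# Synthetic data: x2 is a noisy copy of x1, and x4 a noisy copy of x3.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=200),
])
reduced, tree, merges = binary_tree_reduce(X, ["x1", "x2", "x3", "x4"])
```

Each entry of `tree` is a leaf name or a nested pair, so the merge history reads directly as the binary tree (or incomplete binary tree, when some variables are never merged) mentioned in the abstract. The per-merge variance share in `merges` is one possible quantity on which a precision criterion could be based.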

  • 【Online publication contributor】 Tianjin University
  • 【Online publication issue】 2011, No. 07