Robust Regression and Its Application in Spectral Analysis
(Chinese title: 稳健回归技术及其在光谱分析中的应用)

【Author】 Bao Xin (包鑫)

【Supervisor】 Dai Liankui (戴连奎)

【Author information】 Zhejiang University, Control Science and Engineering, 2010, PhD

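【Abstract (from the Chinese original)】 In modern industrial production, product quality must be closely monitored and analyzed in order to control quality strictly, reduce energy consumption and production costs, and limit environmental pollution. Product quality analysis methods fall mainly into chemical analysis and instrumental analysis, and instrumental analysis has become the mainstream. Owing to its high speed, non-destructive nature and low demands on operator skill, spectral analysis has become a commonly used instrumental method and has received widespread attention and application in recent years. Most quantitative spectral analysis proceeds as follows: first, a spectral analysis model is built from a set of training samples with known compositions or properties and their corresponding spectra; then the composition or properties of an unknown sample are calculated from this model and the sample's spectrum. In practice, however, environmental interference, instrument bias, human error and other factors make it likely that the training set contains some abnormal samples, and these abnormal samples significantly reduce the reliability and accuracy of the analysis model. How to avoid or reduce the adverse influence of abnormal or erroneous training samples on the analysis results has become an urgent problem. Against the background of quantitative spectral analysis, this thesis studies robust regression techniques in depth, as follows:

1. To address the shortcomings of existing robust Partial Least Squares (PLS) methods, a robust PLS algorithm with automatic outlier elimination is proposed. The algorithm iterates during modeling, determines a confidence interval from the distribution of PLS regression errors, and uses it to eliminate outliers automatically. In addition, building on existing local regression, a robust local Principal Component Regression (PCR) algorithm is proposed. It robustifies both steps of PCR, namely principal component analysis and multiple linear regression, and applies local regression in the multiple linear regression step. Both robust algorithms have been applied to near-infrared spectral analysis of gasoline octane number, and the results show that they outperform other linear robust regression methods in both robustness and accuracy.

2. To improve the robustness of the existing Least Squares Support Vector Machine (LS-SVM), a robust LS-SVM algorithm is proposed. It uses a robust confidence interval of the LS-SVM regression error distribution to select as many normal training samples as possible for LS-SVM modeling, while eliminating as many outliers as possible. A corresponding fast algorithm is also proposed to reduce the iterative computation time. Simulation and experimental results verify the effectiveness of the algorithm. On this basis, the algorithm was applied in a Raman spectral analyzer for gasoline quality; operating results show that it detects outliers effectively and that the model's prediction accuracy meets the requirements of practical applications.

3. To address the shortcomings of the original Weighted LS-SVM (WLS-SVM) in convergence and robustness, a robust iterative algorithm for WLS-SVM is proposed. It corrects the formula the original WLS-SVM uses to compute regression errors, which fundamentally solves the convergence problem of WLS-SVM. It also improves the weight-computation step of the original algorithm by using the median of the regression errors as the reference for computing the weights, which greatly improves the robustness of WLS-SVM.

4. To further improve on the robustness of WLS-SVM, an LS-SVM algorithm combined with an M-estimator (MLS-SVM) is proposed. It replaces the least-squares residual in the LS-SVM objective function with the residual of an M-estimator and solves the modified optimization problem iteratively. Experimental results on infrared spectral analysis show that it is more robust than WLS-SVM and other commonly used support vector machine algorithms, and that its computation time is nearly the same as that of LS-SVM, so it can be used where real-time computation is required.

5. Building on the above work, a generalized LS-SVM algorithm (GLS-SVM) is proposed. It replaces the sum of squared residuals in LS-SVM with a general decreasing even function of the residuals and solves GLS-SVM iteratively. The iterative computation does not require evaluating the residual even function; it only requires constructing a weighting function of the residuals, and several typical weighting functions are given. Results on near-infrared spectral analysis of tobacco properties show that, with a suitably chosen weighting function, GLS-SVM achieves good robustness and prediction accuracy.

Finally, the thesis is summarized and an outlook on robust regression techniques and their applications is given.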

【Abstract】 To guarantee product quality and to reduce energy consumption, production costs and environmental pollution, modern industry needs to pay more attention to quality monitoring. Quality analysis consists mainly of chemical analysis and instrumental analysis, and instrumental analysis has become the mainstream. Owing to its advantages, such as speed, non-destructiveness and ease of operation, spectral analysis has recently been used in many different fields. Quantitative spectral analysis works as follows: first, an analytical model is constructed from a training data set of samples with known compositions or properties and their spectra; then the compositions or properties of a test sample are calculated from the model and the sample's spectrum. However, in industrial applications, environmental factors, instrument bias, human error and other causes may introduce outliers into the training samples, and these outliers can greatly degrade the reliability and accuracy of the calibration model. How to avoid or reduce the influence of outliers has become an urgent problem. In this thesis, a series of robust regression algorithms is proposed and applied to quantitative spectral analysis, as summarized below:

1. To address the disadvantages of existing robust Partial Least Squares (PLS) algorithms, a novel robust PLS with outlier detection is proposed. The new algorithm automatically eliminates outliers based on a confidence interval determined from the PLS regression errors. A robust local Principal Component Regression (PCR) algorithm is also proposed, in which both steps of PCR, i.e., principal component analysis and multivariate linear regression, are improved to enhance model robustness; in addition, local regression is introduced into the multivariate linear regression step. Both algorithms are applied to near-infrared spectral analysis of gasoline octane number, and experimental results show that they perform better than other linear robust algorithms.

2. To improve the robustness of the Least Squares Support Vector Machine (LS-SVM), a novel robust LS-SVM algorithm is developed. In this algorithm, the confidence interval of the residual distribution is applied iteratively to detect outliers and select normal samples, and the LS-SVM model is then built on the selected, outlier-free subset. A corresponding fast algorithm is also proposed to reduce computing time. The novel robust LS-SVM is applied in a Raman analyzer to predict gasoline properties. Application results show that it detects outliers effectively and that the predictive accuracy of the analyzer is adequate for industrial applications.

3. A robust iterative algorithm for the Weighted Least Squares Support Vector Machine (WLS-SVM) is proposed to address the shortcomings of the original WLS-SVM in convergence and robustness. The formula for the regression residual in the original WLS-SVM algorithm is revised to ensure iterative convergence, and the formula for computing the weights is improved to enhance robustness by using the median of the regression errors as the reference.

4. To further improve on the robustness of WLS-SVM, a new robust LS-SVM algorithm based on an M-estimator (MLS-SVM) is proposed. The M-estimator residual first replaces the least-squares residual in the LS-SVM objective function, and an iterative algorithm then solves the modified optimization problem. Applied to an infrared spectral analysis example, MLS-SVM proves more robust and accurate than WLS-SVM and other traditional SVMs; moreover, its computation time is close to that of LS-SVM, so it can be used in online analysis.

5. A generalized LS-SVM (GLS-SVM) is proposed, in which a general decreasing even function of the residuals replaces the sum of squared residuals in the LS-SVM objective function. An iterative algorithm is proposed to solve the corresponding optimization problem; it requires only the construction of a weighting function of the residuals, and several typical weighting functions are given. GLS-SVM is applied to NIR spectral analysis of tobacco properties, and experimental results show its advantages in both robustness and accuracy when a suitable weighting function is chosen.

Finally, conclusions are drawn and future issues in robust regression are discussed.
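The claim in item 5 that the iteration needs only a weighting function, and never the residual even function itself, can be seen from the standard LS-SVM Lagrangian. The derivation below is a sketch in assumed notation (ρ for the even loss, v for the weighting function, γ for the regularization constant, K for the kernel). It follows the usual weighted LS-SVM formulation and is not necessarily the exact formulation used in the thesis.

```latex
% Assumed notation: feature map \varphi, kernel K(x_k,x_l) = \varphi(x_k)^\top \varphi(x_l),
% regularization constant \gamma, even loss \rho with \rho(e) = e^2 recovering plain LS-SVM.
\min_{w,b,e}\; J(w,e) = \tfrac{1}{2} w^\top w + \tfrac{\gamma}{2} \sum_{k=1}^{n} \rho(e_k)
\quad \text{s.t.} \quad y_k = w^\top \varphi(x_k) + b + e_k, \quad k = 1, \dots, n

\mathcal{L} = J(w,e) - \sum_{k=1}^{n} \alpha_k \bigl( w^\top \varphi(x_k) + b + e_k - y_k \bigr)

\frac{\partial \mathcal{L}}{\partial w} = 0 \Rightarrow w = \sum_{k=1}^{n} \alpha_k \varphi(x_k), \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_{k=1}^{n} \alpha_k = 0, \qquad
\frac{\partial \mathcal{L}}{\partial e_k} = 0 \Rightarrow \alpha_k = \tfrac{\gamma}{2} \rho'(e_k)

% Define the weighting function v(e) = \rho'(e) / (2e), extended by continuity at e = 0,
% so that \alpha_k = \gamma\, v(e_k)\, e_k, i.e. e_k = \alpha_k / (\gamma v_k).
% Freezing v_k = v(e_k) at the previous iterate and eliminating w and e gives

\begin{bmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & \Omega + V_\gamma \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
= \begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \Omega_{kl} = K(x_k, x_l),
\qquad V_\gamma = \operatorname{diag}\!\Bigl( \tfrac{1}{\gamma v_1}, \dots, \tfrac{1}{\gamma v_n} \Bigr)

% Examples: \rho(e) = e^2 gives v \equiv 1 (plain LS-SVM);
% Huber \rho(e) = e^2 for |e| \le c and 2c|e| - c^2 otherwise gives v(e) = \min(1, c/|e|).
```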
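Items 1 and 2 both describe the same trimming loop: fit a model, compute the residuals, keep only the samples inside a confidence interval of the residual distribution, and refit until the retained subset stops changing. The Python sketch below illustrates that loop under assumptions the abstract does not specify: the interval is taken as median ± k·1.4826·MAD with k = 3, a one-variable least-squares line (`fit_line`, a hypothetical helper) stands in for the PLS or LS-SVM model, and the stopping rule is simply an unchanged subset. It is not the thesis's algorithm, whose exact interval and fast variant are not given here.

```python
import statistics


def fit_line(xs, ys):
    # Hypothetical stand-in model: ordinary least-squares line y = a + b*x
    # (the thesis uses PLS or LS-SVM here; this only illustrates the loop).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def trimmed_fit(xs, ys, k=3.0, max_iter=20):
    # Assumed interval: |e - median(e)| <= k * 1.4826 * MAD(e).
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        model = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        # Residuals of ALL samples, so a sample dropped earlier can re-enter.
        res = [y - model(x) for x, y in zip(xs, ys)]
        med = statistics.median(res)
        scale = max(1.4826 * statistics.median(abs(r - med) for r in res), 1e-12)
        new_keep = [i for i, r in enumerate(res) if abs(r - med) <= k * scale]
        if len(new_keep) < 2 or new_keep == keep:
            break
        keep = new_keep
    # Refit on the final subset so the returned model matches `keep`.
    return fit_line([xs[i] for i in keep], [ys[i] for i in keep]), keep
```

For example, on ten points along y = 2 + 3x with small alternating noise and one sample shifted by +30, the loop drops that sample after the first pass and stops on the second, because the retained subset no longer changes.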
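Items 3 to 5 all come down to repeatedly solving a weighted LS-SVM whose per-sample weights are produced by a weighting function of standardized residuals. The sketch below shows one such iteratively reweighted loop under several assumptions not stated in the abstract: scalar inputs, an RBF kernel, residuals computed directly as y − f(x), a robust scale of 1.4826·MAD around the median residual, and either Hampel-type weights (c1 = 2.5, c2 = 3, a common choice in the weighted LS-SVM literature) or Huber weights. The helper names (`fit_weighted`, `robust_lssvm`, `hampel_weight`, `huber_weight`) are hypothetical; this is not the thesis's WLS-SVM, MLS-SVM or GLS-SVM implementation.

```python
import math
import statistics


def rbf(a, b, sigma):
    # Gaussian (RBF) kernel on scalar inputs (1-D inputs assumed for brevity)
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(A, rhs):
    # Dense Gaussian elimination with partial pivoting; adequate for small n
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_weighted(xs, ys, v, gamma, sigma):
    # Weighted LS-SVM system: [0 1^T; 1 Omega + diag(1/(gamma*v_k))] [b; alpha] = [0; y]
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for k in range(n):
        row = [1.0] + [rbf(xs[k], xs[l], sigma) for l in range(n)]
        row[k + 1] += 1.0 / (gamma * v[k])
        A.append(row)
    sol = solve(A, [0.0] + list(ys))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def predict(x, xs, b, alpha, sigma):
    return b + sum(a * rbf(x, xk, sigma) for a, xk in zip(alpha, xs))


def hampel_weight(u, c1=2.5, c2=3.0):
    # 1 inside c1, linear decay between c1 and c2, 0 beyond c2
    au = abs(u)
    if au <= c1:
        return 1.0
    return (c2 - au) / (c2 - c1) if au <= c2 else 0.0


def huber_weight(u, c=1.345):
    return 1.0 if abs(u) <= c else c / abs(u)


def robust_lssvm(xs, ys, weight_fn=hampel_weight, gamma=10.0, sigma=1.5, iters=5):
    v = [1.0] * len(xs)
    for _ in range(iters):
        b, alpha = fit_weighted(xs, ys, v, gamma, sigma)
        e = [y - predict(x, xs, b, alpha, sigma) for x, y in zip(xs, ys)]
        med = statistics.median(e)  # median residual as the reference point
        s = max(1.4826 * statistics.median(abs(r - med) for r in e), 1e-12)
        # Floor the weights so 1/(gamma*v_k) stays finite
        v = [max(weight_fn((r - med) / s), 1e-4) for r in e]
    b, alpha = fit_weighted(xs, ys, v, gamma, sigma)
    return b, alpha, v
```

Passing `weight_fn=huber_weight` changes only the weighting function, not the solver, which mirrors the GLS-SVM point that each iteration needs a weighting function rather than the loss itself. A fixed iteration count keeps the sketch short; a convergence check on the weights would be a natural refinement.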

  • 【Online publication submitter】 Zhejiang University
  • 【Online publication year/issue】 2011, Issue 08