复杂体系近红外光谱建模方法研究

Methods for Modeling Near-Infrared Spectra of Complex Systems

【Author】 Du Guorong (杜国荣)

【Supervisor】 Shao Xueguang (邵学广)

【Author information】 Nankai University, Analytical Chemistry, 2012, PhD

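【Abstract (Chinese)】 Near-infrared (NIR) spectroscopy has attracted wide attention in analytical science because it requires little sample pretreatment and is fast, non-destructive, and environmentally friendly. Because information is difficult to extract from NIR spectra, qualitative and quantitative analysis must rely on chemometric methods. Multivariate calibration is a commonly used chemometric method: it builds a quantitative model between NIR spectra and the target property, so that complex samples can be analyzed directly. This dissertation studies outlier identification, variable selection, and modeling methods for NIR spectra of complex systems, and applies them to real samples.

Ordered predictor selection is a method based on informative vectors. This dissertation introduces the multi-model consensus idea into ordered predictor selection and proposes a consensus ordered predictor selection method. The method builds multiple models by ordered predictor selection and then computes the consensus result by weighting. Because the weighting scheme strongly affects the final result, three weighting schemes were examined, which provides guidance for determining the result. Analysis of three NIR datasets shows that consensus ordered predictor selection can improve the predictive ability of NIR quantitative models.

A dual-consensus multi-model method is proposed to improve the predictive ability of multi-model consensus methods. Before sub-models are built, the method uses uninformative variable elimination and the successive projections algorithm to construct variable subsets that are representative and have low collinearity. To build a sub-model, a sample subset is first drawn by Monte Carlo sampling, and cross-validation is then used to choose the variable subset for that sub-model. Iterating this process yields multiple sub-models, and their averaged result is taken as the consensus result. Because both the modeling variables and the modeling samples are analyzed when the sub-models are built, both the prediction accuracy and the diversity of the sub-models are improved. Modeling of the NIR spectra of two sets of tobacco samples shows that the dual-consensus model can improve the predictive ability of multi-model consensus methods.

Variable selection methods are generally computationally expensive and time-consuming. Sparse partial least squares regression embeds a variable selection step in the partial least squares regression modeling process; without significantly increasing the computational load, it reduces the influence of uninformative variables on the model and improves prediction accuracy. This dissertation proposes a covariance-based sparse partial least squares regression method that improves how sparse partial least squares regression determines the variables. Analysis of three NIR datasets and one Raman dataset shows that the covariance-based method selects few variables and gives small prediction errors.

The prediction of a partial least squares regression is determined by multiple latent variables. The distribution of an outlier's predicted values over the latent variables differs markedly from that of normal samples, and on this basis an outlier identification method based on model diagnosis is proposed. The method first computes each sample's predicted value on each latent variable and then uses the local outlier factor method to decide whether the sample is an outlier. Analysis of three NIR datasets shows that the method can improve the predictive ability of NIR analysis of complex systems. The analysis also shows that outliers have little influence on large-sample datasets.

Indirect modeling can lower the detection limit of NIR analysis. Based on the indirect-modeling principle, this dissertation studies the NIR determination of four tobacco-specific N-nitrosamines (TSNAs). To optimize the models and reduce errors, the effect of preprocessing methods on the predictions was compared. Background subtraction and light-scattering correction were found to optimize the model and reduce the error, while variable selection simplifies the model and significantly improves its predictive ability. Analysis of an independent prediction set and a set of samples from a different year shows that the partial least squares regression predictions agree with the results of gas chromatography-thermal energy analysis. Although the relative errors of some low-concentration samples are large, the method gives the tobacco industry a simple, non-destructive, and rapid way to determine TSNAs.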
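The outlier-identification idea in the abstract above (compute each sample's predicted value on each latent variable, then apply a local outlier factor) can be illustrated with a minimal NumPy sketch. This is not the dissertation's implementation: the function names `pls1_contributions` and `local_outlier_factor`, the neighbourhood size `k`, and the synthetic two-factor data are all assumptions made for illustration, and the PLS step is a plain NIPALS PLS1.

```python
import numpy as np

def pls1_contributions(X, y, n_lv):
    """Contribution of each latent variable to the fitted y (PLS1 via NIPALS).

    Returns an (n_samples, n_lv) array whose column a is q_a * t_a.
    Hypothetical helper, not taken from the dissertation.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    C = np.zeros((X.shape[0], n_lv))
    for a in range(n_lv):
        w = Xc.T @ yc                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xc @ w                    # scores on latent variable a
        tt = t @ t
        p = Xc.T @ t / tt             # X loadings
        q = (yc @ t) / tt             # y loading
        C[:, a] = q * t               # per-sample contribution to the prediction
        Xc = Xc - np.outer(t, p)      # deflate X (y needs no deflation in PLS1,
                                      # because the scores are orthogonal)
    return C

def local_outlier_factor(Z, k=10):
    """Plain local outlier factor on the rows of Z (ties are not handled)."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)               # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]         # indices of the k nearest neighbours
    rows = np.arange(n)[:, None]
    d_nn = D[rows, nn]                        # distances to those neighbours
    k_dist = d_nn[:, -1]                      # k-distance of every point
    reach = np.maximum(d_nn, k_dist[nn])      # reachability distances
    lrd = 1.0 / reach.mean(axis=1)            # local reachability density
    return lrd[nn].mean(axis=1) / lrd

# Synthetic stand-in for spectra: two underlying factors, 50 "wavelengths".
rng = np.random.default_rng(0)
n, p = 60, 50
S = rng.normal(size=(n, 2))
L = rng.normal(size=(2, p))
X = S @ L + 0.05 * rng.normal(size=(n, p))
y = S[:, 0] + 0.5 * S[:, 1] + 0.05 * rng.normal(size=n)
X[0] += 10.0 * L[0]                           # plant one abnormal spectrum

C = pls1_contributions(X, y, n_lv=2)
lof = local_outlier_factor(C, k=10)
suspect = int(np.argmax(lof))                 # sample with the largest LOF
```

Samples whose LOF is well above 1 would be candidates for removal before recalibration; the threshold is a modeling choice that this sketch does not fix.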

【Abstract】 Near-infrared (NIR) spectroscopy is widely used in analytical science because it is fast, accurate, environmentally friendly, and simple. Because it is difficult to extract quantitative information from NIR spectra, chemometric methods such as multivariate calibration are used extensively. Multivariate calibration yields a quantitative model between the spectra and the target, and the model can then be used to analyze unknown samples. To build robust NIR quantitative models, this dissertation studies variable selection, consensus methods, and outlier detection.

A consensus ordered predictor selection method is proposed for dealing with the NIR spectra of complex systems. In this method, the consensus result is used as the prediction result, and the influence of three weighting methods on the consensus result is investigated. The superiority of the proposed method is demonstrated with three NIR spectral datasets.

A strategy for improving the performance of consensus methods in multivariate calibration of NIR spectra is proposed. In this approach, for each variable in the spectra reduced by uninformative variable elimination (UVE), a subset of non-collinear variables is generated using the successive projections algorithm (SPA). Sub-models are then built for each variable subset using a calibration subset determined by Monte Carlo (MC) re-sampling, and the sub-model that produces the minimal cross-validation error is selected as a member model. By repeating the MC re-sampling, a series of member models is built, and a consensus model is obtained by averaging all the member models. Because the member models are built with the best variable subset and a randomly selected calibration subset, both the quality and the diversity of the member models are ensured for the consensus model. Two NIR spectral datasets of tobacco lamina are used to investigate the proposed method, and its superiority in both accuracy and reliability is demonstrated.

The sparse partial least squares method builds multivariate calibration models with selected informative variables. A modified sparse partial least squares method based on covariance is proposed for NIR spectroscopic analysis. In this method, uninformative variables are eliminated during modeling on the basis of the covariance between the spectra and the target. Compared with the conventional sparse partial least squares method, the proposed method is more parsimonious and more accurate. With three NIR datasets and one Raman spectroscopic dataset, the method is shown to be a promising way to deal with uninformative variables in complex spectroscopic analysis.

An outlier detection method is proposed for NIR spectral analysis. The method is based on the definition of an outlier and the principle of partial least squares (PLS) regression: an outlier in a dataset behaves differently from the rest, and the prediction of a PLS model is an accumulation of several independent latent variables. The proposed method therefore builds a PLS model with a calibration dataset and then investigates the contribution of each latent variable. Outliers can be detected by comparing these contributions. Three NIR datasets (gasoline, beverage, and tobacco lamina) are used to test the method. The quality of the models improves after the outliers detected by the method are removed.

Indirect modeling of trace components in real samples by NIR spectroscopy has gained much interest because it may provide a rapid way to analyze industrial or agricultural products. By coupling NIR diffuse reflectance spectroscopy with chemometric techniques, this work studies a method for rapid analysis of four tobacco-specific N-nitrosamines (TSNAs) and their total content. To optimize the models, techniques for spectral preprocessing and variable selection are adopted and compared. Removing the varying background and correcting the multiplicative scattering effect in the spectra are found to be important in the modeling, and variable selection can significantly improve the models. To validate the models, the TSNA contents of independent test samples and of tobacco leaves harvested in a different year are predicted. The predicted contents are consistent with the reference contents obtained by GC/TEA analysis. Although the relative errors for some low-content samples are not fully satisfactory, the method is a practical alternative for industrial analysis because it is non-destructive and rapid.
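The Monte Carlo consensus step described in the abstract (build member models on randomly drawn calibration subsets, then average them) can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's method: the UVE/SPA variable-subset generation and the cross-validation choice of member models are omitted, every member is a plain NIPALS PLS1 model on all variables, and the names `pls1_fit` and `mc_consensus_predict`, the parameters `n_models`, `frac`, and `n_lv`, and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; return regression vector b and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        W.append(w)
        P.append(p)
        q.append((yc @ t) / tt)
        Xc = Xc - np.outer(t, p)              # deflate X before the next component
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)       # b = W (P^T W)^{-1} q
    return b, y_mean - x_mean @ b

def mc_consensus_predict(X_cal, y_cal, X_new, n_models=30, frac=0.7, n_lv=2, seed=0):
    """Average the predictions of PLS members built on random calibration subsets."""
    rng = np.random.default_rng(seed)
    n = X_cal.shape[0]
    m = max(n_lv + 1, int(frac * n))          # size of each calibration subset
    preds = []
    for _ in range(n_models):
        idx = rng.choice(n, size=m, replace=False)   # Monte Carlo re-sampling
        b, b0 = pls1_fit(X_cal[idx], y_cal[idx], n_lv)
        preds.append(X_new @ b + b0)
    return np.mean(preds, axis=0)             # unweighted consensus

# Synthetic stand-in for spectra: two underlying factors, 40 "wavelengths".
rng = np.random.default_rng(1)
S = rng.normal(size=(100, 2))
L = rng.normal(size=(2, 40))
X = S @ L + 0.01 * rng.normal(size=(100, 40))
y = S @ np.array([1.0, 0.5])

y_hat = mc_consensus_predict(X[:70], y[:70], X[70:])
rmse = float(np.sqrt(np.mean((y_hat - y[70:]) ** 2)))
```

Unweighted averaging is used here for simplicity. A weighted average, for example with weights derived from each member's validation error, would be an alternative way to form the consensus.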

  • 【Online publication contributor】 Nankai University
  • 【Online publication year/issue】 2014, Issue 06