小麦内在品质近红外光谱无损检测技术研究

Study on Nondestructive Detection Technique of Wheat Internal Quality by Near-infrared Spectroscopy

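【Author】 Huan Kewei (宦克为)

【Advisor】 Shi Xiaoguang (石晓光)

【Author information】 Changchun University of Science and Technology (长春理工大学), Physical Electronics, 2014, Ph.D.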

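【Abstract (translated from Chinese)】 China has a large population, and wheat is one of its most important food crops. How to detect the chemical components of wheat quickly, effectively, and nondestructively, and how to evaluate multiple quality indicators, has long been studied in many countries. Demand for high-quality wheat in many domestic industries has driven the development of nondestructive wheat testing, but most existing near-infrared (NIR) spectrometers are bulky and expensive and are unsuitable for on-site analysis or online detection, which is a clear obstacle to wider adoption of NIR spectroscopy. In addition, although NIR methods for moisture, ground-grain protein, and whole-grain protein have been recognized by the International Committee for Standardization, the complexity of the samples makes the overlapping information in NIR spectra hard to interpret, and calibration models transfer and generalize poorly. As a result, NIR techniques and instruments are still not widely used worldwide, and the ideas and methods used in modeling remain a frontier research topic. Wheat NIR spectral data are characterized by complex composition, high variability, and uncontrolled natural sampling, which makes their analysis a problem in urgent need of a solution. Against this background, this dissertation studies nondestructive detection of the internal quality of wheat. The main contents are as follows:

1. First, existing NIR instruments are briefly introduced, and the commonly used NIR spectral preprocessing and modeling methods are discussed in detail. The preprocessing methods include smoothing, differentiation, wavelet transform (WT), multiplicative scatter correction (MSC), standard normal variate (SNV) transformation, and orthogonal signal correction (OSC). The modeling methods mainly include partial least squares (PLS) and support vector machines (SVM). The theory of model evaluation is also given. Second, based on the main current methods for measuring wheat NIR spectra, an NIR diffuse reflectance measurement system and an NIR diffuse transmission-reflectance measurement system were designed in-house, each consisting mainly of an optical fiber coupling system and a light source system. For the fiber coupling system, based on the ideas of étendue and distribution-weighted average sampling, layered ring-zone fiber coupling is proposed to collect the light diffusely scattered by the sample, and a discrete two-layer ring-zone collection structure is designed. The structure divides the light-receiving surface into two ring layers with different numbers of receiving points: one layer has 9 points and the other has 10, for 19 in total. The 9-point ring is at 30° to the illuminated surface and the 10-point ring is at 60°. Light at every receiving point is collected by fiber coupling (a coupling lens can be fitted to enlarge the acceptance solid angle). This addresses problems of the conventional integrating-sphere sampling structure, such as its small aperture and the strong influence of sample loading posture on the measurement. For the light source system, a long-life tungsten-halogen cup lamp is used as the bulb, and the condensing structure combines a reflective condenser with a front collimating lens, supplemented by filters, so that the light is concentrated and well directed.

2. For the wheat moisture model, based on the idea of information granulation and under supervised learning, wavelet multi-scale decomposition is used to extract features from wheat NIR spectra. Representative wavelet coefficients are selected, the spectra are reconstructed, and a prediction model is built. The root mean square error of cross-validation (RMSECV) of the moisture prediction model falls from 0.4887 with the full spectrum to 0.2910, a reduction of 40.5%, which substantially improves the prediction accuracy of the model.

3. Building on the commonly used variable selection methods, chiefly uninformative variable elimination (UVE), the successive projections algorithm (SPA), and their combination (UVE-SPA), the raw spectra for the wheat protein model are preprocessed with the continuous wavelet transform (CWT) and MSC, and the variables selected by the different methods are analyzed. A variable selection method based on the latent projection graph (LPG) is proposed, and its processing steps are given. The modeling results of the SVM, CWT-SVM, CWT-MSC-SVM, CWT-MSC-UVE-SVM, CWT-MSC-SPA-SVM, CWT-MSC-UVE-SPA-SVM, CWT-LPG-SVM, and CWT-MSC-LPG-SVM models are then discussed and evaluated. The CWT-MSC-LPG-SVM model performs best: it reduces the number of variables by 90% and the root mean square error of prediction (RMSEP) by 34%, substantially improving the accuracy of the wheat protein prediction model.

4. The idea of model population analysis (MPA) is presented. MPA first uses Monte Carlo sampling (MCS) to divide the collected samples into sub-training sets; here, the 93 collected wheat samples are split 2:1 by MCS into 500 pairs of sub-calibration and sub-prediction sets. Next, a sub-regression model is built for each sub-training set; here, the LPG method combined with PLS is applied to each pair, giving the root mean square error (RMSE) of each of the 500 sub-prediction sets. Finally, the sub-models are examined in the sample space, variable space, parameter space, and model space, and their parameters are analyzed statistically to extract the information of interest. Here, the 500 RMSE values are analyzed statistically and the 42 sub-models with large prediction RMSE are discarded. Among the remaining 458 sub-models, the variables selected by each sub-model are tallied, and the 12 most frequently selected variables are taken as characteristic variables. Comparing the modeling results of the different variable selection methods, the MPA-based CWT-MSC-MC-LPG-PLS model reduces the number of variables by 95% and improves model accuracy by 51%, making it better suited to NIR spectral modeling of wheat protein.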
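
Two of the preprocessing steps named in item 1, SNV and MSC, can be illustrated with a minimal sketch. This is not the dissertation's code: the function names `snv` and `msc`, the use of the sample standard deviation in SNV, the mean spectrum as the default MSC reference, and the toy four-point spectra are all assumptions made for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD (n - 1); an assumed convention
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is fitted as a + b * reference by least squares and then
    corrected to (spectrum - a) / b. The reference defaults to the mean
    spectrum of the set (an assumption; other references are possible).
    """
    if reference is None:
        reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spec in spectra:
        spec_mean = mean(spec)
        # Slope and intercept of the least-squares line spec = a + b * reference.
        b = sum((r - ref_mean) * (x - spec_mean)
                for r, x in zip(reference, spec)) / ref_ss
        a = spec_mean - b * ref_mean
        corrected.append([(x - a) / b for x in spec])
    return corrected


# Toy data, not measurements: the second "spectrum" is the first one with an
# additive offset and a multiplicative gain, mimicking a scatter effect.
base = [1.0, 2.0, 4.0, 3.0]
shifted = [0.5 + 2.0 * x for x in base]
corrected = msc([base, shifted])
```

In this toy case the two spectra differ only by an offset and a gain, so both methods map them onto the same curve. Real spectra also differ in chemical content, which is the part these corrections are meant to leave in place.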

【Abstract】 China has a large population, and wheat is one of its most important food crops. How to detect the chemical components of wheat and evaluate multiple indicators quickly, efficiently, and nondestructively is a research topic in many countries. Demand for high-quality wheat in many domestic industries has promoted the development of nondestructive wheat testing. However, many existing near-infrared (NIR) instruments are bulky and expensive and are not suitable for on-site analysis or on-line testing, which is a major barrier to the promotion and application of NIR spectroscopy. Although NIR measurement methods for moisture, crushed-grain protein, and whole-grain protein have been recognized by the International Committee for Standardization, the complexity of the samples makes the overlapping peaks of NIR spectra hard to interpret and limits the transferability and generality of calibration models. NIR spectroscopy techniques and instruments are therefore not yet widely used worldwide, and the ideas and methods used in modeling remain active topics of research. Because wheat NIR spectral data have complex components, high variability, and uncontrolled natural sampling, their analysis is a problem that urgently needs solving. This dissertation studies nondestructive detection of wheat internal quality against this background. The main contents are as follows:

1. First, existing NIR instruments are briefly introduced, and the commonly used NIR spectral preprocessing and modeling methods are discussed in detail. The preprocessing methods include smoothing, differentiation, wavelet transform (WT), multiplicative scatter correction (MSC), standard normal variate (SNV), and orthogonal signal correction (OSC). The modeling methods include partial least squares (PLS) and support vector machines (SVM). The principles of model evaluation are also given. Further, an NIR diffuse reflectance measurement system and an NIR diffuse transmission-reflectance measurement system have been designed, based on the main current methods for measuring the NIR spectrum of wheat. Each system includes an optical fiber coupling system and a light source. The fiber coupling system is based on the ideas of étendue and distribution-weighted average sampling: diffuse light from the sample is collected through layered rings of coupled fibers, arranged as a discrete double-ring collection structure. The structure has 19 receiving points divided between two rings. The first ring (9 points) is at an angle of 30° to the illuminated surface, and the second ring (10 points) is at 60°. Each receiving point collects light through an optical fiber, which can be fitted with a coupling lens to increase the acceptance solid angle. This design addresses problems of the traditional integrating-sphere sampling structure, such as the small aperture and the strong influence of sample loading posture on the measurement. For the light source system, a long-life halogen cup lamp is used. The condenser structure combines a reflective condenser with a front collimating lens, and filters are added so that the light is concentrated and well directed.

2. For the wheat moisture model, based on the ideas of granular computing (GrC) and supervised learning, features are extracted from the NIR spectrum of wheat by wavelet multi-scale decomposition. Representative wavelet coefficients are selected to reconstruct the spectrum and establish the prediction model. The root mean square error of cross validation (RMSECV) of the wheat NIR moisture prediction model decreases from 0.4887 with the full raw spectrum to 0.2910, a reduction of 40.5%, which greatly improves the prediction accuracy of the model.

3. Commonly used variable selection methods are introduced, including uninformative variable elimination (UVE), the successive projections algorithm (SPA), and UVE combined with SPA (UVE-SPA). For the wheat protein model, the continuous wavelet transform (CWT) and MSC are used to preprocess the raw spectrum, and the variables selected by the different methods are analyzed. A variable selection method based on the latent projection graph (LPG) is proposed, and its processing steps are given in detail. The modeling results of the SVM, CWT-SVM, CWT-MSC-SVM, CWT-MSC-UVE-SVM, CWT-MSC-SPA-SVM, CWT-MSC-UVE-SPA-SVM, CWT-LPG-SVM, and CWT-MSC-LPG-SVM models are then discussed and evaluated. Of these, the CWT-MSC-LPG-SVM model works best: the number of variables is reduced by 90% and the root mean square error of prediction (RMSEP) by 34%, which greatly improves the accuracy of the wheat protein prediction model.

4. The idea of model population analysis (MPA) is presented. MPA first uses the Monte Carlo sampling technique (MCS) to divide the collected samples into sub-datasets. In this work, the 93 collected wheat samples are split in a 2:1 ratio by MCS into 500 pairs of sub-calibration and sub-prediction sets. Next, a sub-model is built for each sub-dataset; here, the LPG method combined with PLS is used, yielding 500 RMSEP values. Finally, the sub-models are discussed in the sample space, variable space, parameter space, and model space, and the information of interest is selected by statistical analysis in these spaces. In this work, the 500 root mean square error (RMSE) values are analyzed statistically, and the 42 sub-models with large RMSE are discarded. In the remaining 458 sub-models, the selected variables are tallied, and the 12 variables that occur most frequently are taken as characteristic variables. Comparing the modeling results of the different variable selection methods, the MPA-based CWT-MSC-MC-LPG-PLS model reduces the number of variables by 95% and improves model accuracy by 51%, so it is better suited to NIR spectral modeling for wheat protein prediction.
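
The bookkeeping behind the MPA procedure in item 4 can be outlined as a minimal sketch. It covers only the Monte Carlo splitting, the RMSE filter, and the variable tally; the LPG selection and PLS fitting are not implemented, and their outputs are treated as inputs. The function names, the fixed seed, and the rule of dropping a fixed number of worst sub-models are assumptions, since the abstract does not state how "large" RMSE was defined.

```python
import random
from collections import Counter
from math import sqrt


def monte_carlo_splits(n_samples, n_splits, train_fraction=2 / 3, seed=0):
    """Return n_splits random (calibration, prediction) index pairs."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def keep_best(rmses, n_drop):
    """Indices of the sub-models left after discarding the n_drop largest RMSEs.

    Dropping a fixed count is an assumed rule, chosen for illustration.
    """
    order = sorted(range(len(rmses)), key=lambda i: rmses[i])
    return sorted(order[: len(rmses) - n_drop])


def most_frequent_variables(selected_per_model, kept, top_k):
    """Tally the variables chosen by the kept sub-models; return the top_k."""
    counts = Counter(v for i in kept for v in selected_per_model[i])
    return [v for v, _ in counts.most_common(top_k)]


# 93 samples split 2:1 into 500 calibration/prediction pairs, as in the abstract.
splits = monte_carlo_splits(n_samples=93, n_splits=500)
```

With per-sub-model RMSE values and selected-variable lists from whatever model is fitted, `keep_best(rmses, 42)` would leave 458 sub-models, and `most_frequent_variables(selected, kept, 12)` would return the 12 most frequently chosen variables.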
