
共识融合策略在光谱检测中的应用研究

Research on Application of Consensus Fusion Strategy in Spectroscopy Detection

【Author】 Mao Fei (毛飞)

【Advisor】 Yuan Leiming (袁雷明)

【Author information】 Wenzhou University, Computer Application Technology, 2021, Master's degree

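【Abstract (translated from the Chinese)】 With the rapid development of modern spectroscopic detection technology, near-infrared, mid-infrared, and Raman spectroscopy have been widely applied in industry, agriculture, food, medicine, and other fields. However, the high-dimensional spectral data collected in actual production often contain disturbance noise, and conventional analysis methods rarely achieve satisfactory results when mining such data, so a new modeling strategy is needed for the analysis and mining of high-dimensional spectral data. At present, the classical algorithms commonly used for spectral modeling are mainly single models such as principal component regression and partial least squares (PLS). Although PLS models are widely used to analyze and mine spectral data, they are prone to losing useful information when handling high-dimensional spectra, which hinders efficient and rapid mining of the information hidden in such data. To address these shortcomings of single models, this thesis applies a consensus fusion strategy on top of feature variable selection algorithms and an unsupervised clustering algorithm (the self-organizing map, SOM) and builds four models for high-dimensional spectral data: a consensus fusion model based on residual information, a consensus fusion model based on continuous CARS-PLS, a consensus fusion model based on multiple features, and an unsupervised consensus fusion model based on the self-organizing map. The continuous CARS-PLS consensus fusion model achieves the best predictive performance on the bayberry (near-infrared), ‘Yunhe’ pear (near-infrared), and methanol gasoline (mid-infrared) data; relative to the conventional PLS model, it improves by 15.3%, 11.1%, and 15.1% on the training sets and by 14.6%, 9.5%, and 10.3% on the prediction sets, respectively. On the methanol gasoline (Raman) data, the consensus fusion model based on residual information performs best, improving on the conventional PLS model by 9.2% on the training set and 11.9% on the prediction set. The main contents of this thesis are as follows: (1) Starting from the conventional single-model strategy, the consensus fusion strategy is applied on top of feature variable selection algorithms and an unsupervised clustering algorithm to improve the traditional single model, yielding four consensus fusion models for high-dimensional spectral data. (2) To test these four models, near-infrared spectra of bayberry, near-infrared spectra of ‘Yunhe’ pears, and mid-infrared and Raman spectra of methanol gasoline, all collected in actual production, are used as the study data, with the PLS model and feature variable selection models serving as baseline reference models. The experimental results show that, for high-dimensional spectral data, the consensus fusion strategy improves predictive performance over the typical PLS model to varying degrees, enhances robustness, reduces the risk of overfitting, and effectively supports the analysis and mining of the high-value feature variable information hidden in such data.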

【Abstract】 With the rapid development of modern spectroscopic detection technology, near-infrared, mid-infrared, and Raman spectroscopy have been widely used in industry, agriculture, food, medicine, and other fields. However, the high-dimensional infrared and Raman spectral data collected in actual production often contain disturbance noise, which makes it difficult to obtain ideal prediction results with conventional modeling strategies. It is therefore necessary to design a new modeling strategy for the analysis and mining of high-dimensional spectral data. At present, the classical algorithms commonly used in spectral modeling and analysis mainly include principal component regression and partial least squares (PLS). Although the single quantitative analysis model built with PLS is widely used in spectral data mining and analysis, it is prone to overfitting or excessive model complexity when faced with high-dimensional spectral data, which hinders the mining and analysis of the high-value feature variable information hidden in such data. Building on feature variable selection algorithms and an unsupervised clustering algorithm (the self-organizing map, SOM), this thesis adopts a consensus fusion strategy and constructs four models for the modeling and analysis of high-dimensional spectral data: a consensus fusion model based on residual information, a continuous CARS-PLS consensus fusion model, a multi-feature consensus fusion model, and an unsupervised consensus fusion model based on SOM clustering. Among them, the continuous CARS-PLS consensus fusion model has the best prediction performance on the bayberry (near-infrared), ‘Yunhe’ pear (near-infrared), and methanol gasoline (mid-infrared) data; compared with the conventional PLS model, it improves by 15.3%, 11.1%, and 15.1% on the training sets and by 14.6%, 9.5%, and 10.3% on the prediction sets, respectively. The consensus fusion model based on residual information has the best prediction performance on the methanol gasoline (Raman) data, where it is 9.2% and 11.9% better than the conventional PLS model on the training set and prediction set, respectively. The main research contents are as follows. First, starting from the basic modeling strategy of the traditional single model, the consensus fusion strategy is used to improve that model on the basis of feature variable selection algorithms and an unsupervised clustering algorithm, and four consensus fusion models are established for high-dimensional spectral data. Second, to test the modeling performance of the four improved consensus fusion models, the visible/near-infrared spectra of ‘Yunhe’ pears, the near-infrared spectra of bayberry, and the mid-infrared and Raman spectra of methanol gasoline, all collected in actual production, are used as the research objects, with the traditional PLS model and feature variable selection models introduced as baseline reference models. The experimental results show that, in the modeling and analysis of high-dimensional spectral data, the consensus fusion strategy improves on the prediction performance of the PLS model to varying degrees, enhances robustness, and reduces the risk of overfitting, thereby supporting the analysis and mining of the high-value feature variable information hidden in high-dimensional spectral data.
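
As a rough illustration of the general idea behind consensus fusion (combining the predictions of several member models into one), the following Python sketch fuses predictions by a weighted average. The abstract does not state the fusion rule, so the inverse-RMSE weighting, the function names `rmse`, `consensus_weights`, and `fuse`, and the toy numbers are all assumptions made for illustration; they are not the thesis's method or data.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def consensus_weights(val_errors):
    """Assumed rule: weight each member by 1/RMSE, normalized to sum to 1."""
    if any(e <= 0 for e in val_errors):
        raise ValueError("validation errors must be strictly positive")
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def fuse(member_predictions, weights):
    """Weighted average of the member predictions, sample by sample."""
    if len(member_predictions) != len(weights):
        raise ValueError("need exactly one weight per member model")
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_predictions))
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical validation targets and predictions from three member models
    # (for example, PLS models fitted on different selected variable subsets).
    y_val = [1.0, 2.0, 3.0, 4.0]
    members = [
        [1.1, 2.1, 2.9, 4.2],
        [0.8, 2.3, 3.4, 3.7],
        [1.0, 1.9, 3.1, 4.0],
    ]
    errors = [rmse(y_val, preds) for preds in members]
    weights = consensus_weights(errors)
    fused = fuse(members, weights)
    print("member RMSE:", errors)
    print("weights:", weights)
    print("fused RMSE:", rmse(y_val, fused))
```

In this sketch the weights are estimated and evaluated on the same validation samples, which is optimistic; in practice the weights would be derived from a validation split and the fused model assessed on a separate prediction set, as in the training/prediction split the abstract reports.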

  • 【Online publication contributor】 Wenzhou University
  • 【Online publication issue】 2022, Issue 04