Application of Pattern Recognition Techniques to Analysis of Infrared Spectroscopy of Some Natural Products

【Author】 Zhang Yong (张勇)

【Supervisor】 Cong Qian (丛茜)

【Author information】 Jilin University, Agricultural Mechanization Engineering, 2009, Ph.D.

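【Summary】 For several natural products, this work studies how to combine several pattern recognition techniques, namely partial least squares, fuzzy pattern recognition, artificial neural networks, support vector machines and grey correlation analysis, organically with infrared spectral analysis to achieve quantitative and qualitative analysis. The aim is to find a more effective modeling method for infrared spectra and to provide new ideas and techniques for the infrared spectral analysis of natural products.
1. Taking the natural products ginseng, Epimedium brevicornum and tobacco as research objects, fuzzy pattern recognition is proposed for the qualitative analysis of infrared spectra, and the key problems in this analysis, namely dimensionality reduction of the spectral variables, the closeness degree, the principle of choosing the nearest and the analysis steps, are solved.
2. For tobacco and Coptis, partial least squares (PLS) is studied for single-component and multi-component quantitative analysis of near-infrared spectra, and the spectral preprocessing scheme and the evaluation indices for generalization ability are determined.
3. Taking ginseng, Epimedium brevicornum, tobacco and Coptis as examples, the key parameter settings and solutions to related problems are studied for applying artificial neural networks to geographical-origin discrimination with mid-infrared spectra and to qualitative and quantitative analysis with near-infrared spectra, and the established models are evaluated.
4. Methods for selecting the kernel function and kernel parameters are studied for applying support vector machines (SVM) to quantitative near-infrared analysis. Combining SVM with the wavelet transform is also proposed, and simulation experiments are carried out on single-component and multi-component quantitative analysis of the near-infrared spectra of tobacco and Coptis. The models built with the different pattern recognition techniques are then compared in detail.
5. Grey correlation analysis is proposed for optimizing the selection of near-infrared spectral regions. The grey correlation degree between the peak area of a spectral region and a specific component is calculated, and the regions with larger grey correlation degrees are used as characteristic regions for modeling. This greatly shortens the modeling time and considerably improves prediction accuracy.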
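The summary pairs the wavelet transform with SVM modeling (item 4) but does not say which wavelet, decomposition level or usage (compression or denoising) was chosen. The sketch below assumes the simplest case, a multi-level Haar transform in pure Python: the approximation coefficients give a shorter feature vector, and soft-thresholding the detail coefficients gives a denoised spectrum. The function names are hypothetical and the SVM stage itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one Haar level exactly."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_decompose(signal: Sequence[float], levels: int):
    """Return the coarsest approximation and the detail coefficients of every level."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_reconstruct(approx: Sequence[float], details: List[List[float]]) -> List[float]:
    """Undo haar_decompose, starting from the coarsest level."""
    out = list(approx)
    for d in reversed(details):
        out = haar_inverse_step(out, d)
    return out


def soft_threshold(coeffs: Sequence[float], thr: float) -> List[float]:
    """Shrink each coefficient toward zero by thr (zeroing small ones)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal: Sequence[float], levels: int, thr: float) -> List[float]:
    """Threshold all detail coefficients, then reconstruct the spectrum."""
    approx, details = haar_decompose(signal, levels)
    return haar_reconstruct(approx, [soft_threshold(d, thr) for d in details])


# Hypothetical 8-point "spectrum" compressed to 2 approximation coefficients.
spectrum = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
features, _ = haar_decompose(spectrum, 2)
print(features)
```

With two levels, eight points shrink to two coefficients that could be passed to an SVM in place of the raw absorbances; real NIR spectra would need padding or trimming to a length divisible by 2 to the power of the level count, and the choice of kernel function and kernel parameters would be handled separately.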

【Abstract】 In recent years, infrared spectroscopy has made breakthrough progress in the quality analysis of natural products, especially in the field of traditional Chinese medicine, thanks to its many advantages: fast analysis, no pollution, no special sample pretreatment, no toxic or harmful reagents, non-destructive measurement, simple operation, low analysis cost and environmental friendliness. The near-infrared region mainly contains overtone absorptions of the O-H, N-H and C-H stretching vibrations, which makes it especially suitable for the quantitative analysis of functional groups in natural products. The fundamental vibrations of the overwhelming majority of organic compounds, however, appear in the mid-infrared region, which is therefore better suited to the qualitative analysis of the functional groups and structure of natural products. Multiple linear regression (MLR), principal component analysis (PCA) and partial least squares regression (PLSR) are the traditional chemometric methods in infrared spectral analysis. However, many reports indicate that the relationship between the target property and the spectral data is often nonlinear, so these linear regression techniques cannot achieve very good prediction accuracy in such cases. Pattern recognition techniques, by contrast, can use computers to identify the class a sample belongs to and generalize well. They can therefore be used to select and extract spectral features, to classify objects and predict their properties, and, through self-learning and regression, to analyze and predict specific components quantitatively. Traditional pattern recognition methods, however, apply the same pattern features to all sample classes without differentiating between them.
When each feature is fed into the matching module, it is matched and classified directly. Consequently, when a pattern is misclassified, it is very difficult to identify which part is responsible, and revising the algorithm or adjusting its parameters involves a certain amount of guesswork. Human experience cannot play its role, and the recognition rate can be raised only by learning from large numbers of samples and repeatedly adjusting parameters. For infrared spectra, the complexity of the data and the heavy overlap of spectral peaks make analysis difficult, so the difficulty of pattern recognition is especially evident, and traditional pattern recognition methods alone may fail to classify the spectra effectively. Fortunately, after L. A. Zadeh proposed the concept of fuzzy sets, fuzzy mathematics was introduced into pattern recognition (fuzzy pattern recognition). A recognition system designed with fuzzy techniques can simulate the thinking process of the human brain more broadly and thoroughly, which improves the intelligence of the computer and the usability and reliability of the system. Artificial neural networks (ANN) have been used with relative success in spectral analysis because they can approximate arbitrary nonlinear functions. However, ANN has critical drawbacks. It easily over-fits, and the model relies heavily on the training data; because the available sample data are in most cases extremely limited (the so-called small-sample problem), the prediction ability of ANN models is weakened. In addition, because spectral data are usually high-dimensional, features must first be extracted from the raw spectra with a dimensionality reduction technique to reduce the amount of computation.
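The dimensionality reduction step mentioned above can be illustrated with principal component analysis (PCA), one of the chemometric methods named in this abstract. The abstract does not state which reduction technique the thesis actually uses, so PCA is an assumption here. This minimal pure-Python sketch finds the leading components by power iteration with deflation and returns the per-sample scores that would replace the raw absorbances as model inputs; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def center(X: Sequence[Sequence[float]]) -> Tuple[Matrix, List[float]]:
    """Subtract the column (wavelength) means from every spectrum."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means


def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance matrix (p x p) of centred data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def pca(X, k: int, iters: int = 1000, tol: float = 1e-12, seed: int = 0):
    """Top-k principal components via power iteration with Hotelling deflation.

    Returns (scores, components, explained_variance_ratios)."""
    Xc, _ = center(X)
    C = covariance(Xc)
    p = len(C)
    total = sum(C[i][i] for i in range(p))
    rng = random.Random(seed)
    comps, ratios = [], []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:  # no variance left to explain
                break
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        ratios.append(lam / total if total else 0.0)
        # Deflate: remove the variance explained by this component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in comps] for r in Xc]
    return scores, comps, ratios


# Hypothetical toy data: two perfectly correlated "wavelengths".
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, comps, ratios = pca(X, 1)
print(ratios)
```

Power iteration keeps the sketch dependency-free; for spectra with hundreds of wavelengths, a library SVD routine would be the practical choice.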
Without such reduction, the training time of an ANN model increases greatly, convergence becomes very slow, and the network may fail to converge at all. More recently, the support vector machine (SVM), a newer pattern recognition method with a sound theoretical foundation in statistical learning theory, has been widely applied to pattern recognition, time-series analysis, function approximation and related fields. Unlike traditional statistical theory, which relies on the number of samples tending to infinity, SVM is designed for small samples: its optimal solution is based on the limited sample information actually available. Moreover, SVM models can avoid over-fitting and offer superior generalization ability and prediction accuracy.

The research focus of this paper is as follows. For several natural products, infrared spectral analysis was combined organically with several pattern recognition techniques, namely partial least squares, fuzzy pattern recognition, artificial neural networks, support vector machines and grey correlation analysis, to realize qualitative and quantitative analysis, with the aim of finding a more effective modeling method for infrared spectra and providing new ideas and techniques for the infrared spectral analysis of natural products.

Taking the natural products ginseng, Epimedium brevicornum and tobacco as the objects of study, fuzzy pattern recognition was proposed, for the first time, for the qualitative analysis of infrared spectra. The key problems arising in this analysis, namely dimensionality reduction of the spectral variables, the closeness degree, the principle of choosing the nearest and the analysis steps, were also solved.
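The closeness degree and the principle of choosing the nearest can be sketched as follows. The abstract does not specify which closeness formula the thesis adopts, so this minimal pure-Python version assumes the Hamming closeness N(A, B) = 1 - (1/n) Σ|a_i - b_i|, min-max scaling of each spectrum into membership grades, and one averaged template per origin class. The class labels and data are hypothetical, and the inputs are assumed to be already dimension-reduced.

```python
from typing import Dict, List, Sequence


def to_membership(spectrum: Sequence[float]) -> List[float]:
    """Min-max scale absorbances into [0, 1] so each value can act as a membership grade."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def hamming_closeness(a: Sequence[float], b: Sequence[float]) -> float:
    """Hamming closeness degree N(A, B) = 1 - (1/n) * sum_i |a_i - b_i|, in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("fuzzy sets must be defined on the same variables")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the membership vectors of each origin class into one standard fuzzy template."""
    templates = {}
    for label, spectra in training.items():
        members = [to_membership(s) for s in spectra]
        templates[label] = [sum(col) / len(members) for col in zip(*members)]
    return templates


def classify_nearest(sample: Sequence[float], templates: Dict[str, List[float]]) -> str:
    """Principle of choosing the nearest: return the class whose template is closest."""
    m = to_membership(sample)
    return max(templates, key=lambda label: hamming_closeness(m, templates[label]))


# Hypothetical toy data: four reduced spectral variables per sample.
training = {
    "origin_A": [[0.1, 0.5, 1.2, 0.3], [0.2, 0.6, 1.1, 0.3]],
    "origin_B": [[1.0, 0.4, 0.2, 0.9], [0.9, 0.5, 0.1, 1.0]],
}
templates = build_templates(training)
print(classify_nearest([0.15, 0.55, 1.15, 0.3], templates))  # prints "origin_A"
```

Swapping in another closeness measure (for example a lattice or Euclidean closeness) only requires replacing hamming_closeness; the nearest-template decision rule stays the same.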
For these fuzzy pattern recognition models, the simulation results showed that the geographical-origin discrimination models could essentially correctly discriminate 42 Epimedium brevicornum samples, 40 ginseng samples and 120 tobacco samples, which is satisfactory. Moreover, the approach avoids the separation and extraction of natural products required by traditional spectroscopic analysis, and thus offers an effective and reliable basis for the quality control and modernized management of natural products.

For the natural products tobacco and processed Coptis, partial least squares was studied for single-component and multi-component quantitative analysis with near-infrared spectroscopy, and the spectral preprocessing scheme and the evaluation indices for generalization ability were determined. The simulation experiments indicated that partial least squares applied to the spectral analysis of natural products could meet practical needs to a certain extent, but its optimization time was excessively long, it was poorly suited to small samples and its generalization ability was relatively weak, which reduced its practical value to some extent.

Taking ginseng, Epimedium brevicornum, tobacco and processed Coptis as examples, artificial neural networks were applied to geographical-origin discrimination with mid-infrared spectra and to qualitative and quantitative analysis with near-infrared spectra; the key parameters were set, the related problems were solved and the models were evaluated effectively. The simulation results indicated that the origin discrimination models reached discrimination accuracies above 92% for both near-infrared and mid-infrared spectra. The quantitative models also predicted accurately, and every evaluation index of the models was satisfactory.
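For the PLS quantitative analysis of tobacco and Coptis described in this paragraph, the abstract names the method and mentions generalization indices but gives no algorithmic detail. The following is a minimal pure-Python sketch of single-response PLS (PLS1) with the standard NIPALS deflation, plus a root-mean-square error of prediction (RMSEP) as one common generalization index. Fitting one PLS1 model per component is an assumption for the multi-component case, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X: Sequence[Sequence[float]], y: Sequence[float], n_components: int) -> Dict:
    """Single-response PLS (PLS1) fitted with the NIPALS deflation scheme."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X residual
    f = [v - y_mean for v in y]                                   # centred y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # w proportional to E^T f
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left in y that X can explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                  # latent scores
        tt = _dot(t, t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = _dot(f, t) / tt                              # y loading
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model: Dict, x: Sequence[float]) -> float:
    """Predict one sample by replaying the same deflation on the new spectrum."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
    return y_hat


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error of prediction on an independent test set."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Hypothetical data: content depends linearly on two spectral variables.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [2 * a + 0.5 * b + 1 for a, b in X]
model = fit_pls1(X, y, n_components=2)
print(predict_pls1(model, [6.0, 2.0]))
```

In practice the number of latent components is chosen by cross-validation, and RMSEP is computed on samples held out of calibration rather than on the training set.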
At the same time, it was found that when artificial neural networks were applied to infrared spectral analysis, their generalization ability was indeed limited and they easily fell into local optima. Moreover, when relatively few samples were used for modeling, the predictive ability of the models weakened noticeably.

Combining the wavelet transform with support vector machines was proposed, for the first time, for the qualitative and quantitative analysis of mid-infrared and near-infrared spectra. The kernel parameters and the method of choosing the kernel function were discussed and analyzed, and simulations were carried out for single-component and multi-component quantitative analysis of the near-infrared spectra of tobacco and processed Coptis samples, and for geographical-origin discrimination with the infrared spectra of ginseng and Epimedium brevicornum samples. Finally, the models built with the different pattern recognition methods were compared in detail. The comparison showed that the SVM-based qualitative and quantitative models, for both near-infrared and mid-infrared spectra, had several advantages: good reliability and robustness, the best discrimination accuracy, the highest prediction precision, the strongest generalization, the shortest modeling time, the fewest manually controlled factors, the best suitability for small samples, and a low tendency to fall into local optima. The support vector machine therefore has high practical value and broad application prospects in infrared spectral analysis.

Grey correlation analysis was used, for the first time, to optimize the selection of spectral regions in near-infrared spectroscopy.
The grey correlation degree between the peak area of each spectral region and the content of a specific component was calculated, and the regions with the largest correlation degrees were taken as the optimal spectral regions for modeling. The simulation results showed that the modeling time was greatly reduced and the prediction precision was significantly improved. This research therefore has high application value.
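The region-selection step above can be sketched with Deng's grey relational grade. The abstract does not give the normalization, the distinguishing coefficient or the number of regions retained, so this minimal pure-Python version assumes min-max normalization, the common choice ρ = 0.5 and a fixed top_k; it also assumes trapezoidal integration for the peak areas. Region names and data are hypothetical.

```python
from typing import Dict, List, Sequence


def peak_area(x: Sequence[float], y: Sequence[float], lo: float, hi: float) -> float:
    """Trapezoidal area under the spectrum for lo <= x <= hi (x ascending or descending)."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi <= hi]
    return sum(abs(x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def _minmax(seq: Sequence[float]) -> List[float]:
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference: Sequence[float],
                           candidates: Dict[str, Sequence[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each candidate sequence against the reference.

    reference:  component content of each calibration sample
    candidates: region name -> peak area of that region in each sample
    """
    ref = _minmax(reference)
    deltas = {name: [abs(r - c) for r, c in zip(ref, _minmax(seq))]
              for name, seq in candidates.items()}
    all_d = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_d), max(all_d)
    if d_max == 0:  # every candidate matches the reference exactly
        return {name: 1.0 for name in candidates}
    grades = {}
    for name, ds in deltas.items():
        # Grey relational coefficient per sample, then averaged into a grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


def select_regions(reference, candidates, top_k: int, rho: float = 0.5) -> List[str]:
    """Keep the top_k regions with the largest grade as characteristic modeling regions."""
    grades = grey_relational_grades(reference, candidates, rho)
    return sorted(grades, key=grades.get, reverse=True)[:top_k]


# Hypothetical data: component content and three regions' peak areas in four samples.
content = [1.0, 2.0, 3.0, 4.0]
areas = {"r1": [10.0, 20.0, 30.0, 40.0], "r2": [5.0, 1.0, 4.0, 2.0], "r3": [3.0, 3.5, 2.0, 4.0]}
print(select_regions(content, areas, top_k=2))  # prints ['r1', 'r3']
```

Only the selected regions would then be passed to the calibration model, which is where the reported saving in modeling time comes from.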

  • 【Online publication contributor】 Jilin University
  • 【Online publication year and issue】 2009, Issue 08