
化学模式识别和多维校正方法及其在复杂体系分析中的应用研究

Chemical Pattern Recognition and Multi-way Calibration Methodologies and Their Applications to Complex System Analysis

【Author】 Fu Haiyan (付海燕)

【Supervisor】 Wu Hailong (吴海龙)

【Author information】 Hunan University, Analytical Chemistry, 2010, Ph.D.

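【Abstract (translated from the Chinese)】 With the rapid development of modern science and technology, many new multi-channel, high-order analytical instruments have appeared and the systems under study have become increasingly complex. Analytical chemists no longer deal only with simple scalar or vector responses but with array-based two-, three- and even four-way chemical data sets containing thousands upon thousands of data points. These large arrays hold rich chemical information, but they also contain responses from interfering components, background responses and instrumental noise. The continuing development of chemometric theory and methods offers a range of effective ways to extract useful chemical information from such complex data. Within chemometrics, chemical pattern recognition and multi-way calibration are two very important research areas for resolving such data. This thesis follows the active and difficult topics in both areas and selects several important problems for methodological and applied study. The work covers the following aspects:

1 Chemical pattern recognition and quality control of traditional Chinese medicines by near-infrared spectroscopy (Chapters 2 to 3). The quality of raw Chinese medicinal materials depends on authenticity, that is, on whether they have been adulterated or counterfeited, and their quality characteristics and efficacy are also closely tied to geographical origin. The quality of a compound Chinese medicine depends in addition on the raw materials used and on each manufacturer's processing and production workflow. Discriminating the authenticity of medicinal materials, materials of different geographical origin, and compound preparations from different manufacturers is therefore of real practical significance for quality control and for effective market supervision. Near-infrared (NIR) diffuse reflectance spectra were collected for bezoar samples of different quality classes and for the compound preparation Liuwei Dihuang Pills from different manufacturers, and a moving window partial least squares discriminant analysis (MWPLSDA) pattern recognition method was proposed to characterize these spectra and extract information from them. Compared with conventional principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLSDA), the method removes the influence of variables that are irrelevant to classification or interfere with it, and handles the nonlinearity, multicollinearity and other complex correlations in NIR fingerprints more effectively. This helps extract, from fingerprint information with only slight class differences, the discriminant latent variables that reflect the authenticity and geographical origin of bezoar samples and the manufacturer of Liuwei Dihuang Pills, giving better recognition. The method is simple, fast and practical, and may be extended to discriminating the authenticity, origin and quality class of Chinese medicines on the market.

2 A new first-order calibration algorithm and multivariate spectral analysis (Chapter 4). Least squares support vector machines (LS-SVM) perform well and are increasingly used for first-order calibration modelling of two-way multivariate spectral data. Their performance, however, still depends heavily on the uniformity of the data distribution and the homogeneity of the model errors, a problem common to first-order calibration algorithms. Studying the representativeness of training samples and optimal sample weighting in first-order calibration therefore bears on the wider use of such models. Because the spectral space of the samples is high-dimensional and complex and sample selection is uncertain, it remains difficult to estimate accurately how well the training set represents the whole sample space. Conventional multivariate calibration models mostly select representative samples empirically, which in unfavourable cases may degrade prediction for new samples. To address this, and considering that representativeness is hard to estimate by examining single samples and that the calibration model itself requires its optimal parameters to be chosen, we combined sample weighting with LS-SVM and, using particle swarm optimization (PSO), proposed a new adaptively trained optimized sample-weighted LS-SVM algorithm (OSWLS-SVM) for multivariate spectral analysis. The algorithm applies non-negative weights to the original training samples and accounts for both model complexity and predictive ability during calibration. Its training target is to optimize simultaneously the fit of the original calibration samples and the prediction of an independent validation set; PSO achieves an optimized rescaling of sample representativeness while also optimizing the model hyper-parameters. Applied to real two-way multivariate spectral data sets, the algorithm did improve prediction when the original calibration samples were poorly representative. The stability and effectiveness of the PSO search were also examined, and the results show that the new algorithm is very stable and effective. The new algorithm offers a fully automatic modelling approach for the calibration of two-way data, and because its design idea is general it can also serve as a reference for improving other first-order calibration methods.

3 Second-order calibration and three-dimensional fluorescence for direct or indirect quantification of drugs in body fluids (Chapters 5 to 6). Rapid analysis of drugs in complex body fluids is an important problem in modern life science and medicine. Conventional methods rely on chromatographic separation, usually achieved by adjusting the column or the chromatographic conditions. Because body fluids contain unpredictable matrix components and other interferents, complete separation is sometimes hard to achieve, and investigating extraction and separation conditions consumes considerable time and materials. Fluorescence detection offers high sensitivity, high selectivity and low cost, and chemometric second-order calibration has the second-order advantage over first-order calibration: the analyte of interest can be quantified even when unknown interferents coexist. We therefore combined second-order calibration with three-dimensional excitation-emission fluorescence spectra, resolving the three-way fluorescence data with the parallel factor analysis (PARAFAC) and alternating normalization-weighted error (ANWE) algorithms. Without modelling interferents or background, mathematical separation replaced chemical separation, and simple regression against the known concentrations of standards then gave direct, selective quantification of the anticancer drug irinotecan in plasma and in human urine. The method is fast, simple and inexpensive, with satisfactory quantitative results, and has the potential to become a rapid, sensitive method for clinical monitoring of irinotecan in human body fluids. To extend these advantages to weakly fluorescent and non-fluorescent drugs in body fluids, we combined a micelle-enhanced three-dimensional fluorescence strategy with second-order calibration based on PARAFAC and ANWE. This increased the selectivity of the fluorescence method and, with only simple sample pretreatment, achieved accurate spectral resolution and concentration prediction of metoclopramide in the presence of human plasma and the homologous drug domperidone as interferents. A new method was also proposed for quantifying the non-fluorescent drug meloxicam in human urine in the presence of unknown interferents. It relies on the fact that non-fluorescent meloxicam yields a strongly fluorescent compound after oxidative derivatization and hydrolysis. PARAFAC and ANWE were again used to separate the derivatized three-dimensional fluorescence mathematically, and indirect calibration of the target analyte meloxicam gave satisfactory quantitative results.

4 New third-order calibration algorithms and exploration of their advantages (Chapters 7 to 8). When four-way data following a quadrilinear component model are used for quantitative prediction, the approach can be called third-order calibration, and its advantages the third-order advantage. In theory these include the second-order advantage and should include more. Given the shortcomings of second-order calibration for some complex systems, and to prepare for the analysis of more complex data or of high-order data from hyphenated instruments, studying third-order calibration is important. To date, however, only a few papers have reported the analysis of four-way data. After analysing the few four-way algorithms in the literature, the author summarized a general strategy for developing third-order calibration algorithms and, for the first time, developed two new third-order calibration algorithms based on not-fully-stretched matrix forms of the quadrilinear model. Four-way PARAFAC, which is based on fully stretched matrix forms, is sensitive to the chemical rank required by the model, converges slowly and is prone to degenerate solutions, while the alternating penalty quadrilinear decomposition (APQLD) algorithm, developed from slice matrix forms, tolerates noise poorly. In Chapter 7 a new alternating weighted residue constraint quadrilinear decomposition (AWRCQLD) algorithm based on not-fully-stretched matrix forms was developed. It constructs four distinct weighted residual functions as constraint terms on the loss function of the quadrilinear model and solves for the four underlying matrices of the four-way array by alternately minimizing four new weighted-residual objective functions. Simulated three-way and four-way arrays were tested, and real experiments were designed in which three-way and four-way arrays carrying the same total amount of information were resolved and compared. Results from several of the most widely used second-order calibration methods were compared with those from third-order calibration based on AWRCQLD, four-way PARAFAC and APQLD, to explore the third-order advantage over second-order calibration and the merits of the new algorithm relative to the two existing four-way algorithms. The study shows that a four-way array is not a simple collection of three-way arrays; its modes have their own intrinsic relationships. Third-order calibration extracts hidden information more fully and improves resolution. It can resolve highly collinear, ill-conditioned simulated data that second-order calibration cannot resolve correctly. It can also handle a case where second-order calibration fails to give reasonable resolution and correct calibration because the fluorescence spectra of certain endogenous interferents in plasma overlap severely with, and closely resemble, that of the target analyte oxaprozin. AWRCQLD also overcomes the shortcomings of the two existing four-way algorithms: compared with four-way PARAFAC it converges quickly and is insensitive to the number of components, and compared with APQLD it tolerates noise well, with satisfactory robustness. Its notable feature is that, when noise is very high or collinearity very strong and the number of components is overestimated, the resolved profiles of the analyte of interest remain very stable and convergence remains fast. The algorithm has strong potential to become a generally applicable method for four-way data analysis in the study of complex chemical systems and processes.

The discovery of the second-order advantage greatly advanced chemometrics, and we may venture to predict that the third-order advantage will likewise show its strength in more practical applications. To provide more options for four-way data analysis, Chapter 8 again starts from the not-fully-stretched form of the quadrilinear model and develops a new self-restrained alternating quadrilinear decomposition (SRAQLD) algorithm. The algorithm designs four closely related objective functions, in each of which the two parts constrain and balance one another, and solves for the four underlying matrices by alternately minimizing the four self-constrained residuals. In the experiments, a fluorescence matrix effect between chlorpromazine and plasma caused prediction bias of varying degree in both second-order calibration and the second-order standard addition method when quantifying chlorpromazine in plasma. To improve the results, the variation of the plasma background level was added as an extra mode, giving a four-way fluorescence array of excitation, emission, background volume and sample. Four third-order calibration methods, namely four-way PARAFAC, APQLD and the newly proposed AWRCQLD and SRAQLD, were applied to this system. The results show that third-order calibration can extract the hidden trend in the behaviour of chlorpromazine at different plasma background levels, so that this hidden information is not wrongly attributed to the resolved spectral profiles and relative concentrations. The matrix effect can thus be overcome to some extent, and fairly satisfactory results were obtained. SRAQLD performed slightly better in this application, AWRCQLD and four-way PARAFAC predicted comparably, and APQLD was slightly weaker. The two new third-order calibration algorithms provide tools for four-way arrays from more real systems and high-order instruments, help deepen the understanding of third-order calibration, and offer a theoretical basis and reference for later theoretical work.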
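The joint search over non-negative sample weights described in part 2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the thesis implementation: a weighted straight-line fit stands in for LS-SVM, the toy data are invented, the objective is taken to be calibration RMSE plus validation RMSE, and the PSO settings (`inertia`, `c_cognitive`, `c_social`) are common textbook values. In an LS-SVM setting the hyper-parameters would simply be appended to each particle's position vector.

```python
import math
import random

# Toy calibration set: y = 2x + 1, except the last (unrepresentative) sample.
X_CAL = [0.0, 1.0, 2.0, 3.0, 4.0]
Y_CAL = [1.0, 3.0, 5.0, 7.0, 20.0]
# Independent validation set drawn from the true relationship.
X_VAL = [0.5, 1.5, 2.5]
Y_VAL = [2.0, 4.0, 6.0]


def weighted_line_fit(weights):
    """Closed-form weighted least-squares line: (slope, intercept) or None."""
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, X_CAL))
    swy = sum(w * y for w, y in zip(weights, Y_CAL))
    swxx = sum(w * x * x for w, x in zip(weights, X_CAL))
    swxy = sum(w * x * y for w, x, y in zip(weights, X_CAL, Y_CAL))
    denom = sw * swxx - swx * swx
    if denom <= 1e-12:  # fewer than two effectively weighted samples
        return None
    slope = (sw * swxy - swx * swy) / denom
    return slope, (swy - slope * swx) / sw


def rmse(slope, intercept, xs, ys):
    """Root mean squared error of the line on the given samples."""
    sq = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq / len(xs))


def objective(weights):
    """Assumed training target: calibration RMSE plus validation RMSE."""
    fit = weighted_line_fit(weights)
    if fit is None:
        return float("inf")
    return rmse(*fit, X_CAL, Y_CAL) + rmse(*fit, X_VAL, Y_VAL)


def pso_minimize(func, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimise `func` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random start positions inside the box, zero start velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [func(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamping to the box keeps every sample weight non-negative.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = func(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


baseline = objective([1.0] * len(X_CAL))
best_weights, best_val = pso_minimize(objective, [(0.0, 1.0)] * len(X_CAL))
print("uniform weights objective:", round(baseline, 3))
print("PSO weights:", [round(w, 2) for w in best_weights])
print("PSO objective:", round(best_val, 3))
```

Because the assumed objective still counts the unrepresentative sample in the calibration error, the search is expected to down-weight that sample rather than discard it; a different objective would shift that balance.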

【Abstract】 With the rapid development of modern science and technology, a large number of new multi-channel and high-order analytical instruments are emerging, and the systems to which they are applied are becoming more and more complex. Analytical chemists now face two-, three- and even four-dimensional data arrays rather than simple scalar or vector responses. The development of chemometric theory provides a variety of effective methods for extracting useful chemical information from these complex data arrays; among them, chemical pattern recognition and multi-way calibration are two important research areas. This thesis focuses on these methodologies and their applications to two-way, three-way and four-way data analysis of complex chemical systems. The work presented primarily deals with the following aspects:

1 Chemical pattern recognition methods and near-infrared spectroscopy for quality analysis and discrimination of Chinese medicines (Chapter 2 to Chapter 3): The quality of traditional Chinese herbs is affected by adulteration and is also closely related to geographical origin, while the quality of a processed traditional Chinese medicine preparation is closely related not only to the raw herbs but also to the production process used by each manufacturer. Therefore, discriminating the genuineness, geographical origin and manufacturer of Chinese medicines is a significant aspect of the quality control of traditional Chinese medicine, where chemical pattern recognition methods are frequently used to extract relevant information from near-infrared (NIR) reflectance spectral data and to provide an alternative criterion for identification and quality control.
Moving window partial least squares discriminant analysis (MWPLSDA) is proposed to treat the NIR spectral fingerprints of different kinds of bezoar samples and of Liuwei Dihuang Pills from different manufacturers. The results demonstrate that MWPLSDA is superior to conventional linear pattern recognition methods including PCA, LDA and PLSDA. It removes non-class-related wavelength regions and other uninformative, non-composition-related factors, and is thereby more effective in treating the NIR spectral fingerprint information of these samples. MWPLSDA is a feasible and promising method for quality control and discrimination of traditional Chinese medicine.

2 First-order calibration: a new algorithm and multivariate spectral analysis (Chapter 4): The least squares support vector machine (LS-SVM) has been introduced into first-order calibration by some investigators for its attractive features and promising empirical performance. However, the performance of LS-SVM depends strongly on the uniformity of the training data and the homogeneity of the model errors. To ensure that the developed model applies to unknown samples, the representativeness of the training samples and the concept of sample weighting for multivariate spectral analysis are considered. Because of the high dimensionality and complexity of the spectral data space and the uncertainty involved in the sampling process, it is difficult to evaluate how well the training samples represent the whole sample space, and the selection of representative training samples for multivariate calibration depends largely on empirical methods. If the training samples fail to represent the sample space, the predictions for new samples can be degraded. To solve this problem, an adaptive algorithm called optimized sample-weighted LS-SVM (OSWLS-SVM), which incorporates sample weighting into LS-SVM, is proposed.
Particle swarm optimization (PSO) is used to search for multiple parameters of the OSWLS-SVM model, including the best non-negative sample-weighting vector, which acts as an optimized rescaling of the samples in a certain sense, and the LS-SVM hyper-parameters, so as to optimize simultaneously the calibration of the original training set and the prediction of an independent validation set. Three real data sets are investigated, and the results demonstrate that OSWLS-SVM can improve the predictive ability of a model when the original calibration samples are poorly representative. The stability and efficiency of OSWLS-SVM are also examined, and the results reveal that the proposed method obtains desirable results within a moderate number of PSO cycles. The overall conclusion is that OSWLS-SVM is a promising multivariate calibration method for practical applications, especially when the data are affected by factors such as non-uniformly distributed samples and heteroscedastic noise. The idea behind the algorithm is general, so it can also be used to improve other first-order calibration algorithms.

3 Second-order calibration and three-dimensional fluorescence for direct or indirect quantification of drugs in body fluids (Chapter 5 to Chapter 6): Drug analysis in body fluids is an important task in the biomedical field. Chromatographic separation techniques are usually used in drug analysis. However, HPLC inherently suffers from tedious pretreatment and optimization of the separation conditions, and possibly from lower recoveries owing to greater loss of sample during more intensive extraction and clean-up. An alternative strategy for simple, rapid and sensitive quantification of drugs in biological fluids is excitation-emission matrix (EEM) fluorescence with the aid of second-order calibration methodologies based on PARAFAC and ANWE.
EEM data of irinotecan (CPT-11) in biological matrices show a trilinear structure, which was mathematically decomposed to resolve seriously overlapped peaks into pure spectral profiles and concentration profiles, exploiting the second-order advantage of second-order calibration algorithms. The method offers the potential advantages of being rapid, green and low in cost for the highly sensitive quantification of CPT-11 in human body fluid samples. Moreover, second-order calibration methods combined with EEM fluorescence based on fluorescence enhancement or derivatization were developed to determine metoclopramide in plasma samples in the presence of a homologous interferent, and meloxicam in human urine samples. This is an attractive alternative strategy for indirectly determining other kinds of non-fluorescent medicines in more complex systems.

4 New third-order calibration algorithms (Chapter 7 to Chapter 8): The use of higher-order data for resolving complex analytical problems will increase in the near future because of the associated advantages. This calls for appropriate chemometric methods for calibrating with data structures of higher complexity. Four-way data arrays have been used to construct quantitative calibration models in only a few cases, because the theory of four-way calibration analysis is still immature. It is therefore very meaningful to study the related theories and applications further in order to explore the advantages of four-way calibration analysis.

A novel third-order calibration algorithm, alternating weighted residue constraint quadrilinear decomposition (AWRCQLD), is developed to analyze four-way data arrays. To our knowledge, this is the first time that not-fully-stretched matrix forms of the quadrilinear model have been employed to design a quadrilinear decomposition algorithm. The AWRCQLD algorithm is based on a new scheme that introduces four unique weighted residual functions as constraint terms on the loss function of the quadrilinear model.
Simulated and experimental data are arranged and analyzed both as three-way and as four-way data arrays to explore the additional third-order advantage, as distinct from the second-order advantage. Meanwhile, the performance of the proposed algorithm has been compared with that of four-way PARAFAC and APQLD. The results demonstrate that the third-order advantage lies in the fact that more inherent information can be obtained from the data, improving on the resolving and quantitative capability of trilinear second-order calibration, especially in seriously collinear systems. This means that a four-way data array is not a simple collection of three-way data arrays; the three-way arrays within it have unique internal relations. Moreover, compared with four-way PARAFAC, the new algorithm AWRCQLD has the advantages of fast convergence and of being insensitive to an excess number of components used in the model. Compared with APQLD, AWRCQLD tolerates high noise better. These features will facilitate the analysis of four-way data arrays.

The discovery of the second-order advantage greatly impelled the development of chemometrics. We predict that the third-order advantage will eventually emerge in more practical applications. This calls for more chemometric methods for calibrating with data structures of higher complexity. Based on the not-fully-stretched matrix form of the quadrilinear model, a new algorithm called self-restrained alternating quadrilinear decomposition (SRAQLD) is developed. Objective functions with a strong intrinsic relationship were constructed, in which the two loss functions in each objective function are optimized and constrained mutually. This provides a new approach to quantification even in the presence of a matrix effect caused by the background.
The excitation-emission matrix (EEM) fluorescence determination of chlorpromazine in human plasma is used to show the ineffectiveness of second-order calibration and the second-order standard addition method under these conditions, even though the trilinear model fits the experimental tensor. It is the alteration of the fluorescence quantum yield, obtained by changing the level of the plasma matrix so as to provide a four-way tensor, that allows the chlorpromazine concentration to be determined. The four third-order calibration methods based on four-way PARAFAC, APQLD, AWRCQLD and SRAQLD are tested with suitable results, which are slightly better for the proposed SRAQLD in this system. Moreover, the proposed AWRCQLD can be expected to show its advantage in other systems with a low signal-to-noise ratio. More practical studies should be carried out to further explore and understand third-order calibration. In addition, this work provides a theoretical basis and reference for developing new third-order calibration methods in the future.
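For reference, the trilinear and quadrilinear component models on which second- and third-order calibration rest are usually written as below. The notation is an assumption introduced here and does not come from the abstract: $x$ is a measured response, $a$, $b$, $c$ and $d$ are elements of the loading matrices of the individual modes (for example excitation, emission, plasma background level and sample), $N$ is the number of components, and $e$ is the residual.

```latex
% Trilinear model (three-way array, second-order calibration)
x_{ijk} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn} + e_{ijk}

% Quadrilinear model (four-way array, third-order calibration)
x_{ijkl} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn}\, d_{ln} + e_{ijkl}
```

In a calibration setting, the resolved relative concentrations in the sample mode are then regressed against the known concentrations of the standards to predict the analyte in unknown samples.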

  • 【Online publication submitted by】 Hunan University
  • 【Online publication year and issue】 2012, Issue 08