Study on Several Key Techniques of Data Processing in Chromatography-Mass Spectrometry

【Author】 Jiang Xuehui (蒋学慧)

【Advisor】 Wang Yan (汪曣)

【Author information】 Tianjin University, Biomedical Engineering, 2013, PhD

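【Abstract (translated from the Chinese)】 Chromatography-mass spectrometry instruments are the product of a tight coupling between chromatographic separation and mass spectrometric analysis, and are widely used in environmental monitoring, food safety, the life sciences, and other fields. As high-end analytical instruments, their development is inseparable from computer technology, and the quality of their data processing directly affects analytical results. The two main forms are gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). This thesis studies data analysis for both. For GC-MS it focuses on deconvolution algorithms. For LC-MS it takes high-performance liquid chromatography-mass spectrometry (HPLC-MS), currently the most common form in life science analysis, as the object of study and, with protein analysis as the focus, investigates the construction of a retention time prediction model for protein peptides and the application of retention time coefficients in protein identification. The main work and contributions fall into four parts:
1. Back-folded deconvolution algorithm for GC-MS data. Building on an analysis of existing differential algorithms for GC-MS data, a back-folded algorithm for chromatographic peak separation is proposed. The selection principles for the key elements of the algorithm are discussed, the sensitivity of the processed GC-MS signal to high-frequency noise is analyzed, and the separation performance is verified experimentally. With the deconvolution process expressed in terms of the matrix form of GC-MS data, and building on the back-folded peak separation algorithm, a back-folded deconvolution algorithm for GC-MS data is then proposed. It purifies the chromatograms in GC-MS data through matrix operations. Experiments show that it effectively extracts the pure mass spectrum of each component in a mixture and determines each component's chromatographic retention time.
2. GC-MS deconvolution based on K-medoids clustering. When the retention times of two or more analytes in a mixture differ by less than one scan, traditional GC-MS deconvolution algorithms fail. For this case the thesis introduces K-medoids clustering to process GC-MS data. The K-medoids deconvolution algorithm has three stages: peak detection, cluster analysis, and chromatographic peak shape correction. Peak detection extracts the peaks of each mass chromatogram. Cluster analysis groups the extracted mass chromatogram peaks to determine the fragment-ion composition of each compound in the mixture, with the Silhouette index used to evaluate the clustering result. Peak shape correction corrects the chromatographic peak shape of each extracted compound in order to obtain its retention time. Real experimental data were analyzed with both the K-medoids algorithm and AMDIS. The results show that K-medoids clustering can effectively separate overlapping chromatographic peaks that AMDIS cannot resolve and can extract the pure mass spectra of the corresponding compounds.
3. LC-MS is an important form of chromatography-mass spectrometry, and proteomics is the main application area of HPLC-MS. In protein identification, peptide retention time provides a dimension of information beyond the mass-to-charge ratio and improves identification accuracy, so a retention time prediction model for protein peptides is needed. The model was built in three stages: (1) A preliminary model of peptide retention time on a C18 column with TFA as the ion-pairing reagent was built from a small data set. (2) A large data set was then used to analyze the effects of peptide length, amino acid position, nearest-neighbor effects, and amino acid clustering on peptide retention time, further optimizing the model. (3) Alkylation of cysteine (Cys) is an indispensable step in protein analysis. The model's retention time coefficients for cysteine were corrected for modification by five common alkylating agents (iodoacetamide, iodoacetic acid, 4-vinylpyridine, acrylamide, and methyl methanethiosulfonate) as well as for unalkylated cysteine.
4. By analyzing the retention time coefficients of the individual amino acids in the prediction model, a method is proposed for separating peptides of different charge states by two-dimensional HPLC using combinations of different acidic ion-pairing reagents. Experiments show that such combinations achieve group-wise elution of peptides by charge state. Based on this separation, a method for enriching protein terminal peptides by two-dimensional HPLC is proposed. Using human Jurkat cells and termite Clostridium cells as samples, experiments verify that the method effectively enriches the N-terminal and C-terminal peptides of proteins, offering a new route to fast and accurate protein identification.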
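The cluster-analysis stage in item 2 can be illustrated with a small sketch. It is not the thesis's implementation: the synthetic peaks, the m/z values, the unit-norm Euclidean distance, and the brute-force medoid search (feasible only for a handful of peaks) are all assumptions made for illustration. It groups mass chromatogram peaks from two hypothetical compounds whose apexes lie 0.6 scan apart, and uses the mean Silhouette value to choose the number of clusters.

```python
import math
from itertools import combinations

def gaussian_peak(apex, scale, phase, scans=range(11), width=1.0):
    """Synthetic mass-chromatogram peak with a small deterministic ripple."""
    return [
        scale * math.exp(-((s - apex) ** 2) / (2 * width ** 2))
        * (1 + 0.02 * math.sin(s + phase))
        for s in scans
    ]

def normalize(profile):
    """Scale a profile to unit Euclidean norm so only its shape matters."""
    norm = math.sqrt(sum(v * v for v in profile))
    return [v / norm for v in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(dist, k):
    """Brute-force k-medoids: try every medoid set, keep the cheapest."""
    n = len(dist)
    best = min(
        combinations(range(n), k),
        key=lambda meds: sum(min(dist[i][m] for m in meds) for i in range(n)),
    )
    labels = [min(range(k), key=lambda c: dist[i][best[c]]) for i in range(n)]
    return best, labels

def silhouette(dist, labels):
    """Mean Silhouette value; singleton clusters contribute 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            continue
        a = sum(dist[i][j] for j in same) / len(same)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n

# Hypothetical data: three fragment ions per compound, apexes at scans 5.0 and 5.6.
ions = [57, 71, 85, 91, 105, 77]
peaks = [
    gaussian_peak(5.0, 100, 0), gaussian_peak(5.0, 40, 1), gaussian_peak(5.0, 15, 2),
    gaussian_peak(5.6, 80, 3), gaussian_peak(5.6, 30, 4), gaussian_peak(5.6, 10, 5),
]
profiles = [normalize(p) for p in peaks]
dist = [[euclidean(p, q) for q in profiles] for p in profiles]

# Score several candidate cluster counts and keep the best one.
scores = {}
for k in (2, 3, 4):
    _, labels_k = k_medoids(dist, k)
    scores[k] = silhouette(dist, labels_k)
best_k = max(scores, key=scores.get)
medoids, labels = k_medoids(dist, best_k)

for mz, label in zip(ions, labels):
    print(f"m/z {mz}: cluster {label}")
```

Ions that share a cluster label would be treated as fragments of the same compound. On real data a swap-based K-medoids search would replace the exhaustive one, which grows combinatorially with the number of peaks.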

【Abstract】 Chromatography-mass spectrometry hyphenated instruments, a closely integrated product of chromatography and mass spectrometry, are widely used in environmental monitoring, food safety, life science, and many other fields. Their development is closely tied to computer technology, and the data processing technology directly affects the analysis results. There are two main forms of chromatography-mass spectrometry: gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). For GC-MS, this thesis focused on the deconvolution algorithm. For LC-MS, it focused on protein analysis, studying the retention time prediction model for protein peptides and the application of retention time coefficients in protein identification. The main work of this thesis is as follows:
1. Back-folded deconvolution method for GC-MS data analysis. A novel back-folded method for chromatographic peak separation, based on the differential algorithm, was presented. Building on this peak separation algorithm and on the matrix form of the deconvolution process, a new deconvolution algorithm for GC-MS data was proposed that extracts pure spectra by matrix operations. The experimental results confirmed that this algorithm can effectively extract pure mass spectra and determine the retention time of each component in the mixture.
2. A novel deconvolution algorithm for GC-MS data based on K-medoids clustering analysis. When the retention times of two or more compounds differ by less than one scan, traditional deconvolution processes are unable to extract pure mass spectra. To address this, the K-medoids clustering algorithm was introduced into GC-MS data processing. The data analysis workflow consists of three sequential steps: peak detection, deconvolution, and chromatographic peak shape correction. Real experimental data were analyzed using the K-medoids algorithm and the AMDIS system. The results show that the K-medoids clustering algorithm can effectively separate overlapping chromatographic peaks that are beyond the ability of the AMDIS system.
3. LC-MS hyphenated instruments are an important form of chromatography-mass spectrometry technology, and proteomics is the main application field of HPLC-MS. In protein identification, the retention time of peptides offers another dimension of information beyond the mass-to-charge ratio and can improve identification accuracy, so a prediction model for the retention time of protein peptides is needed. The model was established in three stages: (1) A preliminary peptide retention time prediction model using TFA as the ion-pairing reagent was established from a small data set. (2) The influences of peptide length, amino acid position, the neighbor effect, and clusters of amino acids were then analyzed, and the prediction model was further optimized using a larger data set. (3) The alkylation of cysteine is a key element in protein analysis. The retention coefficient for cysteine was corrected for alkylation by iodoacetamide, iodoacetic acid, 4-vinylpyridine, acrylamide, and methyl methanethiosulfonate. Free cysteine was also considered.
4. Through analysis of the retention time coefficient of each amino acid in the three models, a method was proposed for separating peptides with different charges in two-dimensional HPLC using combinations of different acidic ion-pairing reagents. Experiments showed that this method can elute peptides with different charges in groups. Based on this separation, a novel method for enriching the N-terminal and C-terminal peptides of proteins was presented, which could provide a new way to identify proteins. Experiments using human Jurkat cells and termite Clostridium cells as samples show that the method can effectively enrich the N-terminal and C-terminal peptides of proteins.
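The role of per-residue retention coefficients in item 3 can be sketched with a simple additive predictor. This is a minimal sketch under stated assumptions, not the thesis's model: the additive form, the function name `predict_retention`, the intercept, and every coefficient value below are hypothetical placeholders, and the length, position, neighbor, and cluster corrections mentioned in the abstract are not modeled. The only feature taken from the abstract is that cysteine receives a separate coefficient depending on the alkylating agent.

```python
# Hypothetical retention coefficients in arbitrary units.
# These are placeholders for illustration, NOT values from the thesis.
RESIDUE_COEFF = {
    "A": 1.0, "E": 0.5, "F": 10.0, "G": 0.0, "K": -2.0, "L": 9.0, "S": -0.5,
}

# Hypothetical cysteine coefficients keyed by alkylating agent.
# None stands for free (unalkylated) cysteine.
CYS_COEFF = {
    None: 0.5,
    "iodoacetamide": 0.2,
    "4-vinylpyridine": -1.0,
}

def predict_retention(peptide, cys_alkylation=None, intercept=2.0):
    """Sum per-residue coefficients, swapping in the cysteine value
    that matches the alkylation state. The intercept is an assumption."""
    total = intercept
    for residue in peptide:
        if residue == "C":
            total += CYS_COEFF[cys_alkylation]
        else:
            total += RESIDUE_COEFF[residue]
    return total

print(predict_retention("ALK"))
print(predict_retention("ACL"))
print(predict_retention("ACL", cys_alkylation="iodoacetamide"))
```

In a form like this, correcting the model for a new alkylating agent amounts to fitting one extra coefficient for modified cysteine while leaving the other residues unchanged.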

  • 【Online publication contributor】 Tianjin University
  • 【Online publication year/issue】 2014, Issue 12