
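New Techniques and Methods for Glycoproteomics Based on Biological Mass Spectrometry

【Author】 Chen Yaohan (陈瑶函)

【Supervisor】 Yang Pengyuan (杨芃原)

【Author information】 Fudan University, Chemical Biology, 2011, PhD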

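【Summary】 This thesis spans analytical chemistry and chemical biology and is interdisciplinary in nature. By applying and extending glycosylation-related separation and enrichment techniques, biological mass spectrometry and bioinformatics, it aims to reveal the glycosylation of important samples, including glycosylation sites and glycan information, accurately and efficiently.

Given the importance of glycosylation as a protein post-translational modification, glycoproteomics has become an active area of post-proteomics research. In recent years, advances in glycosylation-related enrichment techniques and biological mass spectrometry have allowed glycosylation to be revealed and studied in depth in a growing number of significant biological samples. Even so, methodological development continues in many aspects of glycosylation research. Enrichment techniques are still being optimized and refined, and better routes are still being sought; mass spectrometry is likewise being mined for all the information it can convey about glycosylation. The growing output of glycosylation data, together with the difficulty of manual interpretation, has driven the development and application of bioinformatics tools for accurate, high-throughput data interpretation in support of deeper glycosylation research. Nevertheless, because glycosylation is so complex, the field still holds many opportunities and challenges: many important biological samples remain to be explored, and the analysis of intact glycopeptides is still in its infancy. Building on the strengths of biological mass spectrometry, and combining and improving glycosylation-related enrichment techniques, this work aims to provide important and distinctive data for glycoproteomics and to establish efficient, practical platforms for glycosylation analysis.

The main contributions of this thesis are:
(1) The N-glycosylation profile of normal human liver was revealed for the first time, yielding the largest dataset of normal human liver N-glycoproteins/sites to date, which is an important reference for follow-up studies and a data basis for bioinformatic interpretation. In total, 915 N-glycoproteins containing 1,786 glycosylation sites were identified. The dataset is a useful supplement to the normal human liver protein expression profile, adding 382 new proteins, and bioinformatic interpretation of the dataset yielded rich information.
(2) An internationally leading, high-throughput platform for interpreting intact glycopeptides (GRIP) was established. The software searches a compact, experiment-based glycopeptide library rather than a huge theoretical one and, with strict quality control, performs very well on real samples. More than a thousand N-glycopeptides have been resolved in serum samples so far, and the micro-heterogeneity of glycosylation has been fully revealed. The platform is at the forefront of high-throughput intact glycopeptide research and is a promising tool for serum glycosylation analysis.
(3) Sequential multi-enzyme digestion was combined with glycosylation enrichment in a novel way, giving new strategies that remedy shortcomings of the traditional glycosylation workflow. The first strategy addresses the low glycoprotein sequence coverage that results when, under highly specific enrichment, proteins and sites are identified from de-glycosylated peptides alone, and it also increases the reliability of glycosite identification. The second strategy departs from the traditional route for intact glycopeptide analysis, in which glycoprotein/site identification and glycopeptide interpretation run as separate workflows, and achieves protein identification and glycopeptide interpretation in a single workflow.
(4) Several O-glycosylation methods based on the β-elimination reaction were compared and applied to a real sample, serum, for the first time. A sizeable O-glycosylation dataset was obtained, and important features such as the low complementarity between methods provide a useful reference and guide for the application and development of the methodology.

The thesis is presented in four chapters.

Chapter 1, Introduction: describes the importance of glycoproteins and reviews the state of the art in fields related to glycoproteomics. As one of the most important and widespread protein post-translational modifications, glycosylation is involved in many physiological and pathological processes. Changes in glycoprotein abundance or in glycan structure can both lead to disease. Many known diagnostic markers of disease are glycoproteins, and glycoproteins also account for a high proportion of drugs certified to international standards. There are four main classes of glycoprotein (N-linked, O-linked, GPI-anchored and C-linked); N-linked and O-linked glycosylation are the most studied and are the types on which our work focuses. Glycosylation research remains difficult, but it has advanced considerably with the development of various technologies. For example, glycosylation enrichment techniques have reduced the masking and suppression of glycosylated targets by highly abundant non-glycosylated proteins/peptides; specific and non-specific proteases and glycosidases have made glycosite and glycan studies more convenient; and advances in biological mass spectrometry have made large-scale glycosylation studies increasingly feasible. Even so, because of the complexity of glycosylation, the development and optimization of these technologies remain a focus of the field. At the same time, the large output of glycosylation data, especially mass spectrometry data, has driven the development of bioinformatics tools. These tools, whether designed for free glycans or applicable to glycopeptides, solve some of the problems of manual analysis but still have major limitations; for example, they tolerate only a narrow range of sample purity and scale. Overall, research on protein glycosylation is still developing and is full of opportunities and challenges.

Chapter 2, High-throughput N-glycosite studies: two sections, (1) profiling of N-glycosylation sites in normal human liver proteins and (2) a new strategy for sequential dual-enzyme digestion with highly specific N-glycosylation enrichment and identification.
(1) As the largest internal organ of the human body and its main metabolic organ, the liver, its functions and its diseases have always attracted close attention. By 2010, the Chinese Human Liver Proteome Project (CNHLPP) had obtained expression information for more than six thousand proteins. However, because biological samples are highly complex and protein abundance spans a wide range, how to enlarge the scale of protein identification remains an open problem. Enrichment targeted at glycosylation reduces sample complexity; it not only yields information specific to glycoproteins but also helps detect the expressed proteome more completely. Our work combined different enrichment techniques (hydrazide chemistry and hydrophilic interaction) with different tandem MS fragmentation techniques (collision-induced dissociation and electron-transfer dissociation) and produced a large-scale dataset of normal human liver N-glycoproteins/sites: 915 N-glycoproteins containing 1,786 glycosylation sites. Of these, 382 proteins had not been identified in the existing normal human liver protein expression profile, and 120 of those are specifically expressed in liver tissue. Extensive bioinformatic analysis shows that the dataset captures important patterns and features of glycosylation in normal human liver and is an important data basis for further research.
(2) High-throughput glycosite identification, especially N-glycosite identification, has become a routine workflow with the help of enrichment techniques and endoglycosidases such as PNGase F. However, highly specific enrichment (for example hydrazide chemistry, to date the method that most consistently shows high specificity) leaves the identified glycoproteins with low sequence coverage, some covered by only a single peptide. In addition, relying solely on the mass change of Asn at the glycosylation site produced by PNGase F treatment (+0.98 Da) as evidence of glycosylation is not fully satisfactory: the mass change is small, and spontaneous hydrolysis (deamidation) of Asn can produce false positives. To address these problems, we compiled amino acid frequency statistics from the human protein database and, taking into account the different cleavage specificities of proteases, used the two proteases Lys-C and trypsin in sequence, introducing them in a novel way into hydrazide-chemistry-based glycopeptide enrichment and identification. After testing on standard proteins, the strategy was successfully applied to a real sample, whole-cell protein lysate. The results show that, without sacrificing the high specificity of the enrichment method, the sequence coverage of identified glycoproteins increased by 79.4% on average; for about one third of the proteins the coverage more than doubled, and in some cases the increase reached 350%.

Chapter 3, High-throughput N-glycopeptide studies: two sections, (1) a new one-pipeline strategy for glycoprotein identification and glycopeptide interpretation and (2) establishment of a high-throughput serum glycopeptide identification platform (GRIP).
(1) Glycopeptide interpretation usually depends on prior separation and enrichment. Traditionally, however, after peptide-level enrichment, glycoprotein identification and glycopeptide analysis must run as two workflows: one performs de-glycosylation so that proteins can be identified by mass spectrometry (because peptides in the glycosylated state cannot currently be identified by database searching), and the other keeps the glycans intact for structural analysis. We consider that both goals can be reached in a single workflow, with additional information gained. Based on amino acid frequency statistics from the human protein database and the different cleavage specificities of proteases, we interleaved two digestions (Lys-C and trypsin) with glycosylation enrichment at two levels (protein level and peptide level). As a result, even after peptide-level enrichment, the final mixture for MS analysis retains intact glycopeptides and also contains a reasonable number of non-glycopeptides that can be used for protein identification. After testing on standard proteins, the strategy was successfully applied to bands of a serum sample separated by one-dimensional gel electrophoresis: 23 glycoproteins were identified from 55 non-glycopeptides, and 25 intact glycopeptides were resolved, including 2 O-glycopeptides (all counts non-redundant). The strategy resolves what is currently a considerable number of glycopeptides from a complex sample, and important glycan structural features such as terminal sialylation, core fucosylation and bisecting N-acetylglucosamine are preserved and revealed; the micro-heterogeneity of glycosylation is also accurately revealed.
(2) High-throughput interpretation of intact glycopeptides is a pressing problem in glycosylation research. Because glycosylation is highly complex and of low abundance, and because glycopeptides have physicochemical properties unlike those of ordinary peptides, high-throughput MS analysis either produces highly complex data or cannot obtain enough information by a single method. Moreover, there is still no enrichment method for intact glycopeptides that efficiently removes the influence of non-glycopeptides. All of this makes glycopeptide interpretation difficult. Software aimed at glycopeptide interpretation has been developed, but most tools either tolerate little sample complexity or return so many candidate results that users cannot decide among them; a platform that users can apply conveniently and at scale is badly lacking. GRIP is short for Glycopeptide Revealing and Interpretation Platform, the glycopeptide interpretation platform we established. Its search is based on a glycopeptide library generated from experimentally derived peptide and glycan libraries from the target sample, rather than on a huge theoretical glycopeptide library. The library is therefore tailored to the specific sample and can change with it, and the false positives of theoretical-library searching are reduced. In addition, we make full use of the characteristic fragmentation of glycopeptides in mass spectrometry, using these features as an important means of spectrum filtering and false-positive control to pick out glycopeptide spectra accurately from a very large number of spectra and interpret them. The work has so far resolved more than a thousand non-redundant glycopeptides in serum samples, and the results show important information such as the micro-heterogeneity of protein glycosylation; the platform is at an internationally leading level and has good prospects for practical application.

Chapter 4, O-glycosylation studies based on the β-elimination reaction: several existing β-elimination methods were compared and applied to real samples, providing a useful reference and guide for the practical application and further development of these techniques. Enrichment methods and broad-specificity endoglycosidases have made N-glycosite identification routine. For O-glycosylation, however, there are many core structures and no broad-specificity glycosidase, so routine, large-scale O-glycosite identification remains difficult. Most studies with this goal use strategies based on β-elimination/addition: the elimination reaction removes the glycan and leaves an unsaturated bond on the Ser/Thr side chain, and the addition reaction attaches a mass or enrichment tag to the former glycosylation site to aid subsequent analysis and handling. Although such methods have been improved and used, their scope has usually been limited to simple samples such as standard peptides. We compared three treatments in parallel for the first time: the DTT addition method, the ammonia method and the methylamine method. We examined their effect on phosphorylation, which also occurs on Ser/Thr residues, and applied them to real serum samples, obtaining 157 O-glycosylation sites from 96 proteins, of which only 4 sites were identified by more than one method. Analysis of these sites also showed that the modification occurred more often on Ser than on Thr (61% vs 39%), and that glycosylation sites within the same protein often lie close together, consistent with the literature. In addition, we compared reductive β-elimination carried out at the protein level and at the peptide level in terms of O-glycan analysis. The results show that the former favors detection of low-molecular-weight glycans, whereas the latter has an advantage for high-molecular-weight glycans; combining the two gives more complete glycan analysis.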
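The sequential Lys-C/trypsin strategy of Chapter 2 can be illustrated with a minimal in silico sketch. Everything here is an assumption made for illustration and not the thesis workflow itself: the toy sequence is invented, Lys-C is modelled as cleaving after every K and trypsin after every K or R (the usual proline exception is ignored), and the presence of the common N-X-S/T motif (X not P) stands in for glycopeptide capture by enrichment.

```python
import re


def cleave(sequence, residues):
    """Cut a sequence after every residue listed in `residues`."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def has_sequon(peptide):
    """Crude stand-in for enrichment: does the peptide carry N-X-S/T (X != P)?"""
    return re.search(r"N[^P][ST]", peptide) is not None


def sequential_digest(protein):
    """Lys-C digestion, capture of sequon-bearing peptides, then trypsin.

    Returns (glyco_fragments, released_fragments): the second list holds the
    non-glycosylated pieces freed by trypsin from captured Lys-C peptides.
    """
    captured = [p for p in cleave(protein, "K") if has_sequon(p)]
    glyco, released = [], []
    for peptide in captured:
        for fragment in cleave(peptide, "KR"):
            (glyco if has_sequon(fragment) else released).append(fragment)
    return glyco, released


if __name__ == "__main__":
    toy_protein = "AGRLKNGSEVRTPLKDWNATRMK"  # hypothetical sequence
    glyco, released = sequential_digest(toy_protein)
    print("sequon-bearing fragments:", glyco)
    print("released non-glycopeptides:", released)
```

In this toy case each captured Lys-C peptide yields one extra non-glycosylated fragment after trypsin, which is the kind of additional peptide evidence that can raise sequence coverage for a glycoprotein.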

【Abstract】 This thesis presents interdisciplinary research spanning analytical chemistry and chemical biology. Glycosylation enrichment methods, mass spectrometry techniques and bioinformatics tools were combined with the goal of revealing glycosylation information in important biological samples accurately and efficiently.

Glycosylation is one of the most important and universal protein post-translational modifications (PTMs), and glycoproteomics is becoming a significant field in the post-proteomics era. The development of enrichment methods and biological mass spectrometry (MS) techniques has facilitated glycoproteomic research, and glycosylation information has been revealed from various samples. Nevertheless, there is still room for methodological development in glycoproteomics, e.g. the development and improvement of glycan-related enrichment methods and MS techniques. Meanwhile, with the increasing output of MS data, bioinformatics efforts have been directed at glycoproteomics to assist or even replace time-consuming manual interpretation.
However, because of the complexity of protein glycosylation, the field still holds many opportunities and challenges: many important samples await interpretation, and automated analysis of intact glycopeptides is at an early stage.

The major contributions of this work are as follows:
(1) N-glycoproteins of normal human liver have been profiled for the first time; the resulting large dataset is not only a valuable supplement to the normal human liver proteome but also a basis for meaningful bioinformatic analysis.
(2) An internationally advanced platform for intact glycopeptide revealing and interpretation (GRIP) has been established; it introduces experiment-based de-glycopeptide and glycan datasets to form a sample-friendly glycopeptide database, and has achieved high-throughput glycopeptide analysis for human serum.
(3) Two strategies that combine a sequential multi-enzymatic approach with glycan-related enrichment in a novel way have been developed, improving glycoprotein sequence coverage and supporting accurate glycosite/glycopeptide identification.
(4) The first comparison of different β-elimination-based methods for O-glycosylation study on a practical sample has been carried out; features of the results could serve as useful references for the application and development of the methodology.

This thesis consists of four parts, summarized as follows.

Part 1. Introduction: a brief but comprehensive introduction to protein glycosylation and glycoproteomics-related fields. As one of the most important and universal protein PTMs, glycosylation takes part in many physiological processes. Alterations in glycoprotein abundance or glycan structure may indicate pathological changes, and many FDA-approved drugs are glycoproteins. There are four major types of protein glycosylation; the two most studied are N-linked and O-linked glycosylation.
Although it remains difficult, glycoproteomics has been greatly facilitated by the development of different technologies, e.g. glycan-related enrichment methods and MS techniques. Meanwhile, bioinformatics for glycoproteomics is a related field that is developing rapidly but is still in its early days. Overall, glycosylation research is still full of opportunities and challenges.

Part 2. High-throughput N-glycosite Studies: two sections, (1) normal human liver N-glycoproteome profiling and (2) sequential multi-enzymatic assisted, highly specific glycoprotein/site identification.

(1) As the largest internal organ of the human body, the liver plays vital roles, especially in metabolism. The biological functions and diseases of the liver have always received close attention. By 2010, the Chinese Human Liver Proteome Project (CNHLPP) had identified over six thousand proteins from normal human liver. However, because of the high complexity and wide dynamic range of proteins in practical samples, revealing low-abundance proteins is still a difficult and ongoing task. PTM enrichment can act not only as a targeted approach but also as an efficient way to reduce sample complexity, and may therefore reveal supplementary information. Here, to study the normal human liver N-glycoproteome, we combined two enrichment methods (hydrazide chemistry and hydrophilic affinity) and two MS dissociation methods (collision-induced dissociation and electron-transfer dissociation), and obtained the largest human liver N-glycoproteome dataset to date, containing 915 N-glycoproteins and 1,786 N-glycosites. The dataset is not only a valuable supplement to the normal human liver proteome (382 newly identified proteins) but also an important basis for further research in the field.

(2) The outstanding specificity of hydrazide chemistry (usually above 90%) provides both the ability to reduce sample complexity and reliable N-glycosite identification.
However, because of this high specificity, glycoprotein identification usually relies on a limited number of de-glycopeptides. The identified glycoproteins therefore have low sequence coverage, and some are likely to be single-peptide identifications. A novel approach combining two-step protease digestion with glycopeptide capture has been developed. Through controllable release, separate identification and combined interpretation of non-glycopeptides (the newly introduced LT-peptides) and traditional de-glycopeptides (DG-peptides), the approach not only achieves routine N-glycosite identification but also provides further evidence for N-glycosites and increases glycoprotein sequence coverage. The approach has been successfully applied to cell lysate. Without sacrificing enrichment specificity, glycoprotein sequence coverage increased by 79.4% on average and by up to 350%, and DG-peptide-revealed N-glycosites were further confirmed by the related LT-peptides.

Part 3. High-throughput N-glycopeptide Studies: two sections, (1) a one-pipeline approach that achieves glycoprotein identification and obtains intact glycopeptide information and (2) the glycopeptide revealing and interpretation platform (GRIP).

(1) Analysis of intact glycopeptides largely depends on glycosylation enrichment; traditionally, after peptide-level enrichment, protein identification and glycopeptide interpretation follow two separate workflows: one carries out de-glycosylation, and the other keeps glycopeptides intact. A novel one-pipeline approach has been developed. Without de-glycosylation, this approach has been demonstrated to achieve glycoprotein identification and obtain intact glycosylation information after peptide-level enrichment. The proposed workflow has two enrichment steps and two proteolytic steps: enriched glycoproteins are digested by Lys-C, and the products are then enriched again and digested by trypsin.
In the resulting mixture, which has a reasonable complexity, intact glycopeptides are preserved and used for glycosylation analysis, while non-glycopeptides are used for protein identification. In both standard protein mixture tests and real sample analysis, the resulting glycopeptides and non-glycopeptides played their expected roles, so more confident protein glycosylation information was obtained.

(2) High-throughput analysis of intact glycopeptides is one of the most difficult tasks in glycoproteomics. Several factors, e.g. the high complexity of glycosylation and the relatively low abundance and special physicochemical properties of glycopeptides, make the study of intact glycopeptides difficult. Enrichment methods have not yet achieved stable, highly specific separation of intact glycoproteins/glycopeptides. To date, there is no widely used routine for glycopeptide analysis comparable to that for N-glycosites. Bioinformatics tools targeting glycopeptides have been developed, but existing tools usually cannot cope with complex samples, or return too many candidate results to choose from. In our glycopeptide revealing and interpretation platform (GRIP), we introduced experiment-based de-glycopeptide and glycan datasets to form a sample-friendly glycopeptide database, and used a novel algorithm designed for glycopeptide "sequence tag" searching and composition interpretation. GRIP has achieved high-throughput glycopeptide analysis for human serum: 3,091 spectra (1% false positive rate), corresponding to 1,020 different glycopeptides, were identified; micro-heterogeneity of glycosylation was also revealed in the results.
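The idea of a sample-specific glycopeptide database can be sketched as follows. This is only an illustration under stated assumptions, not GRIP's algorithm: the peptide names and masses are hypothetical, the glycan compositions are common examples chosen here, and candidates are matched by precursor mass alone, whereas the platform described above also relies on "sequence tag" searching and composition interpretation. The monosaccharide values are standard monoisotopic residue masses.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharides.
MONOSACCHARIDE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}


def glycan_mass(composition):
    """Mass added to a peptide by a glycan of the given composition."""
    return sum(MONOSACCHARIDE_MASS[name] * n for name, n in composition.items())


def build_glycopeptide_db(peptides, glycans):
    """Pair every observed peptide with every observed glycan.

    peptides: {sequence: neutral monoisotopic mass}
    glycans:  {glycan name: {monosaccharide: count}}
    Returns a list of (sequence, glycan name, glycopeptide mass).
    """
    return [
        (seq, name, pep_mass + glycan_mass(comp))
        for (seq, pep_mass), (name, comp) in product(peptides.items(), glycans.items())
    ]


def match_precursor(db, observed_mass, tol_ppm=10.0):
    """Return database entries within tol_ppm of an observed neutral mass."""
    return [
        (seq, name)
        for seq, name, mass in db
        if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm
    ]


if __name__ == "__main__":
    # Hypothetical inputs standing in for experiment-derived lists.
    peptides = {"PEPTIDE_A": 1000.5000, "PEPTIDE_B": 1500.7500}
    glycans = {
        "HexNAc2Hex5": {"HexNAc": 2, "Hex": 5},
        "HexNAc4Hex5NeuAc2": {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    }
    db = build_glycopeptide_db(peptides, glycans)
    observed = 1000.5000 + glycan_mass(glycans["HexNAc2Hex5"])
    print(len(db), "candidates;", match_precursor(db, observed))
```

Because the database is the product of two short experimental lists, its size grows with what was actually observed in the sample rather than with every theoretically possible peptide-glycan pair.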
Analysis of those glycopeptides by another MS fragmentation technique, HCD (higher-energy collisional dissociation), confirmed the accuracy of the aforementioned results.

Part 4. β-elimination-based O-glycosylation Studies: comparative studies of different β-elimination methods on a practical sample for O-glycosite and O-glycan identification.

Because there is neither a consensus sequence for O-glycosylation nor a universal O-glycosidase that functions as peptide-N-glycosidase F (PNGase F) does in de-N-glycosylation, recognition of O-glycosylation sites (O-glycosites) is still challenging. Recent investigations have worked on mild β-elimination/addition methods that remove O-glycans while preserving peptide backbones, but most of those studies focused on samples of low complexity (e.g. synthetic peptides, purified proteins) rather than complex samples. Here, for the first time, we applied three different β-elimination/addition approaches to explore the O-glycosite profile of human serum. Surprisingly, quite different results were obtained from the different approaches, although they are based on a similar mechanism. In total, 157 O-glycosites from 96 proteins were revealed, only four of which were identified by two or more approaches; modifications were detected more often on Ser than on Thr (61% vs. 39%); and glycosites identified in the same protein tended to lie close together. Meanwhile, reductive β-elimination of O-glycans from serum samples was also performed at both the protein and peptide levels. MS analysis showed that the former favored glycans of lower molecular weight, while the latter favored glycans in the higher mass range.
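The count of glycosites "identified by two or more approaches" is a simple set computation, sketched below. The method labels and the site lists are hypothetical toy data invented for illustration; they are not the 157 sites reported above.

```python
from collections import Counter


def method_counts(results):
    """Count, for each site, how many methods identified it.

    results: {method name: set of (protein, position) tuples}
    """
    counts = Counter()
    for sites in results.values():
        counts.update(sites)
    return counts


def shared_sites(results, min_methods=2):
    """Sites identified by at least `min_methods` different methods."""
    return {site for site, n in method_counts(results).items() if n >= min_methods}


if __name__ == "__main__":
    # Hypothetical identifications from three approaches.
    results = {
        "method_1": {("P1", 10), ("P2", 55)},
        "method_2": {("P1", 10), ("P3", 7)},
        "method_3": {("P4", 101)},
    }
    all_sites = set().union(*results.values())
    print(len(all_sites), "unique sites;", len(shared_sites(results)), "shared")
```

A small shared set relative to the union, as in this toy case, is what low overlap between approaches looks like in practice.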

  • 【Online publication contributor】 Fudan University
  • 【Online publication year/issue】 2011, Issue 12