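Establishment of a Platform for Organelle Protein Expression Profiling and Its Application to Constructing the Human Liver Nuclear Protein Expression Profile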

Establishment of the Platform for Organelle Protein Profiling with Data Mining and Application in Human Liver Nuclear Proteome Research

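【Author】 Hao Yunwei

【Advisor】 He Fuchu

【Author information】 Academy of Military Medical Sciences of the Chinese People's Liberation Army, Cell Biology, 2008, PhD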

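【Summary】 Organelles occupy an important place in cellular life. Although they have long been studied, their protein composition is still not fully known. Proteomic methods make it possible to mine protein information from organelles at scale. By identifying proteins and mining the resulting data, we hope to gain a deeper understanding of the biological information held in organelles. Subcellular proteomics has always been an important branch of proteomics and is a major component of the Human Liver Proteome Project. In this study we therefore first used the C57 mouse as a model for exploratory work on organelle isolation and data mining, with the aim of applying the approach to human liver organelle proteomics.

Internationally, studies of subcellular organelles tend to focus on a single organelle. Because organelles are naturally connected to one another, however, cross-contamination during isolation is unavoidable. In addition, the high sensitivity of proteomic methods means that these relatively trace contaminating proteins are also identified, so that on a purely qualitative basis proteins are assigned to the wrong organelle. We therefore designed a label-free quantitative mass spectrometry experiment to judge the subcellular localization of proteins. To make the results reliably consistent, we worked on two fronts. First, we designed an isolation scheme that yields four subcellular fractions (plasma membrane, mitochondria, nuclei, and cytoplasm) from a single liver homogenate. This can be viewed as dividing the intact cell into four parts, which avoids as far as possible the misjudgments that lost fractions could cause in a quantitative experiment. We evaluated the morphology and purity of the isolated fractions by electron microscopy and immunoblotting, and the results showed that the isolated organelles met our experimental requirements. Second, we used a mixed-protein strategy to assess the quantitative reliability and accuracy of the protein separation, digestion, and mass spectrometric identification workflow. This established a sound label-free quantification workflow and a method for processing and analyzing the quantitative data, which we then applied to the isolated organelle fractions. Evaluation against a reversed database kept the false-positive rate below 1%, and at this threshold we identified 3,189 non-redundant proteins across the four mouse liver organelle fractions.

The subcellular localization of identified proteins depends on the mass spectrometric quantification. To analyze the localization strategy comprehensively, we adopted a stepwise procedure. First, we classified the 3,189 proteins by cluster analysis applied directly to the corrected quantitative data. Second, to increase reliability, we restricted the localization analysis to proteins in the clustering result that had two or more spectra. Third, we introduced a machine-learning method, the k-nearest-neighbor (KNN) algorithm, for classification. On the basis of these three criteria we assigned 2,740 mouse liver proteins to organelles and provided subcellular localization information for 120 new proteins. This is currently the largest proteomic dataset for mouse liver with subcellular localization assignments, and it will serve as a resource for other studies that use the mouse as a model.

Large-scale protein expression profiling provides reference datasets but also places new demands on data mining. Here we used the Gene Ontology (GO) classification system and protein interaction data to mine the functions of the identified proteins. We annotated the overall functional features of the data with GO biological process and molecular function terms. A large number of the identified proteins are involved in nutrient metabolism, enzyme activity, and energy metabolism, reflecting the liver's physiological role as the body's metabolic center and "energy factory". The GO hypergeometric enrichment results for the proteins identified in each organelle are consistent with the known characteristics of that organelle and account well for its properties and physiological functions. Because protein interactions are strongly indicative of protein function, we hoped that combining them with our subcellular localization data would reveal potential functions of new proteins and possible new functions of known proteins; this was a focus of our work. Unfortunately, no large-scale mouse protein interaction data are currently available. Since protein interactions are conserved to some extent across species, we used homology comparison to build mouse protein interactions from existing large-scale interaction data in other species, obtaining 10,274 interactions among 3,757 proteins. We placed the identified proteins with subcellular localization information into the network, forming four subnetworks. To analyze the subnetworks, we used network partitioning to predict complexes from the interaction network, used GO analysis to infer the functions of the complexes, and finally examined the predicted complexes against the literature. We found 25 complexes in the identified subcellular localization data, with functions including protein degradation, RNA splicing, ribosome assembly, signal transduction, and amino acid metabolism. In interpreting the complexes we found new protein members of the 26S proteasome, the mRNA spliceosome, the mitochondrial 18S ribosome, and the ARP2/3 protein complex, and the results suggest a possible new function of the mitochondrial 18S ribosome in metabolism. We also found Cirhin in the ribosome assembly complex. Cirhin is a disease gene whose function is not yet clear, and this finding offers a possible direction for studying its pathogenic mechanism. Finally, we found some complexes whose functions remain unclear and await experimental verification.

Building on the C57 mouse organelle work, we extended the study to adult human liver organelles and constructed a protein expression profile of the human liver nucleus. For the localization algorithm, we extended the KNN algorithm, which relied on a single factor, into a strategy that uses a Bayesian model to integrate multiple sources of information such as signal peptides and amino acid composition. Analysis of the liver organelle expression profile with this strategy yielded 2,052 nuclear proteins. Through functional analysis we systematically described the biological significance of the identified proteins in the nucleus. Combined with the localization data, we found nuclear localization for many proteins involved in processes such as signal transduction, translation initiation, and protein degradation, suggesting that more complex biological processes take place within the nucleus. In addition, we carried out an evolutionary analysis at the proteome level for the first time and found, also for the first time, a positive correlation between protein evolution and protein conservation.

In summary, establishing the subcellular organelle isolation and label-free quantification system lays a good foundation for organelle proteomics, and the use of machine-learning algorithms provides a scientific evaluation of the organelle localization of identified proteins, further extending our understanding of protein subcellular localization. In the functional analysis we also tried a strategy of mining the data in terms of protein complexes and obtained functional hints for a number of unknown proteins. Establishing these platforms prepares for the next stage of human liver organelle proteome research, and this strategy of organelle isolation and data analysis may also serve as a reference for other omics research.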
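As a rough illustration of how a reversed (decoy) database can be used to hold false positives near 1%, the sketch below picks a score cutoff using the common decoy/target ratio estimate. The function name `score_cutoff`, the `(score, is_decoy)` input format, and the toy scores are all assumptions made for illustration; the record does not state which search engine, scoring scheme, or false-positive formula the thesis used.

```python
def score_cutoff(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is within max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    The FDR at a cutoff is estimated as decoy hits / target hits at or above it
    (one common estimate, assumed here).
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the most permissive cutoff that still satisfies the limit.
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical toy data: 200 target matches and 3 reversed-sequence matches.
targets = [(1000 - i, False) for i in range(200)]            # scores 1000..801
decoys = [(850.5, True), (820.5, True), (810.5, True)]
psms = targets + decoys

cutoff = score_cutoff(psms, max_fdr=0.01)
accepted = sum(1 for score, is_decoy in psms if not is_decoy and score >= cutoff)
print(cutoff, accepted)
```

In this toy set the second decoy pushes the estimated rate above 1%, so the cutoff settles just above it.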

【Abstract】 Cell organelles play an important role in cellular processes. Although these compartments have been studied for a long time, their composition is still not fully known. Proteomics provides a powerful tool for surveying the proteins in a cell or tissue, and protein composition can tell us more about organelles. Organelle proteomics is an important part of proteome research and a crucial part of the Human Liver Proteome Project. To develop a suitable strategy for human liver proteome research, the C57 mouse was chosen as a model for sample preparation and data analysis.

Subcellular proteomics is an important part of proteomics research. Studies of organelles usually focus on a single compartment, which may lose information about the cell as a whole. In addition, isolated organelles always contain cross-contamination from other compartments because organelles are naturally connected to each other, and this leads to incorrect subcellular classification of identified proteins. To address this issue, we designed an experiment to study protein composition and localization with proteomics and bioinformatics tools. We approached the experiment in two ways to establish an accurate quantitation strategy and guarantee comparability among organelles. First, we employed a subcellular separation method that isolates plasma membranes, mitochondria, nuclei, and cytoplasm from the same homogenate. Western blotting and electron microscopy showed satisfactory purity and integrity of the organelles. Second, we evaluated the accuracy of the quantitation method in protein separation and identification using mixed protein samples. From the quantitation results we obtained a reliable protein identification strategy and data processing method. In total we identified 3,189 proteins, with false positives controlled at 1%.

Protein subcellular classification depended on the quantitation results. To analyze our strategy comprehensively, we introduced a stepwise method for evaluating protein localization. First, we used a clustering algorithm to classify the data by calibrated spectral count. Second, we found that the quantitation result was more accurate when the spectral count was no less than 2. Finally, the k-nearest-neighbor (KNN) algorithm was used to evaluate protein localization from the quantitation results and a gold standard. By these three criteria, 2,740 proteins were localized, including 120 new proteins, which is the largest subcellular protein localization dataset for mouse liver.

The aim of proteome research is not only to provide a reference map but also to drive data mining that generates new knowledge. Here we used Gene Ontology (GO) and protein interaction data to annotate the subcellular protein data. The distribution of GO terms in our data shows that primary metabolism and enzyme activity are the largest categories, which reflects the characteristics of liver physiology. Because protein interactions strongly suggest protein function, we exploited this information to mine the potential functions of these proteins. Mouse protein interactions were constructed from model organisms by ortholog comparison, giving 10,274 interactions among 3,757 proteins. The subcellular proteins were then placed in the network as seeds to obtain sub-networks. The MCODE algorithm was used to analyze the organelle-related sub-networks, and 25 protein complexes were found in the data. The functions of these complexes include protein degradation, mRNA splicing, ribosome assembly, and signal transduction. With literature annotation, we found new members of the 26S proteasome, the mRNA spliceosome, the mitochondrial 18S ribosome, and the actin-related protein 2/3 complex, and obtained a clue to a potential function of the mitochondrial 18S ribosome in metabolism. Moreover, we found Cirhin in ribosome assembly; it is a disease gene whose function is not yet clear, and our result points to a possible pathogenic mechanism. Finally, we found some unknown complexes in the data, which need further experimental confirmation.

Building on the C57 mouse study, the technology was applied to human liver samples to establish the human liver nuclear proteome. In total, 2,025 proteins were localized to nuclei with an improved algorithm that combines KNN with other information through a Bayesian model. In the functional analysis, the biological characteristics of the nucleus were described systematically; many of the proteins are involved in signal transduction, translation initiation, and protein degradation. Many new proteins were localized by our method, and some other proteins were given a new cellular localization, which expands knowledge of liver proteins. In addition, we analyzed protein evolution at the proteome scale and found a positive correlation between protein quantity and evolutionary conservation.

In conclusion, we shed light on organelle proteomics through subcellular isolation and a label-free quantitation method. More importantly, a strict evaluation of protein localization was established with a machine-learning algorithm, which expands knowledge of organellar proteins. In the protein function analysis, we tried a protein-complex strategy to study unknown proteins and obtained useful information. This platform is a preliminary trial for the human liver organelle proteome and may also be of value to other "omics" research.
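To make the KNN localization step more concrete, here is a minimal sketch that assigns a protein to an organelle by majority vote among its nearest gold-standard markers, using normalized spectral counts across the four fractions as the feature vector. The marker profiles, the choice of Euclidean distance, `k=3`, and the names `normalize` and `knn_localize` are assumptions for illustration only; the abstract does not specify the distance measure, the value of k, or the marker set.

```python
from collections import Counter
from math import dist

# Order of fractions in every profile (the four fractions named in the abstract).
FRACTIONS = ("plasma membrane", "mitochondria", "nucleus", "cytoplasm")


def normalize(counts):
    """Convert raw spectral counts across the four fractions to proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("protein has no spectra in any fraction")
    return [c / total for c in counts]


def knn_localize(profile, markers, k=3):
    """Majority vote among the k marker profiles closest to `profile`."""
    ranked = sorted(markers, key=lambda m: dist(profile, m[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical gold-standard markers: (normalized profile, known localization).
markers = [
    (normalize([40, 2, 1, 3]), "plasma membrane"),
    (normalize([35, 5, 0, 4]), "plasma membrane"),
    (normalize([3, 50, 2, 1]), "mitochondria"),
    (normalize([1, 44, 4, 2]), "mitochondria"),
    (normalize([2, 1, 38, 5]), "nucleus"),
    (normalize([0, 3, 41, 6]), "nucleus"),
    (normalize([4, 2, 3, 47]), "cytoplasm"),
    (normalize([5, 1, 6, 39]), "cytoplasm"),
]

# Hypothetical unknown protein seen mostly in the nuclear fraction.
query = normalize([2, 3, 30, 6])
prediction = knn_localize(query, markers, k=3)
print(prediction)
```

Normalizing to proportions means a protein is classified by how its spectra are distributed across fractions rather than by its overall abundance, which is one plausible way to reduce the influence of trace cross-contamination.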
