
蛋白质序列变异与疾病相关性及蛋白质相互作用数据库的构建

Construction of Protein Databases for Disease-related Mutations and Protein-centered Interactions

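【Author】 Xi Hong (奚洪)

【Supervisors】 Li Yixue (李亦学); Zhang Xuehong (张雪洪)

【Author information】 Shanghai Jiao Tong University, Biomedical Engineering, 2010, PhD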

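【Summary】 Human genetic diseases have long threatened human health and life. With advances in the techniques and research of genetics and molecular biology, many gene variants that cause human genetic diseases through changes in the amino acid sequence have been identified. A large amount of this information is scattered across the scientific literature and various biological databases. To help researchers use these data conveniently and discover useful associations among them, this thesis constructs an integrated set of disease-related human mutated protein sequences (dIMS), which collects 34,891 disease-related human mutated protein sequences from OMIM, PMD and SwissProt. It presents a preliminary analysis of the data in dIMS from three aspects: classification of dIMS by disease information, analysis of the mutational spectrum of amino acid residues, and the relationship between disease-related point mutations and functional domains. On the basis of dIMS, this thesis establishes a systematic web-based database system, SysPIMP (the Systematic Platform for Identifying Mutated Protein; http://syspimp.starflr.info/), which serves not only for browsing dIMS and its associated information but also for identifying disease-related human mutated proteins from mass spectra. In addition, because almost all proteins in living organisms carry out their normal functions by interacting with other substances (including other proteins), analyzing the interaction networks in which disease-related proteins participate is necessary to better understand the pathogenic mechanisms of human genetic diseases caused by protein sequence changes arising from gene mutations. To this end, this thesis establishes an integrated protein-centered interaction database, IPID (http://ipid.starflr.info/), which collects 2,065,735 protein-related interaction pairs from 25 public interaction databases, covering five different types of interaction data. After redundancy removal, IPID holds 560,442 non-redundant protein-related interactions, including 198,947 non-redundant human protein-related interactions. InterX!Tandem in IPID is used to identify proteins from mass spectra and to provide the interaction data stored in IPID that relate to the identified proteins. On the basis of IPID, this thesis also presents a preliminary analysis of the protein-centered interaction networks related to the 22 disease classes defined in the paper "The Human Disease Network". It is hoped that the two systems, SysPIMP and IPID, will facilitate research on the relationship between protein sequence variation and human genetic diseases and the diagnosis of genetic diseases.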

【Abstract】 Human genetic diseases have long been a threat to human health and life. With the development of genetics and molecular biology techniques, many gene mutations that cause human genetic diseases by changing amino acid sequences have been identified. An integrated dataset of disease-related human mutated protein sequences, called dIMS, was constructed; it collects 34,891 disease-related mutated sequences from OMIM, PMD and SwissProt. An initial analysis of dIMS was conducted from three aspects: the classification of dIMS according to disease information, amino acid mutational spectrum analysis, and the relationship between disease-related point mutations and functional domains. Based on dIMS, a web-based system, SysPIMP (the Systematic Platform for Identifying Mutated Protein; http://syspimp.starflr.info/), was constructed not only for browsing dIMS but also for identifying disease-related human mutated proteins from mass spectrometry results. Almost all proteins carry out their functions through interactions with other molecules, including other proteins. To better understand the mechanisms of human genetic diseases caused by gene mutations, it is necessary to study the interaction networks in which these disease-related proteins are involved. For this purpose, a web-based integrated protein-centered interaction database (IPID; http://ipid.starflr.info/) was constructed, collecting 2,065,735 protein-related interactions of five different types from 25 public interaction databases. After redundancy removal, IPID contains 560,442 non-redundant protein-related interactions, including 198,947 non-redundant human protein-related interactions. InterX!Tandem, implemented in IPID, is used to identify proteins from mass spectrometry results and to provide the interactions stored in IPID that are related to the identified proteins. On the basis of IPID, the protein-centered interaction networks related to the 22 disease classes defined in the paper "The Human Disease Network" were investigated. The construction of these two systems, SysPIMP and IPID, is expected to be helpful for further research on and diagnosis of human genetic diseases.
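
The abstract does not state how redundancy was removed when the 2,065,735 collected interactions were reduced to 560,442 non-redundant ones. As a purely illustrative sketch, and not the procedure used in the thesis, one common approach treats an interaction as an unordered pair of partners plus an interaction type and merges records that share this key while remembering which source databases reported them. All names below (`Interaction`, `canonical_key`, `merge_redundant`, the `PROT_*` identifiers, the `type_*` labels and the `Source*` database names) are hypothetical placeholders.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Interaction(NamedTuple):
    """One interaction record as it might arrive from a source database."""
    partner_a: str  # hypothetical identifier of the first partner
    partner_b: str  # hypothetical identifier of the second partner
    kind: str       # hypothetical interaction-type label
    source: str     # hypothetical name of the source database


Key = Tuple[str, str, str]


def canonical_key(rec: Interaction) -> Key:
    """Order the two partners so that A-B and B-A map to the same key."""
    first, second = sorted((rec.partner_a, rec.partner_b))
    return (first, second, rec.kind)


def merge_redundant(records: Iterable[Interaction]) -> Dict[Key, Set[str]]:
    """Collapse redundant records, keeping the set of reporting sources."""
    merged: Dict[Key, Set[str]] = {}
    for rec in records:
        merged.setdefault(canonical_key(rec), set()).add(rec.source)
    return merged


# Three raw records: the first two describe the same pair in opposite
# order, the third is the same pair under a different interaction type.
records = [
    Interaction("PROT_A", "PROT_B", "type_1", "SourceA"),
    Interaction("PROT_B", "PROT_A", "type_1", "SourceB"),
    Interaction("PROT_A", "PROT_B", "type_2", "SourceA"),
]

merged = merge_redundant(records)
for key, sources in sorted(merged.items()):
    print(key, sorted(sources))
```

Keeping the interaction type in the key is an assumption of this sketch: it means the same pair reported under two different types counts as two non-redundant entries. Whether IPID counts entries this way is not stated in the abstract.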
