Numerical Researches on Protein-Protein Interactions Network

【Author】 Xie Jiang

【Advisor】 Zhang Wu

【Author Information】 Shanghai University, Computer Application Technology, 2008, Ph.D.

【Abstract】 With the initial completion of human genome sequencing, bioinformatics has emerged as a new interdisciplinary field. It has become a major research frontier in biology, computer science, and applied mathematics, and one of the core areas of 21st-century natural science. Many of the challenges of the "post-genome era" come from proteomics, because proteomics can reveal the nature of life phenomena more deeply and explain the rules of life processes through the interactions and functions of proteins; effective research methods for it are, however, still lacking. This thesis applies computer science to proteomics and studies protein-protein interaction networks (PINs) with bioinformatics methods. Using cross-species network search on PSE-BioServer, a Web Service-based problem solving environment (PSE) for bioinformatics, it investigates the similarities among the PINs of different species such as yeast, Drosophila, and human. These similarities are used to understand the significance of protein-protein interactions (PPIs), predict protein functions and interactions, and extract information conserved during evolution. The information reflected in similar networks can also serve as a reference for further research on, and treatment of, disease.

The innovative contributions of the thesis fall into four areas:

1. Building on a study of existing biomolecular network alignment algorithms such as PathBLAST and MNAligner, the thesis proposes the Immediate Network Neighbors Preference Method (INPM), designed around the characteristics of PINs, and uses it to implement cross-species PIN search. INPM emphasizes the biological significance of PINs and reduces the errors caused by missing original information. Experiments show that networks found by INPM are more similar to the target subnetworks than those found by comparable algorithms such as NBM, PathBLAST, and MNAligner, and that as the target subnetwork grows, INPM is generally faster than these algorithms.

2. To meet the computational demands of large-scale, multi-species data in biomolecular networks, which are complex networks, the thesis develops a parallel processing strategy for INPM and implements the parallel algorithm on a cluster of workstations, removing the restriction on the size of the target subnetwork. Tests show that the parallel algorithm has good speedup and scalability. Applying INPM to the PINs of yeast and Drosophila, the thesis identifies 19 conserved PPIs, predicts 5 PPIs that have not yet been catalogued, and, based on Gene Ontology, predicts new functions for 15 proteins from a bioinformatics perspective.

3. Unlike earlier gene-level studies of single molecules, the thesis designs experiments at the proteomics level. It uses numerical methods to study data from biological experiments on a Drosophila model of Parkinson's disease (PD) and to analyze the PD-related Drosophila PIN, in order to explore the molecular mechanism of human PD. This is a new approach to PD research. The thesis discusses the major subareas of the PD-related PINs of Drosophila and human; the functions of some of these subareas confirm existing analyses in the literature of the factors that induce PD. It also predicts the function of CG2233, a new protein that may be closely related to PD, and lists 21 proteins that interact directly with α-synuclein and with differentially expressed proteins, providing a reference for screening drug targets for human PD.

4. The thesis builds an initial version of PSE-BioServer, a Web Service-based problem solving environment for bioinformatics. PSE-BioServer integrates PIN search tools and lays the groundwork for resource sharing, collaborative work, and an easy-to-use high-performance computing environment. The platform is intended to help researchers overcome the problems they encounter when studying bioinformatics with traditional methods.
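The abstract names INPM but does not give its steps. The Python sketch below illustrates only the general idea suggested by the method's name, under assumed inputs. A target subnetwork is mapped onto another species' PIN one protein at a time. Among homologous candidates, a protein that directly interacts with already-mapped partners is preferred over one with a slightly higher homology score. The function names, the `homologs` score table, the seed pair, and the toy proteins are all hypothetical; this is not the thesis's implementation of INPM.

```python
from collections import deque


def neighbor_preferring_search(target_edges, db_adj, homologs, seed):
    """Greedily map a target subnetwork onto a database PIN (hypothetical sketch).

    target_edges: list of (protein, protein) interactions in the target subnetwork.
    db_adj: dict mapping each database protein to the set of its interaction partners.
    homologs: dict mapping each target protein to [(db_protein, score), ...];
              assumed to come from sequence similarity, higher score = more similar.
    seed: (target_protein, db_protein) pair used as the starting match.
    """
    # Build an undirected adjacency list for the target subnetwork.
    t_adj = {}
    for a, b in target_edges:
        t_adj.setdefault(a, set()).add(b)
        t_adj.setdefault(b, set()).add(a)

    mapping = {seed[0]: seed[1]}
    used = {seed[1]}
    queue = deque([seed[0]])
    # Breadth-first expansion from the seed across target interactions.
    while queue:
        t = queue.popleft()
        for t_nb in sorted(t_adj.get(t, ())):
            if t_nb in mapping:
                continue
            candidates = [(d, s) for d, s in homologs.get(t_nb, []) if d not in used]
            if not candidates:
                continue
            # Database images of t_nb's target neighbors that are already mapped.
            mapped_nbs = {mapping[x] for x in t_adj[t_nb] if x in mapping}
            # Rank first by how many of those images the candidate interacts with
            # directly, then break ties by homology score.
            best = max(
                candidates,
                key=lambda c: (len(db_adj.get(c[0], set()) & mapped_nbs), c[1]),
            )[0]
            mapping[t_nb] = best
            used.add(best)
            queue.append(t_nb)
    return mapping


def conserved_interactions(target_edges, db_adj, mapping):
    """Return the target interactions whose mapped proteins also interact."""
    return [
        (a, b)
        for a, b in target_edges
        if a in mapping and b in mapping and mapping[b] in db_adj.get(mapping[a], set())
    ]


# Toy example with made-up protein names.
target_edges = [("A", "B"), ("B", "C")]
db_adj = {"a1": {"b1"}, "b1": {"a1", "c1"}, "c1": {"b1"}, "b2": set(), "c2": set()}
homologs = {
    "A": [("a1", 0.9)],
    "B": [("b2", 0.95), ("b1", 0.8)],
    "C": [("c1", 0.7), ("c2", 0.9)],
}

mapping = neighbor_preferring_search(target_edges, db_adj, homologs, ("A", "a1"))
print(mapping)  # {'A': 'a1', 'B': 'b1', 'C': 'c1'}
print(conserved_interactions(target_edges, db_adj, mapping))  # [('A', 'B'), ('B', 'C')]
```

On the toy data, ranking by homology score alone would pick `b2` and `c2`, which conserve neither target interaction. Preferring direct neighbors picks `b1` and `c1`, so both interactions are conserved.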

  • 【Online Publication Submitter】 Shanghai University
  • 【Online Publication Issue】 2009, Issue 01