
蛋白质结构预测与结构比对方法的研究

Research on Protein Structure Prediction and Structure Alignment

【Author】 段谟杰

【Advisor】 周艳红

【Author information】 Huazhong University of Science and Technology, Bioinformation Technology, 2009, PhD

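【Abstract (Chinese version)】 One of the principal research tasks of the post-genome era is to elucidate protein function, which helps us understand complex life phenomena. In many cases, however, it is not enough to know what a protein does; one must also understand why it has that function, and this requires in-depth study of protein structure. Because experimental techniques for determining protein structure and function are limited, the structure and function of a large number of known proteins remain unknown. The rapid development of bioinformatics offers a very effective way to address this problem. On this basis, this thesis discovers and mines features in order to study several related problems in the prediction and analysis of protein structure and function. The main work is as follows:

(1) A secondary structure prediction tool was developed. Analysis of the amino acid distribution near the end positions of protein secondary structures showed that the distribution at these positions is markedly position-specific. Building on this and combining it with other features, the tool E-SSpred was constructed, which predicts each secondary structure segment as a whole. Tests on standard benchmark datasets show that E-SSpred predicts secondary structure more accurately than comparable software, with a particularly large improvement in the accuracy of predicted secondary structure ends.

(2) An energy function that takes the hydrophobic environment of the template into account was proposed, and a fold recognition system was developed on top of it. Analysis of how the hydrophobic environment in protein structures affects pairwise residue interaction energies showed that these energies differ between hydrophobic environments. The energy function used in fold recognition was improved accordingly and then applied within a fold recognition method. The test results show that accounting for the hydrophobic environment effectively improves fold recognition accuracy.

(3) A structure alignment method based on secondary structure elements was developed. After analyzing the properties of secondary structure elements and residue alignment algorithms, and to address the weakness of current secondary-structure-based alignment methods in finding related residues, this work takes element length into account when computing the similarity of secondary structure elements and improves the residue alignment algorithm. The protein structure alignment tool 3D-Sali was developed on this basis. Comparative tests against similar software show that 3D-Sali identifies homologous proteins well and also finds the corresponding residues between the aligned proteins well.

(4) The features that determine how an amino acid substitution affects protein function were analyzed, and the discovered features were used for prediction. The analysis shows that substitutions at functional sites and at sites related to them are far more likely to affect function than substitutions at other sites, whereas widely used features such as evolutionary information do not reflect this. To address this, functional annotation databases and site correlation analysis were used to derive the functional sites and their related sites, and a method for predicting whether an amino acid mutation affects function was developed from that information. Comparative tests show that this method effectively improves prediction accuracy.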

【Abstract】 One of the most important tasks in the post-genome era is inferring the function of proteins. Knowing protein function helps us understand complex living systems. In many cases, however, explaining why a protein has a given function matters more than the function itself, and this requires knowledge of the protein's structure. Because of the limits of experimental techniques, many proteins have been sequenced, but most of them have not been assigned a structure or a functional annotation. Bioinformatics, which is developing rapidly, provides an efficient way to address this problem. In view of this, this thesis uses computational methods based on feature discovery to study several issues in the prediction and analysis of protein structure and function.

First, a protein secondary structure prediction tool was developed. The position-specific residue preferences around the ends of protein secondary structures were analyzed, and the results showed that the residue distribution around these sites is position-specific. Based on this new feature and other features, E-SSpred was proposed, a secondary structure prediction tool that predicts each secondary structure fragment as a whole. E-SSpred was evaluated on standard test datasets and compared with other tools, and the results indicated that E-SSpred performs better.

Second, a fold recognition method using a novel energy function was proposed. Residue-residue pairwise interactions were found to differ between hydrophobic environments. Based on this, a new energy function was proposed and used in fold recognition. The new energy function was tested and compared with a commonly used energy function, and the results imply that considering the hydrophobic environment can improve the accuracy of fold recognition.

Third, a structure comparison method based on secondary structure elements was proposed. To remedy the weakness of recent methods in matching related residues between the query protein and the target protein, we improved the secondary structure similarity scoring function and the residue-residue alignment algorithm, and developed a structure alignment tool, 3D-Sali. The test results indicated that 3D-Sali performs well both in detecting homologous proteins and in finding corresponding residues.

Finally, the key factors that determine how an amino acid substitution affects protein function were analyzed and used to improve prediction. Substitutions at functional sites and at sites related to them were found to be more likely to affect protein function. However, widely used features such as evolutionary information do not capture this characteristic. To solve this problem, a functional annotation database and correlated mutation analysis were used to find the functional sites and their related sites, and a method using a novel feature based on these sites was proposed to predict the effect of an amino acid substitution on protein function. The test results indicated that this method can improve prediction accuracy effectively.
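
The end-position analysis behind the secondary structure predictor can be illustrated with a small sketch. It is not the E-SSpred implementation, whose details the abstract does not give. It only assumes that a "position-specific residue preference" can be measured as a log-odds score: the frequency of a residue at a fixed offset from the start of a helix, divided by its background frequency. The function names `segment_starts` and `end_preferences`, the `window` and `pseudo` parameters, and the two toy sequences are all hypothetical.

```python
from collections import Counter
import math


def segment_starts(ss, state):
    """Return the indices at which a run of `state` begins in the string `ss`."""
    return [i for i, s in enumerate(ss)
            if s == state and (i == 0 or ss[i - 1] != state)]


def end_preferences(pairs, state="H", window=2, pseudo=1.0):
    """Log-odds residue preferences around the starts of `state` segments.

    pairs: list of (sequence, secondary_structure) strings of equal length.
    Returns {offset: {residue: log2(p_at_offset / p_background)}} for
    offsets -window..+window, where offset 0 is the first segment residue.
    A pseudocount (an assumption of this sketch) avoids log(0).
    """
    background = Counter()
    by_offset = {k: Counter() for k in range(-window, window + 1)}
    for seq, ss in pairs:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        background.update(seq)
        for start in segment_starts(ss, state):
            for k in by_offset:
                j = start + k
                if 0 <= j < len(seq):
                    by_offset[k][seq[j]] += 1
    alphabet = sorted(background)
    bg_total = sum(background.values())
    result = {}
    for k, counts in by_offset.items():
        total = sum(counts.values()) + pseudo * len(alphabet)
        result[k] = {
            aa: math.log2(((counts[aa] + pseudo) / total)
                          / (background[aa] / bg_total))
            for aa in alphabet
        }
    return result


# Made-up toy data: H marks helix residues, C marks everything else.
toy = [("GSPEELLKKA", "CCCHHHHHHC"),
       ("MDPAELIRKG", "CCCHHHHHHC")]
prefs = end_preferences(toy)
for aa, score in sorted(prefs[-1].items(), key=lambda kv: -kv[1])[:3]:
    print(f"offset -1  {aa}  {score:+.2f}")
```

On this toy input, proline is the most enriched residue immediately before the helix start, because both made-up sequences place it there. A real analysis would use a large set of annotated structures and would treat segment starts and segment ends separately.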
