
Novel Methods for Bioinformatics Applications

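【Author】 Liu Ting (刘婷)

【Advisors】 Wu Zhaohui (吴朝晖); Ruhong Zhou

【Author information】 Zhejiang University, Computer Applications, 2006, Master's thesis

【Subtitle】 Force Field Parameter Optimization and Protein Domain Analysis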

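【Abstract (translated from the Chinese)】 An accurate solvent model is essential for computer simulation of protein folding. Explicit solvent models provide an atomistic representation of the water molecules, but they are computationally expensive. By comparison, implicit solvent models (for example, Poisson-Boltzmann, PB) treat the solvent as a continuous medium surrounding the solute and are much faster. The speedup comes not only from the reduced number of degrees of freedom but also from no longer having to sample the water molecules around the moving solute. This speed is one of the main reasons implicit solvent models are so popular.

Implicit solvent models generally rely on empirical parameters. For example, the so-called non-polar cavity term (the non-polar, hydrophobic part of the solvation free energy) is usually computed from the solvent-accessible surface area. The values of these parameters strongly affect the final accuracy of the implicit solvent model for properties such as the solvation free energy and the folding free energy.

In the second part of this thesis, we propose a combined method, SD/GA, for parameter optimization in multi-dimensional space. It combines the speed of a local optimization method (SD; the Chinese text calls it the step-acceleration method, 步长加速法) with the ability of a global optimization method (the genetic algorithm, GA) to escape local traps. The SD/GA method is then used to optimize the parameters of the non-polar cavity term in the PB model. The results show that the PB model optimized with SD/GA not only improves the accuracy of the computed solvation free energies of 200 small organic molecules but also greatly improves the free energy landscape of β-hairpin folding. In earlier studies that used the unoptimized PB model, this β-hairpin showed a distorted free energy landscape. The SD/GA method can also be applied directly to optimization in other multi-dimensional parameter spaces.

Domains play a central role in many areas of protein research. A domain is a level of structure between secondary and tertiary structure. The term usually refers to a polypeptide segment of a protein that can fold independently into a stable spatial conformation. Domains carry a large amount of genetic information and specific molecular functions, and they are regarded as the basic units of protein structure, evolution, folding, and function.

Domain identification is an important and highly challenging problem in protein research. In the third part of this thesis, we use a support vector machine (SVM) to predict domain boundary positions from sequence information alone. The SVM was trained on different amino acid descriptors and their combinations, including Position Index, Linker Index, Secondary Structure, Solvent Acc, Entropy, and Hydrophobicity. On a dataset of 238 two-domain proteins selected from the SCOP and CATH databases, the SVM reaches 65% cross-validation accuracy using Position Index, Secondary Structure, and Solvent Acc.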
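The SD/GA idea summarized in the abstract (a fast local search combined with a genetic algorithm that can escape local traps) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the abstract does not say how the two methods are coupled, so the sketch simply refines the best few individuals of each GA generation with a backtracking steepest-descent step. The Rastrigin function is a stand-in for the real objective (the error of PB solvation free energies), and all function names and parameter values are hypothetical.

```python
import math
import random


def rastrigin(x):
    # Stand-in objective with many local minima (global minimum 0 at the origin).
    # ASSUMPTION: the real objective would be the error of PB solvation free energies.
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def numerical_gradient(f, x, h=1e-5):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def steepest_descent(f, x, steps=20, step_size=0.1):
    # Local refinement: step against the gradient, halving the step until f drops.
    x, fx = list(x), f(x)
    for _ in range(steps):
        grad = numerical_gradient(f, x)
        s, improved = step_size, False
        while s > 1e-8 and not improved:
            trial = [xi - s * gi for xi, gi in zip(x, grad)]
            f_trial = f(trial)
            if f_trial < fx:
                x, fx, improved = trial, f_trial, True
            s *= 0.5
        if not improved:
            break  # no improving step found: numerically at a local minimum
    return x


def sd_ga_minimize(f, bounds, pop_size=30, generations=40,
                   n_elite=3, mutation_rate=0.2, seed=0):
    # Hypothetical coupling of SD and GA; returns (best point, best value, history).
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=f)
        # SD part: refine the current best individuals locally.
        elite = sorted((steepest_descent(f, p) for p in pop[:n_elite]), key=f)
        history.append(f(elite[0]))
        # GA part: the better half of the population breeds the next generation.
        parents = elite + pop[n_elite:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), gene_a, gene_b in zip(bounds, a, b):
                gene = gene_a if rng.random() < 0.5 else gene_b  # uniform crossover
                if rng.random() < mutation_rate:
                    gene += rng.gauss(0.0, 0.1 * (hi - lo))      # Gaussian mutation
                child.append(min(hi, max(lo, gene)))             # keep within bounds
            children.append(child)
        pop = elite + children  # elitism: refined individuals always survive
    best = min(pop, key=f)
    return best, f(best), history


if __name__ == "__main__":
    best, value, history = sd_ga_minimize(rastrigin, [(-5.12, 5.12)] * 2)
    print("best point:", best)
    print("best value:", value)
```

Because the refined elite individuals always survive into the next generation, the best objective value recorded in `history` never increases. Crossover and mutation supply the jumps between basins that steepest descent alone cannot make.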

【Abstract】 An accurate solvation model is essential for computer modeling of protein folding and other biomolecular self-assembly processes. Compared to explicit solvent models, implicit solvent models, such as the Poisson-Boltzmann (PB) solver, are much faster, which is the most compelling reason for their popularity. Because these implicit solvent models typically use parameters such as atomic radii and solvent-accessible surface areas in their calculations, an optimal fit of these parameters is crucial to the final accuracy of the solvation free energy, the folding free energy, and other properties.

In the first half of this paper, we propose a combined approach, SD/GA, which takes advantage of both local optimization with steepest descent (SD) and global optimization with the genetic algorithm (GA) for parameter optimization in multi-dimensional space. The SD/GA method is then applied to the optimization of solvation parameters in the non-polar cavity term of the PB model. The results show that the newly optimized parameters from SD/GA not only increase the accuracy of the solvation free energies for ~200 organic molecules but also significantly improve the free energy landscape of β-hairpin folding. The SD/GA method can be readily applied to other multi-dimensional parameter space optimization problems as well.

Protein domains play an important role in protein science. A domain is considered the fundamental unit of protein structure, folding, evolution, and function. It can fold independently or semi-independently into a stable and compact structure, and it exhibits a rich evolutionary history and a specialized molecular function. A protein may comprise a single domain or several domains, which are not necessarily contiguous. Identifying protein domain boundaries is very important in protein studies, and it remains one of the most challenging problems in protein science.

A large number of methods have been developed to detect domain boundaries or domain linkers. In the second half of this paper, we offer a new approach that uses a support vector machine (SVM) to predict protein domain boundaries from sequence information alone. The SVM was tried with several different amino acid descriptors, including Position Index, Linker Index, Secondary Structure, Solvent Acc, Entropy, and Hydrophobicity, and with some of their combinations. Trained on a dataset of 238 two-domain proteins from SCOP and CATH, the SVM achieves 65% 10-fold cross-validation accuracy using the descriptors Position Index, Secondary Structure, and Solvent Acc. We compared the SVM results with those of other methods, and the SVM performs considerably better than many existing methods. It is also more stable and faster as a prediction machine.
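The evaluation protocol mentioned above, an SVM scored by 10-fold cross-validation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. It uses a linear SVM trained by a Pegasos-style stochastic sub-gradient method as a stand-in (the abstract does not name the kernel or the SVM software), and the feature vectors are synthetic three-value toy data that only mimic the idea of three descriptors per sample. All function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    # Pegasos-style sub-gradient training of a linear SVM (hinge loss + L2 penalty).
    # Labels must be +1 or -1. The bias is handled as an extra constant feature.
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict(w, x):
    # Sign of the decision function, with the constant bias feature appended.
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def k_fold_accuracy(X, y, k=10, seed=0):
    # Shuffle once, split into k folds, train on k-1 folds, test on the held-out fold.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(1 for i in held_out if predict(w, X[i]) == y[i])
    return correct / len(X)


def make_toy_data(n=200, seed=1):
    # SYNTHETIC data: three made-up descriptor values per sample, shifted by class.
    # Real inputs would be per-residue descriptors derived from protein sequences.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(1.5 * label, 0.5) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("10-fold cross-validation accuracy:", k_fold_accuracy(X, y, k=10))
```

Each sample is held out exactly once, so the reported accuracy is the fraction of all samples predicted correctly by a model that never saw them during training. The toy classes are well separated, so the number it prints says nothing about the 65% figure reported for real proteins.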

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication issue】 2006, Issue 09
  • 【Classification number】 TP311.11
  • 【Downloads】 198