从头预测蛋白质结构元启发方法研究

Research on Metaheuristic Approach to De Novo Prediction of Protein Structure

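【Author】 Huang Xu (黄旭)

【Advisors】 Qian Peide (钱培德); Lü Qiang (吕强)

【Author information】 Soochow University (苏州大学), Computer Application Technology, 2011, PhD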

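【Abstract (translated from the Chinese)】 Proteins carry out specific life functions by virtue of their specific structures. Protein structure prediction is especially important now that genomic data are growing rapidly while experimental structure determination remains costly and slow. De novo protein structure prediction does not rely on known structural templates; it is a technically demanding research topic of far-reaching practical significance. From a computational standpoint, protein structure prediction is essentially a combinatorial optimization problem, and its unprecedented search space and intricate constraints pose a major challenge for computer science. Building on a survey of de novo protein structure prediction and parallel metaheuristics, this dissertation focuses on three aspects: the search space, the search strategy, and the clustering scheme. The main research content is as follows. 1. Search space for structure prediction. The structure and generation of fragments for backbone prediction, and of side-chain rotamers, are studied. On this basis, a four-level model based on dynamic Bayesian networks is proposed for generating side-chain rotamers. The model has two main characteristics. First, it accounts for backbone information and for the correlations and dependencies among the four side-chain torsion angles, which gives it a clear inference hierarchy that better matches the biological characteristics of protein molecules. Second, it reduces the number of unknown variables at each level and so lowers model complexity; with the training set unchanged, this alleviates data sparsity and improves model accuracy. Experiments show that the four-level model yields high-quality results. In addition, a method is proposed for evaluating rotamer libraries using extreme conformations and random conformations; experiments on the CASP9 FM (free modeling) dataset confirm its effectiveness. 2. Parallel metaheuristic search strategy. Taking ACO (ant colony optimization) as an example, the working principles of metaheuristics are analyzed in depth, and a parallel metaheuristic strategy characterized by task decomposition and experience feedback is proposed. To address the difficulty of accurately quantifying the optimization objective in de novo prediction and the complexity of solution construction, a parallel metaheuristic search framework is proposed that integrates different energy functions and search strategies. A task allocation strategy is designed in detail for GPCR (G protein-coupled receptor) prediction. Backbone and side-chain prediction algorithms are designed on the basis of the ACO mechanism. For backbone prediction, the intra-colony search scheme, the solution construction method, the local search strategy, and the parallel allocation mechanism are designed and implemented in detail. Finally, experiments on a dataset of 16 small proteins used in a paper published in Science and on the CASP8 FM dataset show that the proposed method is highly competitive. 3. Protein structure clustering. This part covers two aspects. First, a cluster-center selection algorithm for protein structure clustering is proposed. Building on an in-depth study of the two algorithms currently in common use for protein structure clustering, QT (quality threshold) and AP (affinity propagation), the method uses statistical information to improve the ability to find the best conformation and overcomes the original algorithms' dependence on specific parameter settings. Second, energy information is used to optimize the distribution of the structural similarity matrix, which improves how well the matrix represents the protein's native state and lays a good foundation for clustering. Experiments on two authoritative datasets show that the method can effectively improve clustering performance on specific datasets and thereby select candidate structures closer to the native conformation. The main innovations of this dissertation are: a four-level inference model for generating side-chain rotamer libraries, which fully accounts for the correlations and dependencies between backbone and side chains and is designed to lower model complexity and alleviate data sparsity; a parallel metaheuristic scheme suited to de novo protein prediction, which achieves clear gains in backbone prediction; and a cluster-center selection algorithm and a similarity-distribution optimization scheme for protein structure clustering, which improve the accuracy of the search for the best conformation. Experiments show that this work advances de novo protein structure prediction and offers a valuable reference for subsequent research.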
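
The four-level rotamer model described above can be read, as a rough illustration only, as a level-by-level factorization of the joint distribution over the four side-chain torsion angles. The sketch below assumes that "backbone information" means the backbone dihedrals (φ, ψ) and that level k predicts χ_k; neither assumption is stated in the abstract, and the dissertation's actual network may condition each level on fewer variables than the full chain rule shown here.

```latex
% Chain-rule factorization: one side-chain torsion angle per level.
% (\phi, \psi) as the backbone variables is an assumption made for illustration.
P(\chi_1, \chi_2, \chi_3, \chi_4 \mid \phi, \psi)
  = \prod_{k=1}^{4} P(\chi_k \mid \chi_1, \dots, \chi_{k-1}, \phi, \psi)
```

Each factor has a single target variable, which is one way a layered model can keep the number of unknowns per level small.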

【Abstract】 Proteins with specific structures carry out the concrete functions of life. Protein structure prediction is highly significant given the explosion of gene data and the high cost and low efficiency of experimental structure determination. De novo prediction of protein structure, which uses no structural template, is a research topic of high technical difficulty and great practical significance.

From a computational point of view, protein structure prediction is essentially a combinatorial optimization problem. With its enormous search space and complicated constraints, the problem is a major challenge for computer science. This dissertation surveys de novo protein structure prediction and parallel metaheuristics; its main content covers the search space, the search scheme, and the clustering scheme.

1. Research on the search space of structure prediction. The fragment structures and their construction methods for both the backbone and the side chains are summarized. A four-level model for building rotamer libraries, based on dynamic Bayesian networks, is proposed. The model considers the relation between the backbone and the four side-chain torsion angles, so it has a clear inference hierarchy and accords with the biological characteristics of protein molecules. It has only one unknown parameter at each level, so model complexity is reduced and the problem caused by sparse data is alleviated to a certain extent for the same amount of training data. Experimental results show that the model produces high-quality results. Moreover, an assessment of rotamer libraries using extreme conformations and random conformations is proposed. Experiments on CASP9 FM targets show that this method is effective.

2. Research on parallel metaheuristics. Based on metaheuristics such as ACO, a parallel metaheuristic strategy is proposed whose main characteristics are task decomposition and experience feedback. A parallel metaheuristic search framework that fuses different energy functions and search strategies is proposed to address two problems: the optimization target is hard to quantify, and solution construction is extraordinarily complex. A task distribution strategy is designed for GPCR prediction. Furthermore, algorithms for backbone and side-chain prediction are designed. For backbone prediction, the search scheme within one ant colony, the solution construction, the local search, and the parallel distribution strategy are implemented. Experiments on a data set of 16 small proteins taken from a paper in Science and on the FM targets of CASP8 show that the proposed method is highly competitive.

3. Research on protein structure clustering. This covers two aspects. First, an exemplar selection algorithm for clustering protein structures is proposed, building on the quality threshold and affinity propagation algorithms widely used in protein structure prediction. It uses statistical information to enhance the ability to find the best conformation, and it does not depend on empirically chosen parameters. Second, a scheme for optimizing the similarity matrix based on energy is proposed, which provides a good basis for clustering. Experiments on authoritative data sets show that the proposed methods can enhance clustering performance and find decoys closer to the native structure.

The major contributions of this dissertation are: a four-level model for building rotamer libraries, which considers the relation between the backbone and the four torsion angles and alleviates the problem caused by sparse data; a parallel metaheuristic search strategy suited to de novo protein structure prediction, which is notably effective in backbone prediction; and an exemplar selection algorithm for clustering protein structures together with a scheme for optimizing the similarity distribution, which improve the accuracy of optimal structure selection. Experiments show that this work has a positive effect on de novo protein structure prediction and is a valuable reference for future related research.
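
To make the ACO terms in the abstract concrete (solution construction, experience feedback through pheromone), here is a minimal serial sketch in Python. It is not the dissertation's algorithm. The names `aco_minimize`, `toy_energy` and `TARGET`, the one-option-per-position encoding, and the mismatch-count energy are all hypothetical stand-ins, and the parallel task distribution and local search described in the abstract are omitted.

```python
import random

# Hypothetical target pattern: one discrete option (e.g. a torsion-angle bin)
# per position. This is a toy stand-in for a real energy function.
TARGET = [2, 0, 1, 1, 2, 0]

def toy_energy(solution):
    """Count positions that differ from TARGET; lower is better."""
    return sum(a != b for a, b in zip(solution, TARGET))

def aco_minimize(energy, n_positions, n_options, n_ants=20, n_iters=50,
                 evaporation=0.1, seed=0):
    """Toy ant colony optimization over one discrete choice per position."""
    rng = random.Random(seed)
    # Pheromone table: one positive weight per (position, option).
    tau = [[1.0] * n_options for _ in range(n_positions)]
    best, best_e = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Solution construction: sample each position with probability
            # proportional to its pheromone weights.
            sol = [rng.choices(range(n_options), weights=tau[i])[0]
                   for i in range(n_positions)]
            e = energy(sol)
            if e < best_e:
                best, best_e = sol, e
        # Evaporation: every trail decays by the same factor.
        for row in tau:
            for j in range(n_options):
                row[j] *= 1.0 - evaporation
        # Experience feedback: reinforce the choices of the best-so-far solution.
        for i, opt in enumerate(best):
            tau[i][opt] += 1.0
    return best, best_e

if __name__ == "__main__":
    solution, e = aco_minimize(toy_energy, len(TARGET), 3)
    print(solution, e)
```

Reinforcing only the best-so-far solution is one common design choice among several; a parallel variant along the lines the abstract describes would presumably run several colonies, possibly with different energy functions, and exchange such feedback between them.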

  • 【Online publication contributor】 Soochow University
  • 【Online publication year and issue】 2012, Issue 06