节点文献

求解蛋白质折叠问题的拟物拟人算法

Quasi-Physical Quasi-Human Algorithm for Protein Folding

【作者】 陈矛

【导师】 黄文奇;

【作者基本信息】 华中科技大学 , 计算机软件与理论, 2007, 博士

【摘要】 蛋白质所具有的生物学功能取决于蛋白质的空间折叠结构,研究蛋白质的空间结构在生物学领域具有重要意义。Anfinsen等人的研究工作表明,给定蛋白质一级结构序列,利用理论计算对蛋白质结构进行预测是一个可行的方法。由于蛋白质折叠问题的复杂度太高,理论界提出了一些简化模型。其中,Dill等提出的HP模型和Stillinger提出的AB模型最受欢迎,研究也最为广泛。基于HP模型和AB模型的蛋白质折叠问题都已被证明是NP完全问题,这意味着不存在既完整严格又不是太慢的求解算法。为了满足实际需要,人们于是着手研究非绝对完整但是是快速实用的启发式算法。沿着拟物拟人的工作途径,分别对基于HP模型和AB模型的蛋白质折叠问题的高效求解进行了研究。为了减小搜索空间,给出了一种带随机策略的剪枝算法来求解基于二维HP模型的蛋白质折叠问题。在算法中,保留那些前景好的结点,并允许它以一定概率向下分支,而那些前景不太好的结点则以一定概率被删除,其下相连的分支也被“剪掉”。这样减小了搜索空间,使得相应的搜索时间也大大降低。基于裁剪复制策略的PERM (pruned-enriched Rosenbluth method)算法是一种链生长算法,它通过制定一定的评判准则,让有前途的个体得以繁衍,而不具备发展潜力的个体停止繁殖,从而减少了搜索树的分支数。通过分析PERM算法,指出了影响PERM算法效率的关键,并从拟人思想的角度对PERM算法做出了进一步的分析:PERM算法实际上可以理解为一种人口控制策略,通过对社会中的个体给出评价,然后给出个体的生育指标,从而实现“优生优育”。在这样一个生动形象的背景的指导下,提出了更为合理和有效的评判准则,从而得到了一种改进的PERM算法。计算结果表明改进的PERM算法在求解基于HP模型的蛋白质折叠问题时表现出非常高的效率。通过拟物思想,引入引力势能和斥力势能,建立了一个启发式的引导函数,把基于三维HP模型的蛋白质折叠问题由一个约束优化问题转化为无约束优化问题,然后给出了一个基于局部搜索策略的启发式算法。为基于三维AB模型的蛋白质折叠问题找到了一个贴切的物理模型,该物理模型是一个力学系统。将氨基酸单体看作弹性小球,想象在蛋白质链上相邻两球球心之间连接着一根长度为1的弹簧。这样,系统能量中除了Lennard-Jones势能、弯曲势能和扭转势能外,还有弹簧势能。弹簧势能相当于一个罚函数,它是“松弛”约束条件的关键,把基于三维AB模型的蛋白质折叠问题由一个约束优化问题转化为无约束的优化问题。对此无约束优化问题,给出了一个梯度算法,并设计了一个产生初始构形的启发式策略。在用梯度法求解的过程中,计算很容易落入局部极小值的陷阱。为了跳出局部极小值陷阱,让计算走向前景更好的区域中去,在拟物算法的基础上,分别与模拟退火算法和禁忌搜索算法相结合,得到了效率更高的、具有全局优化能力的混合算法。尤其值得指出的是,以ELP (energy landscape paving minimizer)算法得到的结果构形作为拟物算法的初始构形,对于文献中的绝大多数算例,拟人算法都找到了当今国际学术界最好的构形,这种构形比文献中报道的构形具有更低的势能。以上研究工作表明,通过对物理世界中物质运动的演化规律和人类的社会经验进行抽象和形式化,可以启发人们为蛋白质折叠问题设计出高效的求解算法。在以后的研究工作中,我们将沿着拟物拟人的思路继续为基于其它模型的蛋白质折叠问题和其它NP完全问题寻求高效的求解算法。

【Abstract】 Since the proteins’biological properties rest on their dimensional folding structures, the study of the proteins’structure is of great significance in theoretical structural biology. According to the research of Anfinsen et al, predicting the native structure of a protein from its amino acid sequence by using theoretical computing method is a feasible approach. The theoretical science community has introduced and examined several highly simplified models since the protein folding problem is too difficult to be approached with fully realistic potentials. Among the simplified models, HP model of Dill and AB model of Stillinger are the most popular ones and have been studied extensively. Even in these highly simplified models, it is not easy to predict the native state for the protein folding problem, which has been recognized to be NP-complete. As a NP hard problem, it means that there does not exist an algorithm that is complete, rigorous and efficient. Consequently, people hope to obtain a heuristic algorithm for solving NP hard problems that is not absolutely complete, but is of high speed, high reliability and high efficiency. Through the so-called quasi-physical and quasi-human strategy, we proposed some efficient and effective algorithms for solving the protein folding problems based on HP model and AB model, respectively.In order to reduce the search space, a pruning algorithm with random strategy was proposed to solve the two-dimensional protein folding problem based on HP model. In this algorithm, only the promising nodes are kept for further branching with certain probability, while those nodes not promising enough are pruned with certain probability. With this pruning algorithm, the search space is reduced and the corresponding search time is decreased accordingly.Based on pruning and enrichment strategy, PERM (pruned-enriched Rosenbluth method) is a kind of chain growth algorithm. By formulating some certain evaluation criteria, PERM enriches those promising branches and prunes those with poor quality so that the search space is reduced. Via analyzing PERM, the factors that essentially affect the efficiency of PERM were indicated, and then some further analyses from the personification explanation were made. PERM can be viewed as a population control strategy, by means of which, the quality of an individual is evaluated, and then a guideline on procreation of the individual can be obtained, thus realizing the aim of“go with the winners”. With such a vivid background, a much more reasonable rule for evaluation was generated, and an improved version of PERM was proposed. Computational results indicate that the improved PERM performs efficiently with benchmarks of protein folding problem.Based on the quasi-physical idea, a new mathematical model including attraction potential and exclusion potential is constructed so that the three dimensional protein folding problem based on HP model is converted from a nonlinear constraint-satisfied problem to an unconstrained one. A heuristic algorithm based on local search strategy was proposed for solving the problem.A perfect physical model, that is, a mechanical system, is constructed for the three-dimensional AB mode-based protein folding problem. In this system, an amino acid monomer is regarded as a smooth elastic ball. Assume that the centers of any two successive balls along the chain are linked by springs with natural length to be 1. Therefore, apart from the energy contributions from Lennard-Jones, bond angles and torsion angles, the elastic potential energy of springs is also a part of the total potential of the configuration. The elastic energy of the springs acts as a penalty function of the degree of departure of a configuration from a legal one, and it is the key to relax the constraints. Accordingly, the protein folding is converted from a constraint optimization problem into an unconstrained one, which could be solved by the gradient method. A heuristic strategy was proposed to generate promising initial configurations.In the course of solution using the gradient method, it is often possible for the calculation to fall into the trap of local minimum. To jump out of the trap of local minimum and guide the search to the points with better prospects, we combined our quasi-physical algorithms with other methods with global-search ability, and then obtained a simulated annealing method and a Tabu search method, which are more efficient than the quasi-physical algorithm. It is noteworthy that based on the initial configurations obtained by ELP (energy landscape paving minimizer), most of our results for the benchmark instances are better than the best values previously reported in literature.The research in this dissertation indicates that by abstracting and formalizing the evolution rule of the motion of matter in the physical world and the humanity’s social experience, we could design highly efficient algorithm for solving the protein folding problem. Through this working path, we are trying to seek for high-efficent algorithms for other model-based protein folding problem and other NP-complete problems in our future study.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络