节点文献

蛋白质结构预测的现实求解方法

Practical Approach to Protein Structure Prediction

【作者】 吕志鹏

【导师】 黄文奇;

【作者基本信息】 华中科技大学 , 计算机软件与理论, 2007, 博士

【副题名】高效启发式优化算法

【摘要】 蛋白质结构预测问题是计算生物学领域的核心问题之一,对其求解是后基因时代蛋白质工程的一项重要任务。已经证明,即使按最简化的数学模型,所导出的问题仍然是NP难度的。因此,蛋白质结构预测问题的研究在当今国际学术界是一项具有挑战性的重大课题。求解NP难度问题的方法主要有三种—–完整算法、近似算法和启发式算法。完整算法虽然能保证给出最优解,但由于人们普遍相信P= NP,指数级的计算复杂度导致其在实际应用中很难求解较大规模的问题实例。近似算法能保证在最坏情况下所得解的精度与最优解之间的误差在一定的范围内,但其实际计算效率往往不能令人满意。另一种方法是启发式优化算法。启发式算法的主要思想来源于生物世界和社会现象,它往往可以在算法速度和精度之间达到一种很好的平衡,有可能在较短时间内求解大规模的问题实例,并达到令人满意的精度。拟物拟人算法是一种借助物理知识和人类社会经验来求解NP难度问题的启发式方法。对于蛋白质结构预测问题,当前的研究重点是设计求解该问题的高效启发式优化算法。研究了蛋白质结构预测问题的两个简化模型—–HP格点模型和AB非格点模型。HP格点模型中,PERM算法不够简洁,不便于理解。AB非格点模型中,没有非常贴近问题本质的高效求解算法。对于这两个模型,文献中算法的计算效率不够高。对于HP格点模型,PERM(Pruned-Enrichment Rosenbluth Method)算法是当今国际文献中最先进的求解算法。在介绍PERM算法的基础上,对其给出了一种拟人解释—–人口控制策略,使该算法变得好想,易于理解,对算法中的权重及预测值进行了改进,并对选择动作时不同情况下的权重计算公式进行了统一。综合这些策略得到了改进的PERM算法。在此基础上提出了进一步的拟人改进策略。根据拟人思想对权重预测公式进行了重新定义,拟人改进后的PERM算法在链生长过程中不仅考虑氨基酸的类型(H或P),同时考虑氨基酸在整个链中的位置。拟人改进的PERM算法的计算结果可概括为以下三点:第一,算法的计算速度要优于目前国际文献中最先进的求解算法—–nPERMis(new PERM importancesampling),计算速度是nPERMis的几倍到几十倍。第二,对一个链长为103的标准问题实例,拟人改进的PERM算法得到的最低能量为-55,该最低能量要优于nPERMis算法所得的最低能量-54。第三,对一个链长为46的标准问题实例,拟人改进的PERM算法首次得到了最低能量-35,该最低能量要优于文献中所报道的最低能量-34。对于AB非格点模型,找到了贴近问题本质的物理模型—–弹簧模型。在此基础上通过将原始约束优化问题转化为无约束优化问题,提出了求解基于AB非格点模型的蛋白质结构预测问题的拟物算法。拟物算法的思想基于所提出的物理模型。拟物算法及其计算结果可概括为以下三点:第一,算法提出的拟物思想很好地贴近了问题的本质。第二,以HP格点模型为基础生成初始解,算法所得解的精度要优于一种以PERM算法生成初始解的共轭梯度法所得解的精度。第三,以ELP(EnergyLandscape Paving)算法为基础生成初始解,对于绝大多数文献中的标准算例,拟物算法所得解的精度要优于国际文献中最先进的几个求解算法所得解的精度。以上研究成果表明:拟物拟人策略是求解蛋白质结构预测问题的一种有效途径。进一步工作将研究基于更加真实的数学模型的蛋白质结构预测问题的高效启发式求解算法,以期在不久的将来将其应用于蛋白质工程的实践中去。同时,沿着拟物拟人的途径,有望为其它NP难度问题设计出高效率的求解算法。

【Abstract】 Protein structure prediction is one of the central problems in the field of computationalbiology and one of the most demanding tasks of protein engineering in post-genome era.Meanwhile, the protein structure prediction problem remains to be NP-Hard even for itsmost simplified model. Therefore, it is a challenging task to study the problem of proteinstructure prediction.For NP-Hard problems, we are forced to go on one of three ways. One is accurate algo-rithms that are employed to produce an optimal solution in an enumerative way. However,under the widely believed conjecture that P=NP, the exponential complexity of accuratealgorithms have limited themselves only to theoretical analysis or small size applications.As an alternative to solving NP-Hard problems to optimality, a stream of research has con-centrated on polynomial-time approximate algorithms with performance guarantee, i.e., anupper bound on the ratio between the approximate solution quality and the optimal one.The third way is to design practically efficient heuristic or meta-heuristic algorithmsfor solving NP-Hard problems. Heuristic algorithms seek for high quality solutions at areasonable computational time, and they can also be considered as intelligent techniquesthat are based upon human intuition, which often comes out as a result of analogies withphysical world, human experience or biological phenomena. Quasi-physical and quasi-human algorithm is one of the most effective methods for solving NP-Hard problems, whichis inspired by the wisdom from physical phenomenon and human experience. For proteinstructure prediction problem, the main focus of current research is to design highly efficientheuristic and meta-heuristic optimization algorithms.Two of the simplified models for protein structure prediction, called HP lattice modeland AB off-lattice model, respectively, are studied. For HP lattice model, PERM algorithmis not succinct and can not be naturally explained, while for AB off-lattice model, no appro-priate heuristic algorithms have been reported. Besides, for both models, the computational performance of the previously proposed algorithms in literature still needs to be improved.For HP lattice model, PERM (Pruned-Enrichment Rosenbluth Method) is recognizedto be the most efficient algorithm in literature for solving the protein structure predictionproblem based on this model. A personification explanation of PERM is proposed basedupon introducing PERM, which makes PERM algorithm naturally explained. A new versionof PERM, population control algorithm, with two main improvements is presented: one isthat it redefines the weight and the predicted value in PERM, and the other is that it is ableto unify the calculation of weight when choosing possible branches. Further personificationimprovement strategy is proposed, that is, predicted value of weight is redefined using thepersonification ideas. The further improved PERM algorithm not only considers the typesof amino acids (H or P), but also takes into account the position of the current amino acid inthe protein chain during the chain growth procedure.The merits of the two improved PERM algorithms can be generalized as the followingthree points: Firstly, the improved PERM is more efficient than the previous version nPER-Mis (new PERM importance sampling) and is generally several to hundreds times fasterthan nPERMis. Secondly, for a standard benchmark instance with length equal to 103, itcan find lower energy (-55) than that nPERMis can find (-54). Finally, for a standard bench-mark instance with length equal to 46, it can find lower energy (-35) than the previously bestresult (-34) in literature.For AB off-lattice model, an appropriate physical model—–spring model—–is pro-posed to simulate the original problem, based on which the original constraint optimizationproblem can be converted into an unconstraint one. Then a corresponding quasi-physical al-gorithm is provided to solve the protein structure prediction problem based on AB off-latticemodel, whose main ideas are based on the proposed physical model.The merits of the proposed quasi-physical algorithm and its computational results canbe concluded as the following three points: First of all, the proposed quasi-physical ideais perfectly appropriate for the original problem. Secondly, based on the initial solutionsgenerated on SC-HP lattice model, the quasi-physical algorithm can produce higher qualitysolutions than PERM+ method which utilizes PERM to generate initial solutions and sub- sequently a conjugate gradient method is employed to improve them. Finally, based on theinitial solutions generated by ELP (Energy Landscape Paving) algorithm, the quasi-physicalalgorithm is superior to a number of famous algorithms in literature for most of the standardbenchmark instances.Above results indicate that quasi-physical and quasi-human strategy is one of the mostefficient and effective approaches for solving protein structure prediction problem. Furtherresearch work will be focused on the high-efficient heuristic algorithms for more realisticprotein structure prediction models, so as to apply them to the practical protein engineeringin the near future. Meanwhile, following the path of quasi-physical quasi-human strategies,it is promising to design high-efficient heuristic algorithms for solving other NP-Hardproblems.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络