Research on Prediction Algorithms for RNA Secondary Structure Including Pseudoknots

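【Author】 Yang Jinwei (杨金伟)

【Advisor】 Luo Zhigang (骆志刚)

【Author information】 National University of Defense Technology, Software Engineering, 2007, Master's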

【Abstract】 RNA secondary structure prediction has been an active topic in RNA research in recent years. Many prediction algorithms have been proposed and have produced substantial results. For secondary structures that contain pseudoknots, however, existing algorithms perform poorly: they either cannot predict pseudoknots at all, or they have excessively high complexity and relatively low accuracy. To address this problem, this thesis studies consensus structure prediction for homologous RNA sequences from two directions: building a more reliable model for computing covariance information, and designing a more effective heuristic algorithm. While keeping complexity relatively low, the proposed algorithms improve prediction accuracy for RNA secondary structures with pseudoknots and can predict multiple suboptimal structures. The thesis consists of two main parts.

(1) An iterative algorithm is presented for predicting the pseudoknotted consensus secondary structure of homologous RNA sequences, based on stacking covariance information and minimum free energy. Building on the Ifold algorithm, it accounts for the effect that interactions between adjacent base pairs have on covariance, introduces a stacking covariance model, and combines that model with minimum free energy to obtain a secondary structure with pseudoknots through successive iterations. Numerical experiments show that the algorithm predicts pseudoknots correctly, that its mean sensitivity and specificity are better than those of existing algorithms, and that its performance is best when the ratio of the weight factors for stacking covariance is 5:1.

(2) A seed set expansion algorithm is presented for predicting multiple suboptimal consensus structures of homologous RNA sequences. It applies the seed set expansion method of the HotKnots algorithm to consensus structure prediction: multiple seeds are combined into seed sets, which are then expanded into multiple suboptimal consensus structures. Numerical experiments show that the predicted suboptimal structures are fairly close to the true structures and that stable substructures can be identified from them. The algorithm's mean sensitivity and specificity are better than those of the other existing algorithms compared; its time complexity is higher than that of the iterative algorithm but far lower than that of dynamic programming algorithms.
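The abstract reports results in terms of pseudoknots and of mean sensitivity and specificity, so a small sketch of how such quantities are commonly computed on base-pair sets may help. It is an illustration only, not the thesis's evaluation code. It assumes a structure is given as a list of (i, j) index pairs, that a pseudoknot is identified by two crossing pairs (i < k < j < l), and that "specificity" follows the convention common in RNA structure prediction of TP / (TP + FP); the thesis's exact definitions are not stated in this record. The function names `has_pseudoknot` and `sensitivity_specificity` are hypothetical.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    # Normalize each pair so that i < j, then sort by the 5' position.
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    return any(i < k < j < l for (i, j), (k, l) in combinations(ordered, 2))


def sensitivity_specificity(predicted, reference):
    """Compare predicted base pairs with reference base pairs.

    sensitivity = TP / (TP + FN): fraction of reference pairs that were predicted.
    specificity = TP / (TP + FP): fraction of predicted pairs that are correct
                  (assumed convention; see the note above).
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration: two stems whose pairs cross each other.
reference = [(1, 10), (2, 9), (5, 15), (6, 14)]
predicted = [(1, 10), (2, 9), (5, 15), (7, 13)]

print(has_pseudoknot(reference))                      # True
print(sensitivity_specificity(predicted, reference))  # (0.75, 0.75)
```

In the toy example, three of the four predicted pairs match the reference, so both measures equal 0.75. Averaging these values over a set of test sequences would give mean figures of the kind the abstract refers to.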
