The Study on Methods of RNA Secondary Structure Prediction

【Author】 Dong Hao (董浩)

【Advisor】 Liu Yuanning (刘元宁)

【Author information】 Jilin University, Computer Science and Technology, 2011, PhD

【Abstract】 As research on RNA (ribonucleic acid) has deepened, its important role in genetic processes has become increasingly evident. RNA molecules not only carry genetic information in living cells but also perform a range of important functions, such as catalyzing RNA splicing, processing and modifying RNA precursors, and regulating gene expression, which has motivated in-depth study of RNA function. Because the function of an RNA is closely tied to its structure, studying RNA secondary structure in order to uncover and explain function has become an important topic in molecular biology. Determining RNA structure with traditional experimental techniques (such as X-ray crystallography and NMR) is accurate and reliable, but expensive and time-consuming. Computational prediction of RNA secondary structure has therefore become the generally accepted main approach, both in China and abroad. After nearly 30 years of research, many prediction algorithms exist. Some are very mature: the minimum free energy algorithm, for example, sometimes reaches a prediction accuracy above 90%, but it cannot predict RNA pseudoknots. Most other current algorithms have problems of their own, such as high time complexity or limits on sequence length. Research on RNA secondary structure prediction methods therefore remains a key topic in RNA research, and this thesis studies these methods in depth against that background.

The thesis analyzes and summarizes current RNA secondary structure prediction methods and groups them into four categories: (1) comparative sequence analysis, (2) dynamic programming algorithms, (3) combinatorial optimization algorithms, and (4) heuristic algorithms. Studying, analyzing, and comparing these four categories suggested the ideas for the new prediction methods and laid the theoretical foundation for the rest of the work.

First, the thesis studies the application of Markov chains to RNA secondary structure prediction and proposes a new prediction method based on them. A Markov chain transition probability matrix is built from free energies and used to construct RNA-ML, which searches for the secondary structure with minimum free energy. Six tRNA sequences selected from a public database (the Genomic tRNA Database) were predicted, and the results were compared with those of the well-known programs Mfold and RNAstructure. The experiments show that RNA-ML outperforms Mfold and comes close to RNAstructure on single sequences. The method also reduces time complexity and improves sensitivity and specificity. It runs quickly on tRNA sequences and can be applied to longer RNA sequences, addressing the drawback of most methods, whose prediction time grows cubically or even quartically with sequence length.

Second, the thesis studies the application of hidden Markov models to RNA secondary structure prediction and proposes a new prediction method based on them. Taking minimum free energy as the basis, a transition probability matrix between stems and an observation probability matrix are established and used to construct RNA-HMM, which searches for the secondary structure with minimum free energy. Six RNA sequences with relatively complex structures were selected from PseudoBase, and the predictions were compared with those of pknotsRG. The experiments show that the method is more accurate than pknotsRG and has good generality. It also shortens prediction time and improves sensitivity and specificity.

Finally, the thesis studies the application of particle swarm optimization (PSO) to RNA secondary structure prediction and proposes a PSO-based prediction method. Combining PSO with minimum free energy and with the number and average length of the selected stems, it designs a new fitness function and establishes IPSO. RnaPredict, H-Helix PSO, and IPSO were each used to predict secondary structures, and the free energies of the predicted structures were compared. The results show that the optimal stem combination found by IPSO has lower free energy than those found by the other methods, so IPSO finds more stable secondary structures. Its advantage is more pronounced on long sequences, and it converges faster, finding better structures in fewer iterations. IPSO was also compared with standard particle swarm optimization (SPSO), a standard genetic algorithm (SGA), and ant colony optimization (ACO). The results show that, owing to its efficient objective function, IPSO performs markedly better than the other three. To verify the effectiveness of IPSO for RNA secondary structure prediction, its predictions were compared with those of Mfold and RnaPredict. IPSO achieved higher sensitivity and specificity than Mfold on three of the sequences and lower on the other two, and higher sensitivity and specificity than RnaPredict on all sequences, which indicates that the objective function designed in the thesis is feasible and more effective.
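
The abstract reports sensitivity and specificity for every method but does not define them. The sketch below shows one common way such scores are computed for base-pair predictions, under stated assumptions: structures are given in pseudoknot-free dot-bracket notation, sensitivity is TP / (TP + FN), and "specificity" is taken in the sense often used in RNA structure prediction, namely positive predictive value TP / (TP + FP). The function names `parse_dot_bracket` and `sensitivity_and_ppv` are hypothetical and are not taken from the thesis.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), i < j, encoded by a
    pseudoknot-free dot-bracket string such as "((...))"."""
    stack = []
    pairs = set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def sensitivity_and_ppv(predicted, reference):
    """Compare predicted and reference structures pair by pair.

    Assumed definitions (not confirmed by the thesis):
      sensitivity = TP / (TP + FN) = correct pairs / reference pairs
      PPV         = TP / (TP + FP) = correct pairs / predicted pairs
    """
    predicted_pairs = parse_dot_bracket(predicted)
    reference_pairs = parse_dot_bracket(reference)
    true_positives = len(predicted_pairs & reference_pairs)
    sensitivity = true_positives / len(reference_pairs) if reference_pairs else 0.0
    ppv = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Toy example: the prediction recovers 3 of the 4 reference pairs
    # and predicts no incorrect pair.
    reference = "((((...))))"
    predicted = "(((.....)))"
    print(sensitivity_and_ppv(predicted, reference))  # (0.75, 1.0)
```

Plain parentheses cannot represent pseudoknots, so scoring structures such as those in PseudoBase would require parsing an extended notation with additional bracket types. The scoring itself would stay the same once the base-pair sets are available.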

  • 【Online publication submitter】 Jilin University
  • 【Online publication issue】 2011, No. 09