节点文献

RNA二级结构预测算法的研究

Research on Algorithms of RNA Secondary Structure Prediction

【作者】 何静媛

【导师】 何中市;

【作者基本信息】 重庆大学 , 计算机系统结构, 2009, 博士

【摘要】 RNA(Ribonucleic Acid,RNA)分子在生物细胞中不仅充当着遗传信息的载体和传递工具,还具有催化RNA的剪接,加工和修饰RNA前体,调控基因表达和生物体的生长发育等一系列重要的功能,而功能与结构是密切相关的,因此对RNA分子结构的研究就成为分子生物学的一个重要领域。由于RNA分子具有降解速度快,难以结晶等特点,通过X射线晶体衍射和核磁共振等实验方法去测定RNA分子的立体结构花费的成本高、时间长,虽然测得的结果精确可靠,可是面对当前海量的生物序列,实验方法显然跟不上要求,因此RNA二级结构预测就成为研究RNA分子结构的主要手段。RNA二级结构预测是指借助于计算机手段和各种数学方法从理论上去预测RNA的空间结构,可为揭示RNA结构与功能的关系提供重要信息,大大提高认识RNA空间结构的效率。论文对目前主流的RNA二级结构预测算法的理论和实现方法进行了细致的研究。通过对基于热力学的预测方法(包括Zuker的最小自由能算法、遗传模拟退火算法、Hopfield神经网络方法、免疫粒子群算法)和比较序列分析方法(协同变异预测模型、随机上下文无关语法预测)以及基于机器学习的分类预测方法的分析,对这些算法存在的优缺点进行了比较研究,总结出了RNA结构预测方法发展的趋势和要求,为本文的预测算法奠定了理论和实验基础。首先论文分析了人工鱼群智能算法在优化问题中的优势和不足,并针对基本人工鱼群算法在解决离散问题的过程中存在的的缺陷进行了相应改进,首次将鱼群算法应用到RNA二级结构预测问题中,建立了一种基于人工鱼群算法的最小自由能算法模型。在对算法编码实现时,采用集合表示状态点,能有效地缩小搜索空间,有利于算法在较短时间内找到目标解。仿真实验与传统的基于最小自由能的相关算法进行了比较研究,结果表明,使用改进鱼群算法进行RNA序列的二级结构预测能获得较理想的预测效果,能有效减少计算量、节省计算时间,特别当待测序列长度大于500时,鱼群算法在收敛速度上有着较明显优势。其次,研究了粒子群优化算法在组合优化问题中的应用背景,针对基本粒子群算法的早熟收敛,容易陷入局部最优且搜索精度不高等缺点,进行了相应的改进,提出了局部精英粒子群算法,在该算法中,通过改变粒子的邻居拓扑结构,使每个粒子拥有固定的局部邻居,每次迭代都会根据自身在邻居中的地位和状态以及历史最优值来调整下一步的状态。由于有效地保持粒子的多样性,使得算法有较好地跳出局部极值的特性。本文根据局部精英粒子群算法的思想构建了一套基于最小自由能思想的RNA二级结构预测模型。在对算法进行编码时,使用集合来表示粒子的状态,巧妙地将粒子运动的速度和状态函数使用集合之间的运算来重载,避免了传统粒子群算法参数选择的烦恼。实验数据有力地支持了改进后的粒子群算法和新的粒子运动状态编码方式。第三,通过扩展NSSEL(New Secondary Structure Element Labels,NSSEL)标签,创建了一套能够描述伪结结构信息的eNSSEL(extended NSSEL,eNSSEL)标签。一条RNA分子序列中的所有碱基都可以使用eNSSEL标签进行标记,从另一个角度来理解,即:任意一个碱基都可以被分类为某一个标签,因此,一条原始的RNA分子序列能与一条eNSSEL标签序列一一对应。由于eNSSEL标签携带了结构信息,因此,对于某一个RNA分子而言,只要得到其对应的标签序列,就可以知道其二级结构的组成。根据该思想,建立了基于SVMs(support vector machines,SVMs)的分类预测模型。该模型通过有效训练后,在可接受的预测精度范围里具有较低的计算复杂度,能有效地解决传统算法中存在的计算复杂性问题,为预测长链分子提供了一种全新的思路。

【Abstract】 RNA can act as a carrier of genetic information, a catalyst of biochemical reactions, an adapter molecule in protein synthesis, and a structural molecule in cellular organelles.The function of a RNA molecule has a close relation with its secondary structure. So the research work about RNA secondary structure has become more and more important.The Most accurate structure can be determined by X-ray diffraction or nuclear magnetic resonance,but this is difficult because not only it is expensive and slow but also most RNAs can not be crystallized currently.Obviously it is necessary to study RNA secondary structure by means of some prediction algorithms based on computation theory,these prediction algorithms can raise efficiency greatly for scientists to know RNA secondary structure.Firstly,some major algorithms and theories for RNA secondary structure prediction are introduced in this dissertation,they include the methods based on thermodynamic energy minimization principle(such as Zuker’s mfold mehod,Genetic simulated annealing algorithm,Hopfield network method,and Immune particle swarm algorithm), phylogenetic comparative methods(covariance mutation prediction model, stochastic context free grammar algorithm),and classification method with BP neural network. The author analyzes the advantage and limitation of these algorithms by comparing them,and then gives a summary of development demand and trendency for RNA secondary structure prediction.Nextly,a modified artificial fish swarm optimization algorithm is presented in this paper,which can self-adaptively change some parameter’s values , improve the behavior of fish and reduce the search space,therefore it can rapidly close to the global best result by avoiding the blindness of searching at the later stage owing to the improvement on basic fish swarm algorithm.At the same time,a RNA secondary structure prediction model based on modified fish swarm algorithm is presented,state space that is described with sets in our model largely shrinks the search space,so,when compared with SetPSO(Particle Swarm optimization with Set,SetPSO) method and GSA(Genetic Simulated Annealing,GSA)method,our algorithm not only is effective but also has much lower runtime cost.Thirdly, the PSO(Partcle Swarm Optimization,PSO)algorithm is analysed in detail.There are some defects for its application in solving combinatorial optimization problems,for example,the PSO algorithm is prone to stay in local optimal results instead of the global optimal objective answer due to it’s rapid convergence.Aiming at this drawback,an improved algorithm called LEPSO(Local Elite Particle Swarm Optimization,LEPSO) algorithm is proposed.In the new algorithm,every particle has fixed neighbours, and adjusts its state for next step according to its past optimal value and the local impact factor of its neighbours in every iteration. Because of keeping diversiform directions of movement for each particle,the new algorithm can easily find global best result. The author realizes a RNA secondary structure prediction model with LEPSO algorithm,some new operators are defined to match the velocity and position formula,they skillfully avoid assigning values for all parameters,so that make the algorithm easily understandable.Lastly,the author sets up a set of labels which is called as eNSSEL(extended New Secondary Structure Element Label,eNSSEL) labels,these labels not only can express the simple stem-loop motif in a RNA secondary structure,but also can describe pseudoknots structure.Each base in a RNA molecule can be marked by a kind of label,so a RNA sequence can be converted to a label sequence. Corresponsively,a label sequence can be reverted to RNA secondary structures with a simple method.If only a label sequence for a RNA molecule has been known, the secondary structure for the molecule can be acquired by reversion.When a label is regarded as a structure type associated with a base in RNA molecule,the problem for RNA secondary structure prediction can be considered a classification problem. Consequently a SVMs(Support Vector Machines,SVMs) model for RNA secondary structure prediction is created in terms of classification theory.Experiment shows that this model not only can solve high runtime complexity problem of traditional algorithms, but also can efficiently predict long RNA sequences which are difficult with traditional folding algorithms.it gives a new thought of RNA secondary structure prediction for long sequence with pseudoknots.

  • 【网络出版投稿人】 重庆大学
  • 【网络出版年期】2009年 12期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络