节点文献

含假结RNA二级结构图的语法及拓扑分类

RNA Secondary Structure Graphs with Pseudoknots: Grammars and Topological Classifications

【作者】 高世乐

【导师】 丁克诠;

【作者基本信息】 大连理工大学 , 计算机应用, 2008, 博士

【摘要】 RNA的生物学功能很大程度上决定于其三维空间结构。空间结构一般通过X射线晶体衍射和核磁共振等实验方法获得,但是实验方法技术难度高、耗资大且并非总是可行。因此理论预测RNA空间结构就非常必要,它可以提高认识RNA空间结构的效率,还可以指引实验研究的方向。RNA二级结构预测是其空间结构预测的关键。目前,最小自由能法是预测RNA二级结构最主要的方法,但二级结构内部假结的存在使得该方法是NP完全的,这类方法只能考虑具有一定结构特征的假结,从而将时间复杂度控制在多项式范围内。视结构特征的不同,复杂度从O(n~3)到O(n~6)不等,复杂度越高能预测的目标集也越广,目前不存在速度快、目标集广的“好”方法。本文以时间复杂度为O(n~6)的Rivas与Eddy算法的目标集(R&E类)为全集,从文法和语言的角度提出了R&E类及其子类的弧图语法,并从拓扑学的角度将其分类,具体内容包括:1.提出了弧图语法概念。RNA二级结构弧图是一种特殊的二级结构图,从最简单的弧图开始,按照一定的替换规则对弧图的小部件用较大的部件替换,从而得到一个弧图类。那些体现结构特征的最简单的弧图称为初始弧图,那些替换规则称为重写规则,初始弧图和重写规则两类要素构成弧图语法。给定弧图语法就决定着一个弧图类,该类称为弧图语言。弧图语法提供了一种把数量无限的弧图映射成为数量有限的语法要素的研究方法,且具有易扩展性。2.给出了5个二级结构算法类的弧图语法。本文首先设计出R&E类的弧图语法,以R&E类弧图语法的要素为全集,给出其自小至大的子集PKF类、L&P类、D&P类、A&U类的弧图语法,这几个类间具有真包含关系,这种关系直观地从他们的弧图语法要素中体现出来。通过研究这5个二级结构类间的关系以及代表元素使全面掌握各类的内容成为可能。3.得到了平面R&E弧图的语法判据。直观地以平面图的形式展示二级结构有着很广泛的需求,本文在他人研究的基础上在更大范围——R&E类内给出基于弧图语法的判别依据,同时给出了平面嵌入算法。通过可平面R&E类的弧图语法,在R&E类内部完全解决了可平面弧图的平面嵌入问题。在R&E类外部也进行了初步的探索,得到一些有益的例证。4.实现了R&E弧图的书嵌入分类。RNA二级结构图的书嵌入自1999年提出以来由于其高时间复杂性(NP完全)而一直停留在概念上。本文以R&E类的弧图语法为基础,以弧图的交叉关系图为桥梁,最终通过交叉关系图的极大团覆盖求出任意R&E煌嫉氖榍度胧⒏銮度敕桨?将传统意义上的NP完全问题在R&E类上用多项式时间加以解决。5.改进了计算R&E弧图亏格的方法。本文基于R&E类的弧图语法,提出弧图亏格计算的动态快速算法,该方法把二级结构图的有向闭曲面分类法中计算亏格的时间复杂度由线性提高到常数,在枚举小亏格弧图和计算多个相似二级结构弧图的亏格上具有较大优势。

【Abstract】 The functions of RNA moleculars in biology are mostly determined by its three-dimensional structure, which can be observed by experimental methods such as X-ray crystallography and nuclear magnetic resonance spectroscopy. However, experimental investigations often suffer from the challenge in technique, cost, and other difficulties. Therefore, theoretically predicting the space structure of RNA molecular becomes very necessary, which can improve the efficiency to realize the RNA space structures, guide the experiemtmal investigation.The secondary structure of RNA molecular is the key to predict its space structure. Up to now, minimization free energy (MFE) method is the most conventional for RNA secondary structure prediction. However, finding the MFE structure for a given RNA molecular is NP-complete for the pseudoknots which exist in RNA secondary structures, MFE methods can only control the time complexity in the polynomial level for some special kinds of pseudoknots. The time complexity, which is related to different special kinds of pseudonots, is from O(n~3) to O(n~6). The algorithms with higher time complexity can predict general target, and there do not exist a "good" algorithms with lower time complexity and general target so far. In this paper, we introduce grammars and topological classifications of Rivas and Eddy algorithm class (R&E class) based on linguistics and topology. This thesis includes:1. The concept of the grammar of arc graphs is proposed. At first, arc graph is a graph which illustrates RNA secondary structure, where the information about the nested and crossed base pairs is preserved but that of the unpaired bases is deleted. A small part of an arc graph is replaced by a larger one according to the replacement rule and a larger arc graph is obtained. These processes are continued and a class of arc graphs can be achieved. These simplest arc graphs which embody the structure characteristics are called initial arc graphs. These replacement rules are called rewriting rules. The grammar of arc graphs includes both initial arc graphs and rewriting rules. The arc graphs determined by a given grammar are called languages of arc graphs. The grammar of arc graphs provides a new method to map infinite arc graphs to finite grammar elements. The grammar of arc graphs can be easily extended to define new classes of RNA secondary structures.2. Arc graph grammars of 5 typical RNA secondary prediction algorithms classes are given. In this paper, R&E grammar is defined firstly. Taking the grammar elements of R&E as the whole class, the grammars of PKF, L&P, D&P, A&U classes are defined. Proper inclusive relations are existed between these classes. It is exhibited distinctly through the inclusive relations of the grammar elements. It is possible to grasp the content of these 5 secondary structure prediction algorithm classes through the relations among these classes and the typical elements of these classes.3. Planarity criterion of R&E arc graph is obtained. It is widely required to show RNA secondary structures as a planar. The planar graphical criterion and embedding algorithms are given based on the R&E grammar. The result of this thesis is more general than the previous works. The book embedding and the book embedding number calculation are solved completely in R&E class through the R&E grammar. Some examples are exhibited outside R&E class along with the basic exploring in this area.4. Book embedding classification of R&E arc graph is realized. Book embedding of arc graph is just a concept since it was proposed in 1999 because of its high time complexity (NP-complete). In this thesis, based on the grammar of R&E arc graph, Taking the cross relation graph of an arc graph as a bridge, the book embedding number of any arc graph is calculated through the maximum clique covering. At the same time, the book embedding number is obtained. A traditional NP-complete problem is solved with a polynomial time in R&E class.5. The method for calculating the genus of R&E arc graph is improved. Based on the grammar of R&E arc graphs, a rapid dynamic algorithm for calculating the genus of R&E arc graph is introduced. The time complexity of calculating the genus, the classification standard of directed closed sphere for RNA secondary structures, is improved from linear to constant. This algorithm can be applied to enumerate all arc graphs with small genus and calculating genus of a series of similar arc graphs. It is super to old algorithm in these two aspects.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络