新型表示模式下的DNA序列和RNA二级结构分析方法研究

The Research on Analysis Methods of DNA Sequences and RNA Secondary Structures Based on New Representations Models

【Author】 Cao Zhi (曹智)

【Advisor】 Li Renfa (李仁发)

【Author information】 Hunan University, Computer Application Technology, 2010, PhD

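【Abstract (Chinese version)】 With the Human Genome Project under way and research on sequences, structures and functions expanding, enormous volumes of biological data have been produced. The scientific analysis and processing of these data has driven the development of bioinformatics. Similarity analysis of sequences and structures is the foundation of bioinformatics: the sequence or structure information it yields can be used to infer the structure, function and evolutionary relationships of genes, so the analysis of biological sequences and structures has become a very important research topic in the field. Sequence and structure analysis mainly comprises similarity analysis, mutation analysis, evolutionary analysis and functional analysis, and the latter three all build on similarity analysis. This dissertation therefore presents similarity analysis methods for DNA sequences and RNA secondary structures based on new representation models, and from them derives methods for point mutation analysis and phylogenetic tree construction. After reviewing the state of research on sequence and structure analysis methods, the dissertation systematically studies graphical representations based on dinucleotides and coded sequences, coding schemes for RNA secondary structures, point mutation analysis and structure alignment based on the coded sequences of RNA secondary structures, similarity analysis based on graphical representations of sequences, and phylogenetic tree construction. The main results are as follows:

1. A 3D graphical representation of DNA sequences based on the physicochemical properties of dinucleotides is proposed, together with a sequence similarity analysis method based on that representation. Interactions between bases play a very important role in the structure and function that a sequence determines. To provide a simple and direct way of displaying sequence information, this work analyses the properties of adjacent dinucleotides in DNA sequences, represents a DNA sequence as a family of 3D curves, and defines a measure of dissimilarity between sequences from a covariance matrix built on geometric centers. Experimental results show that the method measures the similarity between sequences accurately. An accurate similarity distance matrix helps infer the kinship among species and find links between species, especially between humans and other species, and supports the use of a given organism as a model for studying human physiological and biochemical mechanisms.

2. A coding-based sequence comparison method and sequence similarity analysis method are proposed. Following the coding rules for DNA sequences, this work gives methods for solving the four basic problems of sequence comparison. It also gives a 3D representation of DNA sequences based on the coded sequence and uses it for similarity analysis. The coding scheme displays sequence information simply and directly and helps visualize mutation analysis, which in turn helps infer the mechanisms by which diseases arise. It also provides a good mathematical model for sequence comparison, making similarities and differences between sequences easy to find and facilitating gene detection and the prediction of functional gene regions.

3. A coding method for RNA secondary structures is proposed, with point mutation analysis and structure alignment methods based on coded sequences and the XOR operation. Existing algorithms for RNA secondary structure prediction and function prediction suffer from robustness problems and insensitivity to the number of structures because they rely on multiple sequence alignment, and representations of RNA secondary structure suffer mainly from high complexity and degeneracy. To address these problems, the dissertation gives a simple coding scheme and an extended coding scheme for RNA secondary structures. The schemes distinguish free bases from base pairs well and can distinguish different structure classes, including those containing pseudoknots. Based on the simple three-digit coding, an RNA secondary structure comparison method and a point mutation analysis method are given. Based on the extended coding, a new structure alignment method is given, and its effectiveness is verified experimentally.

4. Two phylogenetic tree construction algorithms are proposed, one based on fuzzy clustering and one based on a minimum spanning tree algorithm. Taking the similarity and dissimilarity matrices already obtained as its basis, the dissertation gives a tree construction algorithm based on fuzzy clustering. The method uses the similarity matrix in place of a distance matrix, and the similarity matrix does not need to be readjusted while the tree is built. It reflects the relationships among species well and lowers the time complexity. A minimum spanning tree algorithm based on a complete graph is also given and applied to phylogenetic tree construction, with good results.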
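Item 3 of the abstract describes comparing coded RNA secondary structures with the XOR operation. The following is a minimal Python sketch of that idea, not the dissertation's actual method. It assumes structures are given in dot-bracket notation and have equal length, and it uses a hypothetical 3-bit code table (`CODES`), since the abstract does not state the real coding rules. The function names `encode` and `xor_diff` are likewise illustrative.

```python
# Hypothetical 3-bit codes for dot-bracket symbols. The dissertation's real
# code table is not given in the abstract; these values are assumptions.
CODES = {
    ".": 0b000,  # free (unpaired) base
    "(": 0b110,  # base paired with a downstream partner
    ")": 0b101,  # base paired with an upstream partner
}


def encode(structure):
    """Map a dot-bracket string to a list of 3-bit integer codes."""
    return [CODES[symbol] for symbol in structure]


def xor_diff(structure_a, structure_b):
    """Return the positions where two equal-length structures differ.

    A position differs exactly when the XOR of its two codes is non-zero.
    """
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    codes_a = encode(structure_a)
    codes_b = encode(structure_b)
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a ^ b]


if __name__ == "__main__":
    wild_type = "((..))"
    mutant = "(....)"  # the inner base pair is lost
    print(xor_diff(wild_type, mutant))
```

In this toy case the inner base pair of `wild_type` is broken in `mutant`, so the XOR is non-zero at the two positions that formed the pair. Structures of unequal length would need an alignment step first, which this sketch does not attempt.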

【Abstract】 With the rapid progress of the Human Genome Project (HGP) and of research on gene sequences, structures and functions, ever more biological data are being generated. The scientific analysis and processing of these data has driven the development of bioinformatics. Similarity analysis yields sequence and structure information from which the structure, function and evolutionary relationships of genes can be inferred, and it has therefore become a fundamental subject of bioinformatics. Sequence and structure analysis consists of similarity analysis, mutation analysis, phylogenetic analysis and function analysis, and the latter three are all based on similarity analysis of sequences and structures. The dissertation therefore proposes methods for similarity analysis of DNA sequences and RNA secondary structures based on new representation models, and from them derives methods for mutation analysis and phylogenetic tree construction.

The dissertation first reviews recent advances in sequence and structure analysis. It then studies the graphical representation of DNA sequences based on dinucleotides, a numerical coding method for RNA secondary structures, mutation analysis and structure alignment based on the coded sequences of RNA secondary structures, sequence similarity analysis based on graphical representation, and phylogenetic tree construction. The main contributions are as follows:

a) The author proposes a 3D graphical representation of DNA sequences based on the physical and chemical properties of dinucleotides, together with a similarity analysis based on it. The dependency and interaction between bases are known to be very important in determining the structure and function of a sequence. To give a simple and direct visualization of gene sequences, the dissertation proposes a 3D curve representation of DNA sequences with a dissimilarity measure based on covariance matrices of geometric centers. Experiments show that the approach measures the similarity of sequences accurately, which helps infer the relationships among species, especially those between humans and other species. It may also help in studying human physiological and biochemical mechanisms through studies of other species.

b) The author proposes a sequence comparison method and a similarity analysis method based on a coding scheme. Following the DNA coding rules, the dissertation gives methods that solve the four basic problems of sequence comparison, and it uses a 3D representation based on the coded sequence to analyze the similarity between DNA sequences. The coding method displays sequence information simply and directly and makes mutation analysis visible, which helps uncover the mechanisms of disease. It also provides a good mathematical model for finding the similarities and differences between DNA sequences, which facilitates gene detection and the prediction of functional gene regions.

c) The author proposes a coding scheme for RNA secondary structures, with mutation analysis and structure comparison based on the coded sequences and the XOR operator. Existing representations of RNA secondary structure are highly complex and prone to degeneracy. The proposed coding method and its extension clearly separate free bases from base pairs and distinguish different structure classes, including those with pseudoknots. Based on the simple three-digit coding, the dissertation presents an RNA secondary structure comparison method and a point mutation analysis method. Based on the extended coding, it proposes a novel structure alignment method, whose effectiveness is confirmed experimentally.

d) The author proposes two novel phylogenetic tree construction methods, one based on fuzzy clustering and one based on a minimum spanning tree, both of which make use of the similarity and dissimilarity matrices obtained above.
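Item d) mentions building a phylogenetic tree from a dissimilarity matrix with a minimum spanning tree. As a rough illustration only, the sketch below applies Prim's algorithm, a standard technique used here as a stand-in for the dissertation's own algorithm, to a complete graph whose edge weights are pairwise dissimilarities. The function name `minimum_spanning_tree` and the toy matrix are assumptions made for this example and are not data from the dissertation.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a complete graph given as a symmetric matrix.

    dist[i][j] is the dissimilarity between taxa i and j.
    Returns a list of (parent, child, weight) edges, starting from taxon 0.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Relax the connecting cost of every remaining node through u.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    # Toy dissimilarity matrix for four hypothetical taxa.
    toy = [
        [0, 1, 4, 5],
        [1, 0, 2, 6],
        [4, 2, 0, 3],
        [5, 6, 3, 0],
    ]
    for a, b, w in minimum_spanning_tree(toy):
        print(f"taxon {a} - taxon {b}: {w}")
```

The resulting edges link each taxon to its closest already-connected neighbor. Turning such a spanning tree into a rooted phylogeny requires further steps that the abstract does not describe, so they are not sketched here.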

  • 【Online publication contributor】 Hunan University
  • 【Online publication issue】 2010, No. 12