
Recognition of Functional Sites Based on DNA Sequences

(Original title: 基于DNA序列的功能位点识别)

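【Author】 Wu Qinqin (吴琴琴)

【Supervisor】 Wang Jiajun (王加俊)

【Author information】 Soochow University (苏州大学), Signal and Information Processing, 2010, Master's thesis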

【Abstract】 Functional sites in DNA sequences are closely tied to gene regulation and transcription, and have therefore been analyzed extensively. Detecting these sites accurately from the DNA sequence is a long-standing topic in bioinformatics. This thesis makes two contributions.

First, it proposes an improved position weight matrix (PWM) method based on an entropy measure and applies it to the recognition of prokaryotic promoters. The method uses information entropy to extract the conserved sites of prokaryotic promoters, then constructs two improved PWMs, one from a promoter training set and one from a non-promoter training set. Test sequences are scored using the matrix elements that correspond to the conserved sites and the associated segments, and are then classified according to their scores. Experiments on Escherichia coli gene sequences show that the algorithm outperforms existing promoter recognition algorithms in sensitivity, specificity, correlation coefficient, and precision.

Second, it proposes a nucleosome recognition algorithm based on a novel pattern recognition technique that combines two methods, one for pattern matching and one for removing sequence ambiguity. A matched mirror position filter, a technique borrowed from electronics, first matches the pattern information in the sequence. Probabilistic relaxation labeling, which is widely used in image processing, is then applied as a post-processing step: it uses the contextual information to the left and right of each site to reduce or eliminate the noise introduced when the sequence was measured. Applied to the Saccharomyces cerevisiae (yeast) genome, the technique yields nucleosome occupancy maps that indicate a significant improvement in recognition accuracy. The results also suggest that nucleosome occupancy in different species may share a common sequence-based mechanism.
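The entropy-guided scoring idea behind the first method can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the exact form of the "improved" PWM or of the associated segments, so the sketch assumes a plain log-odds PWM with a uniform background, an assumed entropy threshold (`max_entropy`), and a simple "higher score wins" decision rule. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_entropy(seqs, pos):
    """Shannon entropy (in bits) of the base distribution at one aligned position."""
    counts = Counter(s[pos] for s in seqs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conserved_positions(seqs, max_entropy):
    """Positions whose entropy is at or below the threshold (assumed criterion)."""
    return [i for i in range(len(seqs[0])) if column_entropy(seqs, i) <= max_entropy]

def build_pwm(seqs, pseudocount=1.0):
    """Log-odds PWM against a uniform background (assumed), with pseudocounts."""
    total = len(seqs) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        pwm.append({b: math.log2(((counts[b] + pseudocount) / total) / 0.25)
                    for b in BASES})
    return pwm

def score(seq, pwm, positions):
    """Sum the PWM entries of the sequence at the selected positions only."""
    return sum(pwm[i][seq[i]] for i in positions)

def is_promoter(seq, pos_pwm, neg_pwm, positions):
    """Assumed decision rule: the promoter PWM must score higher than the non-promoter PWM."""
    return score(seq, pos_pwm, positions) > score(seq, neg_pwm, positions)

# Toy, made-up aligned sequences; real training sets would be far larger.
promoters = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
non_promoters = ["GCGCGC", "CCGGCA", "GGCCTG", "CGCGAC"]

conserved = conserved_positions(promoters, max_entropy=0.5)
pos_pwm = build_pwm(promoters)
neg_pwm = build_pwm(non_promoters)

print(conserved)
print(is_promoter("TATAAT", pos_pwm, neg_pwm, conserved))
print(is_promoter("GCGCGC", pos_pwm, neg_pwm, conserved))
```

Restricting the score to low-entropy columns means that positions where promoters vary freely do not dilute the signal from the conserved sites, which is the motivation the abstract gives for the entropy step.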

  • 【Online publication contributor】 Soochow University (苏州大学)
  • 【Online publication year and issue】 2011, Issue 01