Codon-Pair Usage and Genome Evolution

【Author】 Wang Fangping (王芳平)

【Supervisor】 Li Hong (李宏)

【Author information】 Inner Mongolia University, Theoretical Physics, 2009, Ph.D.

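【Abstract (translated from the Chinese)】 Bioinformatic analysis at the codon level is an important part of research on genome evolution, protein function, and the interaction between heredity and environment. Synonymous codon usage is known to be non-random. Like codon usage, the usage of two adjacent codons, that is, a codon pair, is highly biased, and this bias is widespread in both prokaryotes and eukaryotes. To explore the evolutionary constraints on codon-pair usage, this dissertation analyzes the patterns of codon-pair usage in the genomes of organisms at different evolutionary levels. The main results are as follows:

1. Using the genomes of 10 eukaryotes, 60 bacteria, and 40 archaea as samples, we analyzed the distribution of the relative number of modes as a function of frequency (DNM) for codon pairs in coding sequences and for triplet pairs in intergenic sequences, and verified that this distribution follows a Γ(α,β) distribution. The shape parameter α of the Γ(α,β) distribution correlates clearly with genome evolution, and coding sequences and intergenic sequences evolve in distinctly different ways. For coding sequences, α increases from archaea to bacteria to eukaryotes, so that α separates the organisms into three groups: archaea, bacteria, and eukaryotes. For intergenic sequences, α separates the organisms into two groups: bacteria on the one hand, and archaea together with eukaryotes on the other. This result shows that codon-pair context carries information about biological evolution, and suggests that eukaryotes, bacteria, and archaea differ fundamentally in the evolutionary pressures that shape the primary structure of their genomes.

2. We propose a method of genome similarity analysis that builds phylogenetic trees from either codon-pair usage bias or the dinucleotide frequencies within codon pairs. The tree built from dinucleotide frequencies within codon pairs in the genomes of 40 model organisms clearly separates the organisms into three evolutionary groups: bacteria, archaea, and eukaryotes. The tree built from codon-pair usage bias is largely consistent with it. These results indicate that the dinucleotide composition within codon pairs, which reflects information about the evolution of life, is one of the determinants of codon-pair bias.

3. We analyzed codon-pair usage in an Anaeromyxobacter genome and a Rickettsia genome, both of which have extremely biased base compositions. Codon-pair usage bias differs between the leading and lagging strands, which indicates that codon pairing is influenced by strand-specific features. These features may include gene orientation bias, codon usage bias, and codon context. The asymmetry of codon-pair usage between the two DNA strands in these two species may therefore result from strand-specific mutational bias and from natural selection acting at the levels of replication, transcription, and translation.

4. Because the shape parameter α of the gamma distribution correlates with genome evolution, we first used the genomes of 5 eukaryotes, 15 bacteria, and 10 archaea to analyze the correlation between r, the index of codon-pair usage bias, and α, the index of how codon pairs change with genome evolution. For some codon pairs, r and α are significantly linearly related. Second, we analyzed the usage of the dinucleotide formed by the third position of a codon and the first position of the adjacent codon (cP3cA1); the results show that dinucleotide usage at these two positions differs significantly. Finally, we analyzed the preferred and rare codon-pair modes in the three domains of life and found that each domain has its own preferred and rare modes. These results further confirm that codon-pair usage correlates with genome evolution.

5. We comprehensively analyzed codon-pair usage in the genome of Anaeromyxobacter dehalogenans 2CP-C. Codon-pair usage in this genome is strongly biased, and 5.2% of codon-pair modes are absent from the whole genome. The analysis indicates that the codon-pair bias is probably the result of pressures from at least three sources: the local and overall GC content of the genome, the dinucleotide composition within codon pairs, and the level of dipeptide conservation.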
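As an illustration of the first result, the sketch below counts in-frame codon pairs in a single coding sequence, tabulates how many pair modes occur at each frequency, and estimates gamma parameters by the method of moments. The abstract does not say how α was fitted or how stop codons and partial codons were handled, so the moment estimator, the function names (`codon_pairs`, `modes_by_frequency`, `gamma_moments`), and the toy sequence are assumptions made for illustration and should not be read as the dissertation's procedure.

```python
from collections import Counter
from statistics import mean, pvariance


def codon_pairs(cds):
    """Return the adjacent in-frame codon pairs (6-mers) of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # assumption: drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [a + b for a, b in zip(codons, codons[1:])]


def modes_by_frequency(pair_counts):
    """Map each observed frequency n to the number of pair modes seen n times."""
    return Counter(pair_counts.values())


def gamma_moments(values):
    """Method-of-moments estimates (alpha, beta), with beta as the scale."""
    m = mean(values)
    v = pvariance(values)
    if v == 0:
        raise ValueError("variance is zero; the shape parameter is undefined")
    return m * m / v, v / m


# Toy sequence (hypothetical, far too short for a real fit).
pair_counts = Counter(codon_pairs("ATGGCTGCTGCTAAAGCTGCTTAA"))
dnm = modes_by_frequency(pair_counts)
alpha, beta = gamma_moments(list(pair_counts.values()))
print(dict(dnm), round(alpha, 4), round(beta, 4))
```

For a real genome one would pool the pair counts over all annotated CDSs before estimating α. A maximum-likelihood fit may be preferable to the moment estimator used here.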

【Abstract】 Codon analysis and its application in bioinformatics and evolutionary studies are important for investigating genome evolution, protein function, and the interaction between genetics and environment. It is well known that synonymous codon usage is non-random. Codon-pair usage, like codon usage, has also been found to be highly biased, and the vast majority of prokaryotic and eukaryotic species have a non-random codon-pair usage. In this dissertation, in order to identify possible evolutionary constraints that shape codon-pair context, we investigated codon-pair usage in the genomes of organisms at different evolutionary levels. The main contributions are summarized as follows:

1. The distributions of numbers of modes (DNM) of codon pairs in protein-coding sequences (CDSs) and of base-triplet pairs in intergenic sequences (IGSs) are analyzed in 110 fully sequenced genomes. We find that these distributions are consistent with a gamma distribution. By studying the shape parameter α of the gamma distribution, a distinct relation between α and genome evolution is obtained. The modes of evolution for protein-coding sequences and intergenic sequences are significantly different. For codon pairs in CDSs, α increases in the order Archaea, Bacteria, Eukaryota, and divides the species into three evolutionary groups: Archaea, Bacteria, and Eukaryota. For triplet pairs in IGSs, on the other hand, α classifies the species into two groups: one is Bacteria and the other is Archaea together with Eukaryota. The findings indicate that codon-pair contexts contain information about biological evolution, and suggest fundamental differences among Archaea, Bacteria, and Eukaryota in the evolutionary constraints imposed on CDSs and IGSs.

2. Based on codon-pair usage, a method of genome similarity analysis is proposed that constructs phylogenetic trees from the codon-pair usage bias and from the dinucleotide frequencies within codon pairs. A phylogenetic tree constructed from the dinucleotide frequencies within codon pairs in 40 model organisms shows that the organisms are clearly divided into three evolutionary groups: Bacteria, Archaea, and Eukaryota. Another phylogenetic tree, constructed from the index reflecting codon-pair usage bias, is consistent with the tree based on dinucleotide frequencies within codon pairs. Our results indicate that the dinucleotide composition within codon pairs, which reflects information about the evolution of life, is one of the determinants of codon-pair bias.

3. The patterns of codon-pair usage are analyzed in the genomes of Rickettsia bellii and Anaeromyxobacter dehalogenans 2CP-C, which have extremely biased genomic compositions. The results show significant differences in codon-pair usage bias between the leading and the lagging strands, suggesting that codon pairing is influenced by strand-specific features. These features may include biased codon usage, gene orientation bias, and context-dependent codon bias. Therefore, the asymmetry of codon-pair usage between the two DNA strands in these two genomes seems to be the result of strand-specific mutational biases and of natural selection probably acting at the levels of replication, transcription, and translation.

4. Given that the shape parameter α of the gamma distribution is related to genome evolution, the linear regression between the r values of codon pairs and the α values of the gamma distribution is first analyzed in the genomes of ten archaea, fifteen bacteria, and five eukaryotes. The results show that the two parameters have a significant linear correlation for some codon pairs. Second, the usage of the dinucleotide composed of the third-position nucleotide of one codon and the first-position nucleotide of the next (cP3cA1) within a codon pair is analyzed, and the result indicates that the usage of cP3cA1 is significantly biased. Finally, the modes of preferred and rejected codon pairs are analyzed in the three domains of life, and it is found that these modes differ among the three domains. The above results confirm again that codon-pair usage is associated with genome evolution.

5. Codon-pair usage is analyzed in the genome of Anaeromyxobacter dehalogenans 2CP-C. It is found that codon-pair usage is highly biased, and that about 5.2% of codon-pair modes are absent from the genome. Our analysis shows that the pattern of codon pairing in the genome could be the result of at least three different forces: (i) the local and total genome GC content, (ii) the dinucleotide composition of codon pairs, and (iii) the level of dipeptide conservation.
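The cP3cA1 dinucleotide and the fraction of absent codon-pair modes described in points 4 and 5 can be sketched roughly as follows. The abstract does not define the set of possible codon-pair modes, so this example assumes 61 sense codons in the first position and all 64 codons in the second (3904 modes). That convention, the helper names, and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOPS]
# Assumed universe of codon-pair modes: sense codon followed by any codon.
POSSIBLE_PAIRS = {a + b for a in SENSE_CODONS for b in ALL_CODONS}


def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def junction_dinucleotides(cds):
    """Count dinucleotides made of codon position 3 and the next codon's position 1."""
    codons = split_codons(cds)
    return Counter(a[2] + b[0] for a, b in zip(codons, codons[1:]))


def absent_pair_fraction(cds):
    """Fraction of the assumed codon-pair modes that never occur in the sequence."""
    codons = split_codons(cds)
    observed = {a + b for a, b in zip(codons, codons[1:])}
    return len(POSSIBLE_PAIRS - observed) / len(POSSIBLE_PAIRS)


seq = "ATGGCTGCTGCTAAAGCTGCTTAA"  # toy sequence, not real data
junctions = junction_dinucleotides(seq)
print(junctions.most_common(), absent_pair_fraction(seq))
```

On a whole genome the same counts would be accumulated over every annotated CDS. Whether a junction dinucleotide is biased would then be judged against an expectation derived from single-nucleotide or codon frequencies, which this sketch does not attempt.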

  • 【Online publication submitter】 Inner Mongolia University
  • 【Online publication year and issue】 2010, Issue 04