478种生物的密码对使用偏好性及其与翻译效率的相关性研究

Intragenic Patterns of Codon Pair Bias and Their Correlations with Translation Efficiency in 478 Organisms

【Author】 Zhao Sheng (赵胜)

【Advisor】 Liu Xiaolin (刘小林)

【Author information】 Northwest A&F University (西北农林科技大学), Genetics, 2011, Ph.D.

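【Abstract (Chinese version)】 According to the central dogma, genetic information flows from DNA to mRNA and from mRNA to protein. In the step from mRNA to protein, the information is carried as triplet codons. Each amino acid is encoded by at least one codon and by at most six; codons that encode the same amino acid are called synonymous codons. Studies of codon usage bias have found that species differ markedly in their codon preferences, and that genes of different function within the same species can also differ considerably in codon usage bias.

The 61 sense codons can form 3721 (61×61) distinct codon pairs. Early studies of codon pair usage concentrated on model organisms such as Escherichia coli and showed that codon pairs are not used at random but with a certain bias. In recent years, as the whole genomes of many organisms have been sequenced, codon pair studies have moved to the genome level. These studies further confirmed that codon pair bias is species-specific and distinct from codon usage bias, but the root cause of codon pair bias remains unclear. Existing results indicate that codon pair usage is related to the translation efficiency of genes. It has been proposed that, during protein synthesis, the spatial structure formed by ribosomal proteins and the codon-anticodon pairs at the ribosomal P and A sites affects the accuracy and rate of translation, and that the stability of this structure is the main factor behind codon pair bias.

Bioinformatic analysis of codon pair bias is an important part of research on gene expression, protein translation efficiency and genome evolution. So far, such work has focused mainly on the average bias of a single gene or of all genes in a genome. Recent results have clearly shown that the speed at which ribosomes translate a gene differs between regions of the same gene. Is there a regular pattern in the order in which codon pairs are arranged along a gene sequence? Are such patterns related to the translation rate of different regions of the gene? Is this relationship an important factor shaping codon pair bias? These are highly challenging questions in bioinformatics and genomics, and so far they have not been studied.

Drawing on the theory and techniques of genomics and bioinformatics, we wrote a number of computer programs in Java, Python and R for the different topics of this thesis. Across the whole genomes of 478 organisms covering Bacteria, Archaea and Eukarya, we analyzed how codon pair bias changes across different regions of gene sequences and then studied how these trends relate to translation efficiency. The aim was to reveal the evolutionary factors behind nonrandom codon pair usage and to provide a firmer theoretical basis for research on gene expression and protein translation efficiency. To this end we carried out the following studies.

1. Genome-level analysis of codon pair bias in 478 organisms

The aim of this study was to analyze, at the genome level, the combination patterns of the 3721 codon pairs in all protein-coding sequences (coding sequence, CDS) of 478 organisms, in order to find codon pair usage rules common to different organisms. From NCBI and UCSC we obtained CDS sequences for human (Homo sapiens), mouse (Mus musculus), rat (Rattus rattus), cow (Bos taurus), fruit fly (Drosophila melanogaster), nematode (Caenorhabditis elegans), budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe), Escherichia coli, 10 other fungi, and 461 bacteria and archaea. We wrote several programs in Java, Python and the R statistical language to count codon pair frequencies at the genome level, and built a corresponding local database with MySQL.

For each of the 478 organisms we calculated a codon pair score (CPS) for each of the 3721 codon pairs. The higher the CPS, the more strongly that codon pair is preferred in the genome. Using these CPS values, we first analyzed the codon pair bias (CPB) of individual CDSs in nine model organisms: human, rat, mouse, cow, fruit fly, nematode, budding yeast, fission yeast and E. coli. The CPB of a CDS is the arithmetic mean of the CPS values of all codon pairs in that sequence. The results show that usage of the 3721 codon pairs is strongly biased in all nine model organisms. For example, the mean CPB of the 17,635 CDSs in the human genome is 0.075, shifted toward positive values.

Using the genome's CPS values, we built for every CDS a CPS profile, which lists the CPS values in the order in which the codon pairs occur along the sequence. For each organism, we aligned the CPS profiles of all CDSs at the 5' end and, separately, at the 3' end, and averaged the CPS values at each codon pair position of the alignment to obtain the genome-wide averaged CPS profile.

The averaged profiles show that 441 of the 478 organisms share a similar pattern: at the genome level, codon pair bias is generally low at the 5' end of CDSs and rises gradually from the 5' end toward the 3' end. We call this pattern the 'codon pair ramp'. To determine the ramp length in each genome, we further analyzed each averaged CPS profile with a sliding window method. The first 120 codon pairs of the averaged profile were divided evenly into 12 sliding windows of 10 consecutive codon pairs each. Using the Kolmogorov-Smirnov test, we compared the mean CPS of each window with the mean CPS of the first 120 codon pairs, and defined the ramp length as the position of the window at which the P-value exceeds 0.05. With this algorithm we found that 441 of the 478 organisms have a codon pair ramp that extends over the first 20 to 50 codon pairs of the CDS (named the head codon pair ramp), that is, the first 60 to 150 bases of the CDS. For example, in human CDSs the first 40 codon pairs form the head ramp; the mean CPS in this region is 0.067, 7% lower than the mean over the first 120 codon pairs (0.072), whereas the mean CPS from the 50th to the 120th codon pair is 0.076, 6% higher than that mean. The Kolmogorov-Smirnov results also show that the ramp is widespread in eukaryotes, bacteria and archaea; it is species-specific but shows no differences along taxonomic lines.

To confirm the ramp further, we calculated the CPB of the first 40 codon pairs of every CDS in a genome and compared it with the CPB of the whole CDS. The CPB of the first 40 codon pairs is highly significantly lower than that of the full sequence (paired t-test, P<2.2E-16). For example, in the human genome the mean CPB of the first 40 codon pairs is 0.066, highly significantly lower than the mean CPB of all CDSs (0.075) (paired t-test, P<2.2E-16).

From the averaged profiles we also found that in 413 of the 478 organisms, including human, rat, mouse, cow, fruit fly, nematode and E. coli, a ramp is also present in the last 120 codon pairs of CDSs (named the tail codon pair ramp), whereas in the remaining 69 organisms, such as budding yeast and fission yeast, no ramp was found in the last 120 codon pairs. In addition, among the 413 organisms with ramps in both the first and the last 120 codon pairs, the head ramp is longer than the tail ramp in 375.

2. Comparison of genomic and randomized CPS profiles

To show further that the ramp is not a chance effect but an intrinsic feature of genomes, we wrote an R program, using the Seqinr package (http://seqinr.r-forge.r-project.org/), that generates random CDSs. With the codon randomization and synonymous codon randomization methods, we generated two sets of random sequences (50 per set) for every CDS in the genomes of three model organisms: human, E. coli and budding yeast. Codon randomization keeps the usage frequencies of the 61 sense codons of the original sequence unchanged and only randomly changes the order of codon pairs along the CDS. Synonymous codon randomization keeps both the codon frequencies and the encoded amino acid sequence unchanged, again only randomly changing the order of codon pairs. For example, for the 17,635 human CDSs, each method generated 881,750 random CDSs. Analyzing the two sets of random sequences gave two randomized CPS profiles (the codon randomization profile and the synonymous codon randomization profile) for each of human, E. coli and budding yeast.

In the randomized profiles the mean CPS values are all negative, which indicates that the codon pairs occurring in the random sequences are ones that are rarely used in the original genome. It also shows that codon pair frequencies in the original genome are not random: codon pair bias is species-specific and an inherent feature of the genome. Moreover, no ramp was found in the randomized profiles, either in the first or in the last 120 codon pairs. This too indicates that the ramp found in the original genomes is an intrinsic feature of organisms and not a result of a random arrangement of codon pairs.

3. Correlation between the codon pair ramp and translation efficiency

Previous studies have shown that the codon pair bias of a gene affects its translation efficiency. The aim of this study was to use bioinformatic methods to examine, at the genome level, the correlation between codon pair bias and translation efficiency, and in particular between the codon pair ramp and translation rate. We used the tRNA adaptation index (tAI) as a measure of translation rate. The tAI of a gene expresses how well the gene is adapted to the genome-wide tRNA pool; the higher the tAI, the higher the translation rate. We wrote several Java and Python programs to compute tAI at the genome level and calculated the tAI of every CDS in the genomes of nine model organisms (human, rat, mouse, cow, nematode, fruit fly, budding yeast, fission yeast and E. coli). Spearman correlation analysis shows that in these nine organisms the CPB and tAI of CDSs are significantly correlated. For example, across the 17,635 human CDSs the Spearman correlation coefficient between CPB and tAI is 0.298 (P<2.2E-16). This indicates that translation rate is an important factor shaping codon pair bias.

Next, we compared the genome-wide averaged tAI profile with the averaged CPS profile in the nine model organisms. Within the head ramp region of CDSs in human, cow, nematode, fruit fly, fission yeast and E. coli, the two profiles are strongly correlated: the trend of the mean CPS over the first 40 codon pairs closely follows the trend of the mean tAI. In the human genome, for example, the correlation reaches 0.651 (Spearman test, P<9.177E-06). Outside the ramp region we found no such correlation; in the human genome, for example, the Spearman coefficient between CPS and tAI outside the ramp is -0.032 (P=0.776). We also found no such correlation in the ramps of rat, mouse and budding yeast (Spearman test, P>0.05), although for the first 120 codon pairs of yeast CDSs (i.e., the first 450 bases) the mean CPB and the mean tAI show some correlation (Spearman test, ρ=0.242, P=0.0078).

These results show that, within the codon pair ramp, codon pair bias is closely related to translation rate: non-preferred codon pairs slow translation and thereby affect early elongation. They also support the view that the rate-limiting steps of gene expression are translation initiation and early elongation.

4. Correlation between the codon pair ramp and the expression level of green fluorescent protein genes in E. coli

The aim of this study was to compare the codon pair bias of 154 synthetic green fluorescent protein (GFP) genes with their expression levels in E. coli, in order to find support for our conclusions in published experimental results. Plotkin and colleagues provided the DNA sequences and corresponding expression data for the 154 synthetic GFP genes from their 2009 paper in Science. Using our existing Java and Python programs, we analyzed the CPB of these 154 GFP genes. Their mean CPB is -0.098, lower than the mean for endogenous E. coli genes (0.077). Because the codon pairs in these synthetic genes are arranged randomly, we found no codon pair ramp in them. Correlation analysis shows that the CPB of the whole genes is not correlated with expression level (Spearman test, ρ=-0.106, P>0.19). When only the first 40 codon pairs of the 154 genes are considered, CPB is significantly correlated with expression level (Spearman test, ρ=-0.256, P<0.01). More interestingly, when only the 37 genes (25%) with the highest CPB in the first 40 codon pairs are considered, CPB is significantly correlated with expression level (Spearman test, ρ=0.514, P<0.01). These results support the conclusion of our bioinformatic analysis: local codon pair bias along the gene sequence, rather than whole-gene codon pair bias, is closely related to expression level.

In summary, this study used the theory and methods of bioinformatics and genomics to analyze trends in genome-wide codon pair bias in 478 organisms. In the genome-wide CDSs of 441 organisms we found a codon pair ramp: codon pair bias is generally low at the 5' end of CDSs and rises gradually from the 5' end toward the 3' end. This pattern is widespread in eukaryotes, bacteria and archaea; it is species-specific but shows no differences along taxonomic lines. Our study also shows that, within the ramp, codon pair bias is closely related to translation speed: non-preferred codon pairs slow translation and thereby affect early elongation. Analysis of experimental data published by other researchers supports this conclusion. On this basis we consider that the nucleotide sequence in the translation initiation region carries a large amount of information that strongly influences translation initiation and early elongation. The bioinformatics programs written for this study are all freely available for download, which lays the groundwork for further research. The results are relevant to understanding how codon pair bias affects gene expression, how specific signals in the one-dimensional information of gene sequences affect protein function, and how species evolve, and they provide a theoretical basis and new methods for further work in this area.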
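The profile averaging and 12-window summary from study 1 can be sketched as follows. This is a minimal illustration in Python, not the thesis programs (which are not shown in this record); the function names and the handling of CDSs shorter than 120 codon pairs are assumptions. The Kolmogorov-Smirnov comparison of each window against the first 120 positions is deliberately left out.

```python
from statistics import mean

def averaged_head_profile(profiles, length=120):
    """Mean CPS at each codon pair position, with profiles aligned at the 5' end.

    ASSUMPTION: a CDS shorter than `length` simply stops contributing
    beyond its last codon pair.
    """
    averaged = []
    for pos in range(length):
        column = [p[pos] for p in profiles if len(p) > pos]
        if not column:
            break
        averaged.append(mean(column))
    return averaged

def averaged_tail_profile(profiles, length=120):
    """Same as above, but with profiles aligned at the 3' end."""
    reversed_avg = averaged_head_profile([p[::-1] for p in profiles], length)
    return reversed_avg[::-1]

def window_means(profile, size=10):
    """Means of consecutive, non-overlapping windows of `size` positions."""
    return [mean(profile[i:i + size])
            for i in range(0, len(profile) - size + 1, size)]

# Toy CPS profiles for two hypothetical CDSs of different length.
toy_profiles = [[0.0, 1.0, 2.0], [2.0, 3.0]]
head = averaged_head_profile(toy_profiles, length=3)
tail = averaged_tail_profile(toy_profiles, length=3)
```

With a real 120-position averaged profile, `window_means(profile, 10)` returns the 12 window means that the significance test would then be applied to.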

【Abstract】 It is a longstanding idea that, in most species, synonymous codons are used with different frequencies (known as codon bias) and that the order in which codons are used within a protein-coding sequence is far from random. There are 61 sense codons and therefore 3721 possible codon pairs (excluding pairs that contain a stop codon). Earlier studies established that codon pair usage in a given genome is also nonrandom, and that this codon pair bias (CPB) is a species-specific feature that is independent of codon bias. It is still not clear why some codon pairs are used more frequently than others. Previous experimental work suggests that one selective force on codon pair preference within coding sequences may be translation: how well tRNAs fit together in the ribosomal A and P sites may influence the efficiency of translation, so codon pair bias may have a component dictated by tRNA properties rather than simply by codon properties.

Analysis of codon pair usage in different organisms, and its applications in bioinformatics and evolutionary studies, are important for investigating gene expression and genome evolution. CPB has been applied to individual genes or whole genomes to measure codon pair bias, but never codon pair by codon pair over an entire transcriptome. In this study, using the methods of genomics and bioinformatics, we carried out the following work.

1. Codon-pair-by-codon-pair analysis of codon pair bias over the entire transcriptomes of 478 organisms

The aim of this part was to analyze codon pair bias codon pair by codon pair across all coding sequences (CDS) of 478 organisms from all three domains of life, and to look for general rules of codon pair usage.

Consensus coding sequences (CCDS) for Homo sapiens (human) and Mus musculus (mouse), as well as coding sequences (CDS) for Rattus rattus (rat), Bos taurus (cow), Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Escherichia coli and other organisms, were downloaded from NCBI and UCSC. We developed several programs in Java, Python and R to carry out the genome-wide analysis. With these programs we computed the codon pair score (CPS) for each of the 3721 possible codon pairs. The CPS of a codon pair is the natural logarithm of the ratio of its observed to its expected frequency over all coding sequences of a genome. Positive and negative CPS values correspond to statistically over- and under-represented codon pairs, respectively. The codon pair bias (CPB) of an entire CDS with N codons (not including the stop codon) was then calculated as the arithmetic mean of the CPS values of its N-1 codon pairs, and the CPS profile of the i-th CDS in a genome is the vector of all its CPS values. For each species, the 5' and 3' portions of all CDSs were aligned at their start and stop positions, respectively, and averaged head (the first 120 codon pairs) and tail (the last 120 codon pairs) CPS profiles were calculated by taking the mean CPS at each codon pair position of the alignment.

We calculated the total CPB of each CDS in the human, mouse, rat, cow, D. melanogaster, C. elegans, S. cerevisiae, S. pombe and E. coli genomes. In the human genome, the CPB distribution of the 17,635 CDSs is shifted toward positive values, with a mean of 0.075.

We next inspected the averaged head and tail CPS profiles of all CDSs in each genome.
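The CPS and CPB definitions given here can be sketched in a few lines of Python. The abstract does not state how the expected frequency is obtained, so this sketch assumes the simplest model, in which the two codons of a pair occur independently. The function names are illustrative and are not those of the thesis programs.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(cds):
    """Split an upper-case CDS into codons, dropping a trailing stop codon."""
    usable = len(cds) - len(cds) % 3
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    if triplets and triplets[-1] in STOP_CODONS:
        triplets.pop()
    return triplets

def codon_pair_scores(cds_list):
    """CPS = ln(observed / expected) for every codon pair seen in cds_list.

    ASSUMPTION: expected count = total pairs * f(first codon) * f(second codon),
    i.e. the two codons of a pair are treated as independent.
    """
    pairs, first, second = Counter(), Counter(), Counter()
    for cds in cds_list:
        c = codons(cds)
        for a, b in zip(c, c[1:]):
            pairs[(a, b)] += 1
            first[a] += 1
            second[b] += 1
    total = sum(pairs.values())
    return {(a, b): math.log(n * total / (first[a] * second[b]))
            for (a, b), n in pairs.items()}

def cps_profile(cds, cps):
    """CPS values of a CDS in sequence order (pairs missing from cps are skipped)."""
    c = codons(cds)
    return [cps[p] for p in zip(c, c[1:]) if p in cps]

def codon_pair_bias(cds, cps):
    """CPB: arithmetic mean of the CPS values of all codon pairs of a CDS."""
    profile = cps_profile(cds, cps)
    if not profile:
        raise ValueError("CDS has no scorable codon pairs")
    return sum(profile) / len(profile)

# Toy "genome" of two short CDSs, for illustration only.
toy_genome = ["ATGAAAAAAGGGTAA", "ATGGGGAAAGGGTAA"]
toy_cps = codon_pair_scores(toy_genome)
toy_cpb = codon_pair_bias(toy_genome[0], toy_cps)
```

In the toy data the pair (AAA, GGG) occurs more often than independence predicts and gets a positive CPS, while (AAA, AAA) gets a negative one.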
Remarkably, in nearly all species examined (441 of 478), CPS values are relatively low near the 5' end of the mRNA and increase rapidly with distance from the start codon. We call this effect a 'codon pair ramp'.

To determine the typical length of the codon pair ramp, we smoothed the averaged CPS profiles by taking the mean of the profile within sliding windows of 10 codon pairs. The length of the ramp was then defined as the region in which the mean CPS is significantly lower (Kolmogorov-Smirnov test, P<0.05) than the mean of all 12 sliding windows. We found that the ramp is about 20 to 50 codon pairs long in almost all species examined. In the human genome the ramp is 40 codon pairs long, and the average CPS in this region is 0.067, ~7% lower than the mean of the first 120 codon pairs (0.072). By contrast, the average CPS between the 50th and the 120th codon pair is 0.076, ~6% higher than the mean of the first 120 codon pairs.

We also calculated the CPB of the first 40 codon pairs of each CDS in a genome. While the average CPB of all human CDSs is 0.075, the mean over the first 40 codon pairs of all CDSs is 0.066, and the CPB of the first 40 codon pairs of each CDS is significantly lower than the CPB of the entire sequence (paired t-test, P<2.2e-16).

We also found lower CPS values in the tail (the last 120 codon pairs) of coding sequences in 413 of the 478 species studied, such as human, mouse, rat, cow, D. melanogaster, C. elegans and E. coli, whereas in 69 of the 478 species, such as S. cerevisiae, S. pombe and A. fumigatus, no tail ramp appears to exist. In 375 of the 413 species that have both a head and a tail codon pair ramp, the head ramp is at least as long as the tail ramp.

2. Comparison of wild-type and randomized profiles

To verify that the observed codon pair profile is not a trivial consequence of lining up all CDSs of a genome by their start/stop codons, we wrote an R program, using the Seqinr package (http://seqinr.r-forge.r-project.org/), that generates random sequences for each CDS in E. coli, human and S. cerevisiae. With this program we randomly shuffled each CDS in a genome by two alternative methods: (a) random permutation of the codons in a CDS, preserving the exact count of each codon (codon randomization); and (b) random selection of synonymous codons for each amino acid, preserving the amino acid sequence and codon usage of the CDS (synonymous codon randomization). Both procedures were repeated 50 times, and the averaged CPS profiles of the random sequences of each species were computed using the CPS values of the wild-type genome.

The average CPS values in both randomized profiles are negative, which means that the codon pairs in the random sequences are statistically under-represented relative to the wild-type sequences. Such negative values are expected because codon pair usage in wild-type sequences is not random: not all combinations of two codons are used as frequently in wild-type sequences as in random ones. Moreover, while codon pair ramps near the 5' end of the mRNA appear in the averaged profiles of the wild-type coding sequences, the randomized sequences do not show this effect, indicating that the observed profiles are not a trivial consequence of lining up all CDSs of a genome by their start/stop codons.

3. Analysis of the correlation between the codon pair ramp and translation speed

The aim of this part was to analyze the correlation between codon pair usage and translation speed, especially in the ramp region.

Using several Java and Python programs, we compared the tRNA adaptation index (tAI) with the CPB of each CDS in human, mouse, rat, cow, D. melanogaster, C. elegans, S. cerevisiae, S. pombe and E. coli. The tAI of a transcript reflects its adaptation to the tRNA pool of a genome; it is a number between 0 and 1, with higher values corresponding to higher translation speed. A significant positive, albeit weak, correlation between the two values was found in human (Spearman's ρ=0.298, P<2.2e-16) and in other species, which supports the idea that one force shaping codon pair bias is the optimization of translation speed through adaptation to the tRNA pool.

We also calculated an averaged tAI profile for each codon pair position in a genome. Here the tAI of a codon pair was taken as the geometric mean of the tAI values of its two codons. For all CDSs in a genome, we compared the average CPS at each codon pair position along the coding sequences with the average tAI at that position. We observed a strong positive correlation between average CPS and average tAI per codon pair position in the ramp regions of human, cow, D. melanogaster, C. elegans, S. pombe and E. coli. In human, for example, the CPS profile correlates strongly and significantly with the translation speed profile over the first 40 codon pairs (Spearman's ρ=0.651, P<9.177E-06), whereas no significant correlation was found between CPS and tAI for the 40th to 120th codon pairs (Spearman's ρ=-0.032, P=0.776). In mouse, rat and S. cerevisiae we did not find any correlation between CPB and tAI in the ramp region.
However, in S. cerevisiae, when the first 120 codon pairs are considered, we found a weak but significant positive CPB/tAI correlation (Spearman's ρ=0.242, P=0.0078).

The tight connection between codon pair bias and translation speed in the ramp region suggests that under-represented codon pairs slow early elongation steps and thereby reduce the rate of translation near the translation initiation region. These findings are also consistent with the notion that translation initiation or early elongation, and not the global elongation rate, is rate-limiting for gene expression. Interestingly, in mouse and rat we did not find any correlation between CPS and tAI. We speculate that in these organisms selection for mRNA stability, rather than translational selection, may affect codon pair preference as well as codon usage.

4. The effect of codon pair usage on the translation of GFP genes

In this part we used the sequences of 154 green fluorescent protein (GFP) genes to test the effect of codon pair usage on translation. The sequences of 154 genes that varied randomly in their codon usage but encoded the same GFP, together with normalized fluorescence levels for pGK8 (T7 promoter, no leader sequence) reflecting their expression levels in E. coli, were obtained from the work of Kudla et al.

The average CPB of these genes is -0.098, lower than that of endogenous E. coli genes (0.077). As expected, we found no codon pair ramp in these data, and the CPB of the complete gene sequences was not significantly correlated with fluorescence levels (Spearman's ρ=-0.106, P>0.19). However, when only the first 40 codon pairs of each sequence were considered (average CPS=-0.112), we found a significant negative correlation (Spearman's ρ=-0.256, P<0.01) between CPB and fluorescence levels.
Furthermore, among the 25% of GFP constructs with the highest CPB in the first 40 codon pairs (37 constructs), fluorescence levels correlate significantly and strongly (Spearman's ρ=0.514, P<0.01) with the codon pair bias of the first 40 codon pairs. These results suggest that the expression level of a gene is related to its local rather than its global codon pair usage.

In summary, several programs were developed in Java, Python and R, and a broad codon-pair-by-codon-pair survey of codon pair bias near the translation initiation region of all protein-coding sequences in 478 organisms from all three domains of life was completed. We found that in nearly all species there is a general tendency for reduced CPB near the 5' end of protein-coding sequences, which we refer to as the “codon pair ramp”. The ramp comprises the first 20 to 50 codon pair positions of the protein-coding sequence, where codon pair bias is relatively low. Our finding of a strong connection between codon pair bias and translation speed supports an important role for the nucleotide sequence near the 5' end of mRNAs in controlling early elongation. The source code of all programs developed and used in this study is freely available. The statistical evidence presented here remains to be verified and explained experimentally, which would further our knowledge of how information stored in DNA sequences determines diverse cellular processes.
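The two shuffling controls from part 2 can be sketched as follows. The thesis program was written in R with Seqinr and is not shown in this record, so this is a Python illustration under assumptions: the function names are hypothetical, and method (b) is implemented as a permutation of codons among positions that encode the same amino acid. That is one way to preserve both the amino acid sequence and the codon counts, and it may differ from the original procedure.

```python
import random
from collections import defaultdict

def codon_randomization(codon_list, rng):
    """Method (a): permute all codons of a CDS; codon counts are preserved."""
    shuffled = list(codon_list)
    rng.shuffle(shuffled)
    return shuffled

def synonymous_codon_randomization(codon_list, codon_to_aa, rng):
    """Method (b): permute codons only among positions with the same amino acid.

    ASSUMPTION: codon_to_aa maps every codon in codon_list to its amino acid.
    Both the protein sequence and the codon counts of the CDS are preserved.
    """
    positions = defaultdict(list)
    for i, codon in enumerate(codon_list):
        positions[codon_to_aa[codon]].append(i)
    result = list(codon_list)
    for indices in positions.values():
        pool = [codon_list[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            result[i] = codon
    return result

# Toy input: a partial genetic code covering only the codons used below.
toy_table = {"ATG": "M", "AAA": "K", "AAG": "K", "GGT": "G", "GGC": "G"}
toy_cds = ["ATG", "AAA", "AAG", "GGT", "AAA", "GGC"]
rng = random.Random(0)  # fixed seed so the example is reproducible
random_a = codon_randomization(toy_cds, rng)
random_b = synonymous_codon_randomization(toy_cds, toy_table, rng)
```

Repeating either call 50 times per CDS and scoring the results with the wild-type CPS table would give the randomized profiles described in part 2.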
