小鼠及人顺式作用元件CArG序列特征及进化研究

Sequence Character and Evolution Analysis of Cargome in Mouse and Human Genome

【Author】 Shen Xia

【Advisor】 Tao Shiheng;

【Author information】 Northwest A&F University, Biochemistry and Molecular Biology, 2009, PhD

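【Summary】 With the completion of genome sequencing for humans and other model organisms, a growing body of research shows that non-coding DNA is the main component of eukaryotic genomes. During growth and development, the correct recognition and binding of short cis-acting DNA elements by transcription factors is a key step in regulating the specific spatiotemporal expression of genes. Serum response factor (SRF) is one of the most intensively studied transcription factors. It is widely expressed in the organism and mainly regulates genes involved in development, and the diversity and importance of its regulatory functions have made it a focus of biological research. The cis-acting element that SRF recognizes and binds, CArG, has attracted increasing attention in recent years, and its importance is increasingly evident. Although reports on CArG elements are accumulating, studies of their sequence characteristics and evolutionary origin remain rare. To date, a total of 390 CArG elements in mammals have been experimentally verified as functional. First, we analyzed the distribution of functional CArG elements around the transcription start site (TSS) and its relation to function, and examined how the distance between adjacent CArG elements relates to function. Second, we compared functional CArG element sequences with background sequences to mine the sequence characteristics of the functional sequences in depth. Third, using a computer program written for this study, we counted the types and numbers of haplotypes of functional CArG elements in mouse and human and identified the dominant haplotypes. Finally, we identified orthologous functional CArG elements in mouse and human by homology alignment and, on that basis, inferred the evolutionary origin of functional CArG elements. The results show that 71% of functional CArG elements lie upstream of the TSS, and that the closer to the TSS, the more CArG elements there are. The positions of the upstream functional CArG elements follow a negative power-function distribution. In addition, statistical comparison shows that the upstream regions of genes containing functional CArG elements often carry more CArG-like sequences than background sequences do. When the mean numbers of CArG-like sequences in the two groups are nearly equal, however, the mean distance between CArG-like sequences does not differ significantly between the two groups of genes. Comparing substitution rates of functional CArG elements and background sequences in mouse and human shows that functional CArG elements have a lower substitution rate than the background, which indicates that the evolution of functional elements is constrained by function and that substitutions rarely occur. The central positions 5 and 6 of functional CArG elements, however, show a higher substitution rate than the background, which suggests that these two positions may be under positive selection. Comparison with background sequences also shows that the A/T arrangement in the central sequence has a clear TATA preference. This feature may be important for forming the correct conformation when a functional CArG element is recognized by SRF. Functional CArG elements in mouse and human exist as a large number of haplotypes, and the dominant haplotypes in the mouse genome are not entirely the same as those in human. Nevertheless, the two share similar sequence features: both are perfect CArG elements without substitutions, and their central sequences show a clear TATA arrangement. Orthologous comparison of promoter sequences identified 22 pairs of orthologous CArG elements in mouse and human, showing that functionally important CArG elements are highly conserved even between relatively distantly related species. This study is the first to combine bioinformatics with comparative genomics to analyze the sequence characteristics of functional CArG elements in mammals and to explore their evolutionary properties. The results reveal the sequence characteristics of functional CArG elements and refine and supplement earlier findings. These characteristics provide important information for designing prediction software, reducing false positives, improving the prediction success rate, and accurately predicting candidate CArG elements, and they will also promote the use of comparative genomics to study functional CArG elements in other organisms. They likewise provide reliable information for further study of how functional CArG elements affect the regulation of their downstream genes. More importantly, this study provides a research platform for revealing the genetic and sequence characteristics of other cis-acting elements. Deeper exploration of the sequence characteristics of other cis-acting elements will lay a solid foundation for accurately predicting cis-acting elements, analyzing gene expression, and studying models of gene regulatory networks.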

【Abstract】 Whole-genome sequencing of humans and model organisms has shown that non-coding DNA makes up the majority of most eukaryotic genomes. Transcription factors recognize degenerate families of short sequences to regulate gene expression during development and throughout life. Serum response factor (SRF) is one of the best-studied transcription factors. It is expressed throughout the body and throughout development, and it mainly regulates genes that affect development. Its important role in gene expression has made it a focus of research. Although many studies of the CArG cis-elements bound by SRF are in progress, little is known about the sequence characteristics and evolutionary origin of functional CArG elements.

To date, a total of 390 CArG elements have been experimentally validated in mammals. First, we used a validated CArG dataset to calculate the distance distribution of functional CArG elements around the transcription start site (TSS). Distances between adjacent CArGs were also analyzed. We compared these distributions with those derived from a control set of randomly selected CArGs that were not experimentally validated for function. Second, we compared CArG sequences with background sequences to identify the sequence characteristics of functional CArG elements. Third, to find the dominant haplotypes in mammals, we ran a computer program that scanned all functional CArG elements in the mouse and human genomes and recorded every haplotype. Finally, to infer the evolutionary origin of CArG elements, we compared functional CArG elements between mouse and human through orthologous analysis.

Our results show that 71% of functional CArG elements lie upstream of the annotated TSS, with copy number increasing as one moves closer to the TSS. The distribution of functional CArG elements upstream of the annotated TSS follows a negative power function. Moreover, the average number of CArG-like elements in CArG-containing genes is significantly higher than in control genes. However, when the copy numbers in the two sets are nearly equal, the distance between adjacent elements shows no bias between functional and control elements. Comparison with background sequences showed that the substitution rate within functional CArG elements is lower than that of the background DNA in both genomes, indicating that functional CArG elements evolve more slowly than background DNA because of functional constraint. However, the two core sites of the functional CArG elements (positions 5 and 6) evolve faster than background DNA, which may hint that these two sites are under positive selection. The core region of functional CArG elements also shows an obvious TATA bias, a sequence feature that may contribute to the binding of SRF to CArG elements. Haplotype analysis showed that CArG element sequences are highly diverse within species in both genomes, and that the dominant haplotypes are not entirely the same in the two genomes. Orthologous analysis of the promoter regions of CArG-containing genes showed that most functionally important CArG elements are perfectly conserved between the two genomes, and it identified 22 orthologous human-mouse pairs. This result indicates that functionally important CArG elements are conserved between relatively distant species.

We have performed the first genome-scale genetic and evolutionary analysis of the CArGome in mammals. The studies presented here reveal the sequence characteristics of CArG elements and extend earlier bioinformatic analyses of functional CArG elements. They reveal important patterns of sequence characteristics within functional CArG elements, and taking these into account should help in predicting candidate CArG elements and in distinguishing functional CArG elements through alignments. The results provide a platform for studying cis-elements in mammals through genetic and evolutionary approaches. A better understanding of the sequence characteristics of different classes of cis-elements will fundamentally affect future developments in cis-element prediction, analysis of gene expression, and detection of regulatory patterns.
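The scanning and haplotype-counting steps described in the abstract can be illustrated with a minimal sketch. It is not the author's program, which the abstract does not describe in detail. It assumes the commonly cited CArG consensus CC(A/T)6GG, treats a "CArG-like" sequence as a 10-mer with at most one mismatch to that consensus, and treats a "haplotype" as a distinct 10-mer sequence; these definitions and all function names (`mismatches`, `scan_carg`, `dominant_haplotype`) are assumptions made for illustration.

```python
from collections import Counter

# Assumed consensus: CC(A/T)6GG, written with the IUPAC code W = A or T.
CONSENSUS = "CCWWWWWWGG"


def mismatches(site: str) -> int:
    """Count positions where a 10-mer deviates from the assumed consensus."""
    count = 0
    for base, code in zip(site, CONSENSUS):
        allowed = "AT" if code == "W" else code
        if base not in allowed:
            count += 1
    return count


def scan_carg(seq: str, max_mismatch: int = 1):
    """Yield (position, site, mismatches) for each 10-mer window that is
    within max_mismatch of the consensus (0 = perfect CArG,
    1 = 'CArG-like' under the assumption stated above)."""
    seq = seq.upper()
    width = len(CONSENSUS)
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        m = mismatches(site)
        if m <= max_mismatch:
            yield i, site, m


def dominant_haplotype(sites):
    """Return (sequence, count) for the most frequent distinct site,
    or None when the input is empty."""
    counts = Counter(sites)
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    # Made-up promoter fragment, for illustration only.
    promoter = "acgCCTTATATGGacg"
    for pos, site, m in scan_carg(promoter):
        print(pos, site, m)
```

Because the reverse complement of CC(A/T)6GG is again CC(A/T)6GG, scanning one strand already visits every window that matches on either strand; the reported sequence is simply the forward-strand reading.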
