
Study of the Duplication, Divergence and Pleiotropy of Genes by Analysis of Gene Family Sequences

【Author】 Zeng Yanwu (曾燕舞)

【Supervisor】 Gu Xun (谷迅)

【Author information】 Fudan University, Biochemistry and Molecular Biology, 2010, Ph.D.

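【Chinese Abstract】 A gene family is a group of genes that share a common ancestor and have similar sequences. Sequence analysis of gene families has spread into nearly every area of modern biology and has become a routine research tool. It is widely used to study the origin, divergence and evolutionary mechanisms of species and to detect natural selection pressure. Gene family sequences have also been found to have potential value for systems biology. This dissertation attempts to extract information from gene families in three respects: the timing of gene duplication, the functional divergence of duplicate genes, and gene pleiotropy. Although only sequences from vertebrates, Drosophila and yeast were used, the methods can be extended to other species. The main contents and conclusions are summarized as follows:

1. In the human genome, the age distribution of gene duplications shows two peaks and an ancient gene set. Because of functional constraints, different functional categories may differ in how many genes they retain after duplication, and the categories also influence one another, so that related categories keep similar retention patterns. Genes of certain functions are known to have duplicated extensively early in vertebrate evolution, but the correlation of duplication times between functional categories had not been studied. We therefore developed a reliable pipeline for estimating gene duplication times, used it to date the duplication of most human genes, and analyzed the correlations between functional categories and between expression locations. Among Gene Ontology (GO) categories, functionally related categories cluster together. Across all GO categories, two functional groups differ clearly from the genome as a whole: one related to development, the other to organismal physiological processes. Using a stricter method for estimating duplication times, we studied three signal-transduction-related gene superfamilies: transcription factors, protein kinases and G protein-coupled receptors. Their duplication-time distributions resemble that of the GO signal transduction category. We also compared the time distributions of duplicate genes expressed in different cellular locations. From the nucleus to the extracellular space, the distributions are similar to that of the whole genome. Genes expressed extracellularly show a moderate acceleration of duplication over roughly the last 600 million years.

2. Gene duplication and subsequent functional divergence are considered the source of functional diversity in genomes. Although several models describe patterns of functional divergence after gene duplication, an appropriate measure of functional distance between duplicate genes is still needed. We propose a simple method to measure the functional distance between two orthologous gene clusters. Applying a new statistical test to vertebrate gene families with three orthologous clusters produced by two rounds of gene duplication, we found two patterns of functional divergence after duplication. These patterns indicate that the two rounds of duplication played different roles in functional divergence. Functional distance analysis offers a simple measure of the level of functional divergence between orthologous clusters after gene duplication and should help clarify the mechanisms of functional innovation in functional genomics.

3. Gene pleiotropy, the ability of one gene to affect several phenotypes at once, is a very common property of genes. Biologists have long recognized its importance, yet its extent has not been rigorously explored. Theoretically, Fisher's model assumes universal pleiotropy: a mutation in a gene can potentially affect all traits. Experimental results, on the other hand, indicate that a gene usually affects only a few distinct phenotypes. We estimated the pleiotropy of 321 vertebrate genes and found that a gene typically affects only 6-7 molecular phenotypes (corresponding to dimensions of organismal fitness). The estimated pleiotropy is positively correlated with the number of GO biological processes and with expression breadth, showing that this measure has a definite biological meaning. Moreover, our results show that gene pleiotropy takes a definite, limited value, so assuming universal pleiotropy of mutations in theoretical studies is questionable.

4. We introduce Genepleio, a visualization software for calculating gene pleiotropy, evaluate its estimation error, and suggest how to reduce that error. To broaden the application of the pleiotropy measure, we also provide a method for calculating site-specific pleiotropy. Using Genepleio, we studied the distribution of gene pleiotropy in three groups of species (vertebrates, Drosophila and yeast) and found that vertebrates and Drosophila have similar distributions, with mean pleiotropy lower than that of yeast. A further analysis of human disease-related genes showed that the pleiotropy of most disease genes is only slightly below the average level.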

【Abstract】 A gene family is a group of genes that share a common ancestor and sequence similarity. Sequence analysis of gene families has permeated every aspect of biology-related research and has become a routine tool. It is widely used to investigate species origination, differentiation and evolutionary mechanisms, and to detect natural selection. In this dissertation, we try to extract the information behind gene families from three aspects: duplication time, functional divergence and gene pleiotropy. Although we only use sequences from vertebrates, flies and yeasts, the methods can be extended to similar research on other species. Our major research contents and conclusions are as follows:

1. In the human genome, the age distribution of gene duplications shows two waves of duplication and an ancient component. Because of functional constraints, genes in different functional categories may show different age distributions after duplication. However, functional categories are correlated and affect each other, which results in similar gene retention patterns. Some functional categories are known to be biased in their retention at different stages of evolution, but the correlation between categories has not been investigated. We therefore developed a pipeline to estimate the age of duplication events and used it to date most of the duplication events in human and zebrafish. Analyzing the retention pattern of duplicate genes in different GO functional categories, we found two distinct patterns: one cluster is associated with development and signaling, the other with organismal physiological processes. To examine the first cluster in detail, we used a stricter method to estimate the duplication ages of genes from three signaling-related gene superfamilies (transcription factors, protein kinases and G protein-coupled receptors). Their age distribution is similar to that of the GO "signal transduction" category. We also compared the age distributions of genes with different subcellular localizations. From the nucleus to the extracellular space, the age distributions are nearly the same, although duplication of extracellularly expressed genes has accelerated somewhat over the last 600 million years. In summary, gene duplication patterns are consistent within function-related categories as well as across subcellular localizations.

2. Gene duplication and subsequent functional differentiation are considered the source of functional diversification in genomes. Although several models have been proposed to describe patterns of functional divergence after gene duplication, an appropriate measure of the functional distance between duplicates is much needed. Here we propose a simple method to measure the functional distance between two subfamilies. Performing a new statistical test on ten three-cluster vertebrate gene families generated by two rounds of whole-genome duplication, we found two patterns of functional divergence after gene duplication, indicating that the two rounds of duplication may have played distinct roles in functional diversification. Functional distance analysis may provide a simple measure of the level of functional divergence between gene clusters after gene duplication and shed light on the mechanisms of functional innovation in functional genomics.

3. Biologists have long recognized the importance of gene pleiotropy, the ability of a single gene to affect multiple traits, which is one of the most commonly observed attributes of genes. Yet the extent of gene pleiotropy remains seriously under-explored. Theoretically, Fisher's model assumes universal pleiotropy: a mutation can potentially affect all phenotypic traits. On the other hand, experimental assays of a gene usually reveal only a few distinct phenotypes. We estimated the effective gene pleiotropy of 321 vertebrate genes and found that a gene typically affects 6-7 molecular phenotypes, which correspond to components of organismal fitness. The positive correlation of gene pleiotropy with the number of Gene Ontology biological processes and with expression breadth provides a biological basis for this sequence-based estimate. Moreover, the degree of gene pleiotropy is restricted to a limited number of molecular phenotypes, indicating that caution is needed in theoretical analyses that assume universal pleiotropy.

4. We introduce Genepleio, a software tool for calculating gene pleiotropy, evaluate its estimation error, and give suggestions for improving the estimates. To broaden the application of the pleiotropy measure, we also designed a method to estimate site-specific gene pleiotropy. Using Genepleio, we studied the distribution of gene pleiotropy in vertebrates, flies and yeasts, and found that vertebrates and flies have similar distributions, with mean pleiotropy below that of yeasts. Finally, we calculated the pleiotropy of disease-related genes and found it only slightly below that of other genes.
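Point 2 describes a pairwise functional distance between gene clusters and a test on three-cluster families, but gives no formula. The sketch below is a hedged illustration under an assumed formulation that appears in the functional-divergence literature, not the dissertation's confirmed method: a pairwise coefficient of functional divergence theta is turned into an additive distance d = -ln(1 - theta), and the three pairwise distances among clusters A, B and C are split into cluster-specific components on a three-cluster star tree. The function names and theta values are hypothetical.

```python
import math


def functional_distance(theta):
    """Hypothetical additive functional distance d = -ln(1 - theta).

    theta is assumed to be a coefficient of functional divergence between two
    clusters, with 0 <= theta < 1 (theta = 0 means no divergence).
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


def branch_contributions(d_ab, d_ac, d_bc):
    """Split three pairwise distances into cluster-specific parts on a star tree.

    Solves d_ab = b_a + b_b, d_ac = b_a + b_c, d_bc = b_b + b_c.
    """
    b_a = (d_ab + d_ac - d_bc) / 2
    b_b = (d_ab + d_bc - d_ac) / 2
    b_c = (d_ac + d_bc - d_ab) / 2
    return b_a, b_b, b_c


if __name__ == "__main__":
    # Hypothetical theta values for clusters A, B, C of one three-cluster family.
    d_ab = functional_distance(0.30)
    d_ac = functional_distance(0.35)
    d_bc = functional_distance(0.10)
    b_a, b_b, b_c = branch_contributions(d_ab, d_ac, d_bc)
    print(f"b_A={b_a:.3f}  b_B={b_b:.3f}  b_C={b_c:.3f}")
```

With these made-up values, cluster A carries most of the functional distance, while B and C stay close to each other. Because theta is estimated from data, the pairwise distances need not satisfy the triangle inequality, so a component can come out slightly negative and should be interpreted with its sampling error in mind.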

  • 【Online publication submitted by】 Fudan University
  • 【Online publication year/issue】 2010, Issue 11
  • 【Classification number】 Q75
  • 【Downloads】 438