几种哺乳动物表达序列标签(EST)生物信息学分析

Bioinformatics Analysis of Expressed Sequence Tags (EST) in Several Mammals

【Author】 Su Zhixi (苏志熙)

【Advisors】 Yu Jun (于军); Gu Xun (谷迅); Hu Songnian (胡松年)

【Author information】 Zhejiang University, Biology, 2006, PhD

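【Abstract (translated from the Chinese)】 Since Adams et al. sequenced 609 expressed sequence tags (ESTs) in 1991, the concept of large-scale EST sequencing has gained general acceptance and the technology has been applied ever more widely. To date, massive amounts of EST data have been generated and deposited in public databases such as dbEST and Unigene. ESTs, which derive from partial sequencing of cDNAs, are widely used across genomics and molecular biology for novel gene discovery, gene mapping, polymorphism analysis, expression studies, and gene prediction. Starting from EST preprocessing, assembly, clustering, annotation, and functional classification, this thesis focuses on several applications of EST data in mammalian comparative evolutionary genomics.

Chapter 2 describes large-scale EST sequencing of the porcine mammary gland and the subsequent bioinformatics analysis. We constructed non-normalized porcine mammary gland cDNA libraries from different pig breeds and developmental stages and obtained 28,941 high-quality ESTs from them. Using the assembly software TGICL, these ESTs were assembled into 2,212 contigs and 5,642 singletons. After functional annotation, the sequences were grouped into 6,857 gene clusters, of which 2,072 sequences without functional annotation were considered to derive from novel genes. The annotated genes were then classified according to Gene Ontology categories. By comparing expression profiles, we identified several groups of genes differentially expressed in the mammary gland between breeds and between developmental stages within a breed; some of these genes may be related to porcine reproductive traits, while others play important roles in milk synthesis, secretion, and mammary involution. These expression profiles from specific developmental stages of the porcine mammary gland, together with the EST sequences of unknown function, provide an important resource for further research.

Chapter 3 uses EST sequences to predict alternative splicing (AS) forms of genes and analyzes the evolution of AS after gene duplication. AS and gene duplication are the two major sources of functional diversity in the proteome; by analyzing differences in AS between duplicate genes, we studied the evolutionary trend of AS after gene duplication. We found that multi-copy genes have fewer AS forms than single-copy genes, and that the number of AS forms of a gene is negatively correlated with the size of its gene family. Notably, the loss of AS forms in duplicate genes may occur within a very short time after duplication. These results support a subfunctionization model of AS in the early stage after gene duplication. Further analysis of the distribution of AS between human duplicate gene pairs showed that the evolution of AS after duplication is asymmetric, that is, the number of AS forms may differ greatly between duplicates. We therefore infer that AS and gene duplication may not evolve independently. In the early stage after duplication, new duplicates may to some extent replace the AS mechanism in bringing about functional divergence; in the later stage, the gain or loss of AS forms appears to be independent between duplicates.

Chapter 4 describes an analysis of the evolution of gene expression regulation using EST-based expression profile data from the Unigene database. The study of the evolution of gene expression regulation is of great importance in fields such as biological evolution and human disease susceptibility. An important aspect of such work is the study of changes in gene expression level or expression pattern. The large volume of microarray-based expression data from different species and tissues has greatly facilitated research in this field. However, because of the inherent defects of microarray data, many studies of gene expression evolution have reported contradictory results. Here, using the expression profile information in the Unigene database, we analyzed gene expression evolution in detail, tested some of the points disputed in previous studies, and put forward some new views. We developed a new EST-count-based measure of expression profile divergence (Ug) and analyzed expression evolution between human duplicate genes, between human and mouse orthologous genes, and between orthologous tissues. We found that the evolution of gene expression is consistent with the evolution of coding sequences, and that both may be subject to negative selection. Interestingly, tissue specificity and expression level may not be correlated with the evolution of gene expression. Using the EST-based expression profile data, we also constructed dendrograms of expression profile distances for 15 human and mouse tissues. We suggest that, when constructing and analyzing such dendrograms for different tissues across species, two distinct factors should be taken into account: the evolutionary distance between the expression profiles of a tissue in different species (De) and the expression profile distance between different tissues that arises from development (Dd).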

【Abstract】 Since Adams et al. first described 609 expressed sequence tags (ESTs) in 1991, the concept of large-scale EST sequencing has gained wide acceptance and the technology has been used ever more widely. To date, huge amounts of EST data have been produced and submitted to public databases such as dbEST and Unigene. ESTs, which arise from partial sequencing of cDNAs, are now widely used throughout the genomics and molecular biology communities for gene discovery, mapping, polymorphism analysis, expression studies, and gene prediction. In this study, starting with EST sequence pre-processing, assembly, clustering, annotation, and functional classification, we focus on a few applications of ESTs in mammalian comparative evolutionary genomics.

Chapter II summarizes a large-scale EST sequencing experiment on the porcine mammary gland and a comprehensive bioinformatics analysis. A total of 28,941 ESTs were sequenced from five 5'-directed non-normalized cDNA libraries and assembled into 2,212 contigs and 5,642 singlets using CAP3. These sequences were annotated and clustered into 6,857 unique genes, of which 2,072 had no functional annotation and were considered novel genes. The genes were further classified into Gene Ontology categories. By comparing expression profiles, we identified several breed-specific and developmental-stage-specific gene groups. These genes may be related to reproductive performance or play important roles in milk synthesis, secretion, and mammary involution. The unknown EST sequences and the expression profiles from different developmental stages and breeds are important resources for further research.

In Chapter III, we predicted alternative splicing (AS) forms from the EST sequences and then extended the study to the evolution of AS after gene duplication. We observed that duplicate genes have fewer AS forms than single-copy genes, and that the mean number of AS forms is negatively correlated with gene family size. Interestingly, we found that the loss of AS in duplicate genes may occur shortly after gene duplication. These results support the subfunctionization model of AS in the early stage after gene duplication. Further analysis of the AS distribution in human duplicate pairs showed that AS evolves asymmetrically after gene duplication, i.e., the number of AS forms may differ dramatically between duplicates. We therefore conclude that AS and gene duplication may not evolve independently. In the early stage after gene duplication, young duplicates may take over part of the protein functional diversity previously provided by the AS mechanism. In the late stage, the gain and loss of AS appear to be independent between duplicates.

Chapter IV presents a study of gene regulatory evolution using expression profile data based on EST counts in the Unigene database. The study of gene regulatory evolution is not only of interest in an evolutionary context but also promises to shed light on the contribution of regulatory region variation to human disease. One major approach to studying gene regulatory evolution is to start at the phenotypic level and analyze variation in patterns of gene expression. The availability of large transcriptome data sets from various tissues in various organisms, based on oligonucleotide microarray technology, makes such studies possible. However, because microarray data have intrinsic defects, many previous studies in this field have reached contradictory conclusions. In this study, using expression profile data extracted from the Unigene database, we analyzed gene expression evolution comprehensively. We verified some previously controversial results and also made some new observations. We developed a new measure of gene expression profile divergence (Ug). Based on Ug, we analyzed expression evolution between human duplicate genes, between human-mouse orthologs, and between orthologous tissues. We found that the evolution of gene expression and the evolution of gene sequence are coupled; both may evolve under negative selection. However, gene expression evolution shows no correlation with expression specificity or expression level. We further constructed tissue expression dendrograms of 15 human and 15 mouse tissues. We suggest that two different factors, the evolutionary distance caused by speciation (De) and the distance caused by tissue development (Dd), should be considered when constructing and analyzing tissue expression dendrograms.

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication issue】 2007, Issue 05