Bioinformatics-Based Molecular Evolutionary Analysis of the TLP, SBP-box, CPP-like, Cystatin and HAK Gene Families in Rice

【Author】 Yang Zefeng (杨泽峰)

【Supervisor】 Xu Chenwu (徐辰武)

【Author Information】 Yangzhou University, Crop Genetics and Breeding, 2008, Ph.D.

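【Abstract (translated from the Chinese)】 The molecular mechanisms underlying the origin and divergence of gene families are an important research topic in evolutionary biology. With the completion of genome sequencing for various model organisms, interest in gene family research has grown steadily. Gene families in plant genomes often have more members than those in animals and other organisms, mainly because of species-specific expansion in plants. Such expansion occurs mainly in three ways: segmental duplication, tandem duplication and retrotransposition, of which segmental and tandem duplication are the principal ones. Duplicated genes may be subject to adaptive evolutionary pressures such as positive selection and gene conversion. The fate of duplicated genes has been widely discussed, and the main models are subfunctionalization, nonfunctionalization, neofunctionalization and subneofunctionalization. The rice genome underwent at least two whole-genome duplication events during its evolution, which produced a large number of duplicated genes. Previous bioinformatic analyses suggest that the indica and japonica subspecies of rice may have diverged about 0.5 million years ago. In addition, since the rice genome was sequenced, several research groups have analyzed the molecular evolution of rice gene families, making this one of the most active areas of rice genome research. Building on these studies, this thesis presents genome-level bioinformatic and molecular evolutionary analyses of five rice gene families: TLP, SBP-box, CPP-like, Cystatin and HAK. The main results are as follows:

(1) Tubby-like proteins (TLPs) form a relatively small protein family in animals but play an important role in animal growth and development, mainly in the maintenance and function of neuronal cells. Members of this gene family are widespread in multicellular organisms, including animals and plants. In this study, the TLP gene families of Arabidopsis, rice and poplar were subjected to phylogenetic and molecular evolutionary analysis. Genome-wide searches identified 11, 14 and 11 TLP genes in the Arabidopsis, rice and poplar genomes, respectively. Most plant TLPs contain two highly conserved domains, the TUB domain and the F-box domain. Phylogenetic analysis divided the plant members of this family into three subfamilies and showed that the family expanded in a species-specific manner after the split of monocots and dicots. Intron positions are conserved in most genes of the family, indicating that its exon/intron structure was established before the split of monocots and dicots. A genome-level analysis of the expansion patterns in the three species showed that the expansion of the rice and poplar TLP families was driven mainly by segmental duplication rather than by tandem duplication or replicative transposition events. Co-evolutionary analysis indicated that the TUB and F-box domains have co-evolved and may function together. Tissue-specific expression analysis of TLP paralogs in the three species showed that functional divergence of duplicated genes is the main feature of their long-term evolution. Ka/Ks analysis of the duplicated genes suggested that positive and neutral selection may have played an important role in their functional divergence.

(2) The SBP-box gene family is a plant-specific family of transcription factor genes. The proteins they encode contain a highly conserved SBP domain and bind specifically to the promoter regions of the floral meristem identity gene SQUAMOSA and its homologs to regulate their expression. In this study, the SBP domains of proteins encoded by known SBP-box genes were used as queries, and 17 and 19 SBP-box genes were identified in the Arabidopsis and rice genomes, respectively. Phylogenetic analysis indicated that the basic characteristics of these genes were established before the split of Arabidopsis and rice, and that after the split of monocots and dicots the family expanded in a species-specific manner in both species, with segmental duplication playing an important role in both. Based on the phylogenetic tree, the SBP-box gene family was divided into nine homologous clusters; members of each cluster share similar motifs arranged in a very similar order. Analysis of nonsynonymous and synonymous substitution rates showed that the SBP domain regions of homologous genes have undergone purifying selection, whereas the regions outside the domain have undergone positive selection or weaker purifying selection. Expression patterns of the SBP-box genes were further analyzed by searching EST databases. The results showed that the Arabidopsis SBP-box genes are expressed mainly in flowers, leaves, roots and seeds, whereas the rice SBP-box genes are expressed mainly in flowers and callus.

(3) The CPP-like gene family is a family with few members. The proteins it encodes contain one or two cysteine-rich domains, known as CXC domains. The family is widespread in plants and animals but has not been found in yeast. To understand how the CPP-like gene family evolved in plants, this study carried out a comparative and molecular evolutionary analysis of the family in the Arabidopsis and rice genomes. Phylogenetic analysis showed that after the divergence of monocots and dicots, the CPP-like gene family expanded in a species-specific manner in both genomes, and gene loss was also detected in Arabidopsis. Analysis of the intron/exon structure of duplicated genes showed that both intron gain and intron loss played an important role in the evolution of these genes. Further analysis indicated that positive selection was the main driving force in the evolution of plant CPP-like genes, and that most amino acid sites under positive selection lie outside the domains. This study also found that in proteins carrying two CXC domains, the two domain sequences and the sequence connecting them have co-evolved over the long course of evolution.

(4) Cysteine proteinase inhibitor (cystatin) genes are widespread in plant species. They inhibit the cysteine proteinases of certain pathogenic microorganisms and insects and play an important role in plant defense. Using the cystatin domains of previously isolated plant cystatin genes as queries, we identified the members of the cystatin gene family in Arabidopsis and rice at the genome level and analyzed their phylogeny. Combining domain identification, multiple sequence alignment and MEME analysis, 7 Arabidopsis and 12 rice cystatin genes were finally identified. Phylogenetic analysis indicated that the basic characteristics of cystatin genes were very likely established before the split of Arabidopsis and rice. The cystatin domain is highly conserved among the proteins. Functional divergence has occurred between the two subfamilies of the Arabidopsis and rice Cystatin gene family, and adaptive evolution has played an important role in their evolution. The Arabidopsis and rice cystatin genes are expressed mainly in flowers, leaves, roots, seeds and callus, which is important for protecting the plant against insect attack.

(5) The high-affinity K+ (HAK) transporter gene family is the largest potassium transporter gene family in plants and plays an important role in plant growth and development. In this study, a genome-wide search identified 27 genes encoding high-affinity potassium transporters in the rice genome. The phylogenetic tree divided the HAK transporter gene family of Arabidopsis and rice into four distinct subfamilies. After the split of monocots and dicots, the HAK genes in rice expanded in a lineage-specific manner. Functional divergence analysis showed that the family has undergone functional divergence. Adaptive evolution analysis indicated that point mutations under positive selection and gene conversion events have played an important role in the evolution of this family.

The above bioinformatic and molecular evolutionary results for these five rice gene families lay an important foundation for the functional characterization of these genes.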

【Abstract】 The molecular mechanisms underlying the origin and differentiation of gene families are an important topic in the field of molecular evolution. With the completion of sequencing projects for several model organisms, increasing attention has been paid to the evolutionary analysis of gene families. Gene families usually have more members in plants than in animals and other organisms, as a result of species-specific expansion of gene families in plants. Gene duplication may arise through three principal mechanisms: segmental duplication, tandem duplication, and transposition events such as retroposition and replicative transposition. Segmental duplication and tandem duplication are the main mechanisms. The genes in a duplicate pair may undergo positive selection and gene conversion after duplication. Several models have been developed to explain the fate of duplicated genes; among them, nonfunctionalization, neofunctionalization, subfunctionalization and subneofunctionalization are widely accepted. Previous studies have reported several rounds of whole-genome duplication in the rice genome, which produced many duplicated genes. Bioinformatic analysis has suggested that the indica-japonica divergence may have occurred about 0.5 million years ago. Since the sequencing of the rice genomes, many studies have addressed the evolution of rice gene families, and the evolutionary analysis of gene families has become one of the most important subjects in rice genetics. In the present study, we performed bioinformatic and evolutionary analyses of the TLP, SBP-box, CPP-like, Cystatin and HAK gene families in rice. The main results are as follows:

(1) Tubby-like proteins (TLPs) play an important role in the maintenance and function of neuronal cells during postdifferentiation and development in mammals, and have been found in multicellular organisms from both the plant and animal kingdoms.
We present here a comparative phylogenetic and molecular evolutionary analysis of the TLP gene family in Arabidopsis, rice and poplar. Through genome-wide screening, we identified 11 TLP genes in Arabidopsis, 14 in rice and 11 in poplar. Most Tubby-like proteins in plants contain both the highly conserved TUB domain and the F-box domain. Alignment of the predicted protein sequences showed four conserved blocks in the TUB domain. Phylogenetic analysis grouped this family into three subfamilies and suggested that species-specific expansion contributed to its evolution in plants. The intron distribution is conserved in most members of the family, suggesting that its exon/intron structure existed before the split of monocots and dicots. On a genome scale, we found that the rice and poplar TLP families expanded mainly through segmental duplication events rather than through tandem duplication or replicative transposition events. Co-evolutionary analysis revealed that the TUB and F-box domains have possibly co-evolved in proteins that possess both domains. Tissue-specific expression analysis illustrated that functional diversification of the duplicated TLP genes is a major feature of their long-term evolution. Furthermore, analysis of nonsynonymous and synonymous substitution rates indicated that positive and neutral selection contributed to the functional diversification of duplicated pairs.

(2) SBP-box proteins are plant-specific putative transcription factors that contain a highly conserved SBP domain and bind specifically to the promoters of the floral meristem identity gene SQUAMOSA and its orthologs to regulate their expression. In this study, 17 nonredundant SBP-box genes in the Arabidopsis genome and 19 in the rice genome were identified using known SBP domain sequences as queries.
The phylogenetic analysis suggested that the main characteristics of this family were probably in place before the split of Arabidopsis and rice, and that most SBP-box genes expanded in a species-specific manner after the split of monocots and dicots. Segmental duplication events contributed most to the expansion of this family in the two species. All the SBP-box proteins were classified into nine subgroups based on the phylogenetic tree; members of each subgroup share similar motifs, and the order of the motifs within a subgroup is almost identical. Analysis of nonsynonymous and synonymous substitution rates revealed that the SBP domain has undergone purifying selection, whereas some regions outside the SBP domain have undergone positive or relaxed purifying selection. The expression patterns of the SBP-box genes were further investigated by searching the EST database. The results showed that the Arabidopsis SBP-box genes are expressed chiefly in flowers, leaves, roots and seeds, while those in rice are expressed mainly in flowers and callus.

(3) CPP-like genes are members of a small family characterized by two similar Cys-rich domains, termed CXC domains, in their protein products. They are widely distributed in plants and animals but do not exist in yeast. In plants, members of this family play an important role in the development of reproductive tissue and the control of cell division. To gain insight into how CPP-like genes evolved in plants, we conducted a comparative phylogenetic and molecular evolutionary analysis of the CPP-like gene family in Arabidopsis and rice. The phylogeny revealed that both gene loss and species-specific expansion contributed to the evolution of this family in Arabidopsis and rice. Both intron gain and intron loss were observed in the intron/exon structure analysis of duplicated genes.
Our results also suggested that positive selection was a major force in the evolution of CPP-like genes in plants, and that most amino acid residues under positive selection are located in the regions outside the CXC domains. Further analysis revealed that the two CXC domains and the sequences connecting them may have co-evolved over the long evolutionary period.

(4) Plant cystatins, or phytocystatins, are cysteine proteinase inhibitors that exist widely in different plant species. Because their protein products can kill insects and pathogens by inhibiting the digestive function of cysteine proteinases in the gut, they are believed to play an important role in plant defense against pests and pathogens. In this study, we used cystatin domain sequences identified from known plant cystatin genes as queries to search for cystatin genes in the Arabidopsis and rice genomes. A phylogenetic tree was then constructed from the corresponding cystatin proteins of Arabidopsis and rice, and the conserved sequences of these proteins were analyzed. Finally, we searched the GenBank EST database for expression information on these genes. Seven nonredundant cystatin genes in Arabidopsis and 12 in rice were identified by combining the identification of cystatin domains with the results of sequence alignment and MEME analysis. The phylogenetic analysis indicated that the main characteristics of cystatin genes probably existed before the divergence of Arabidopsis and rice. The cystatin domains of these proteins are highly conserved. The Arabidopsis and rice cystatin genes are expressed mainly in flowers, leaves, roots, seeds and callus, which is important for plant defense against insects.

(5) The high-affinity K+ (HAK) transporter gene family is the largest family of potassium transporters in plants and is important for various aspects of plant life.
In the present analysis, we identified 27 members of this family in rice on a genome-wide scale. The phylogenetic tree divided the HAK transporter proteins into four distinct groups. The HAK genes in rice were found to have expanded in a lineage-specific manner after the split of monocots and dicots. Functional divergence analysis provided statistical evidence for a shift in evolutionary rate after gene duplication. Further analysis indicated that both point mutations under positive selection and gene conversion events contributed to the evolution of this family in rice.

The above results lay a foundation for functional validation aimed at understanding the roles of the members of these gene families.
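
Several of the results above rest on comparing nonsynonymous (Ka) and synonymous (Ks) substitution rates. As a minimal sketch of how such ratios are conventionally read (Ka/Ks below 1 suggests purifying selection, near 1 neutral evolution, above 1 positive selection), the following Python snippet labels a duplicated gene pair. The function name, the tolerance band around 1 and the example values are assumptions made for illustration; they are not taken from the thesis, which does not state the software or thresholds used here.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Label a gene pair by its Ka/Ks ratio (hypothetical helper)."""
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        return "undefined"
    omega = ka / ks
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    # Ratios within the assumed tolerance band around 1 are treated as neutral.
    return "neutral evolution"


# Hypothetical (Ka, Ks) values for illustration only; not data from the thesis.
pairs = {
    "pair_A": (0.05, 0.60),
    "pair_B": (0.42, 0.40),
    "pair_C": (0.90, 0.50),
}

for name, (ka, ks) in pairs.items():
    print(name, classify_selection(ka, ks))
```

In practice a ratio is only meaningful alongside a statistical test, and a gene-wide value below 1 can still hide individual sites or regions under positive selection, which is why domain regions and non-domain regions are compared separately in the analyses described above.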

  • 【Online Publication Contributor】 Yangzhou University
  • 【Online Publication Issue】 2009, Issue 01