

Larch Genes Mining and Mechanism on Differential Gene Expression and Heterosis Formation

【Author】 Xu Chenlu (许晨璐)

【Supervisor】 Zhang Shougong (张守攻)

【Author information】 Chinese Academy of Forestry, Forest Genetics and Tree Breeding, 2013, PhD

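【Summary】 Genomic resources play an important role in forest tree genetics and breeding, but such resources are scarce for the genus Larix, which has limited the application of "genomics-assisted breeding" strategies to its genetic improvement. Heterosis is widely exploited in larch breeding, yet its molecular mechanism is unknown. This study first applied next-generation sequencing (Roche 454) to two important larch plantation species in China, Larix principis-rupprechtii and L. kaempferi, to mine transcripts, molecular markers, and miRNAs from each. On this basis, digital gene expression profiling (RNA-seq) was used to detect gene expression differences between the L. principis-rupprechtii × L. kaempferi hybrid and its parents. The main results are as follows:
1. RNA was extracted from multiple genotypes and multiple tissues of L. principis-rupprechtii and L. kaempferi to construct sequencing (cDNA) libraries, and the transcriptomes were sequenced on the 454 GS FLX Titanium platform. About 1.30 million raw reads (mean length 345 bp) were obtained for L. principis-rupprechtii and about 1.24 million (mean length 330 bp) for L. kaempferi. After data filtering, assembly, and redundancy assessment, 75,597 and 80,246 Unigenes were obtained, with mean lengths of 466 bp and 436 bp, respectively; a considerable number of Unigenes (>4,000) had long sequences and complete CDS structures. Comparison with the NR database gave functional annotations for 36,222 L. principis-rupprechtii Unigenes, which matched 24,760 protein sequences, a considerable number of which (>4,500) were covered almost completely. The L. kaempferi results were similar, showing that the Unigenes obtained here cover a large number of larch genes, many of them in full. Using the annotation, non-plant and rRNA sequences were removed from the Unigenes, and a number of genes important for larch breeding were retrieved. To make the Unigenes easier to use in later studies, they were further compared against the TAIR, Poplar, PFAM, KOG, GO, and KEGG databases, yielding gene names, known functions, conserved domains, orthologous genes, cellular locations, biological processes, and specific metabolic pathways; these annotations again confirmed the broad coverage of the larch Unigenes.
2. In total, 1,511 SSR markers were mined for L. principis-rupprechtii and 1,498 for L. kaempferi, and primers were successfully designed for about two thirds of them. In addition, 16,436 (L. principis-rupprechtii) and 16,437 (L. kaempferi) high-confidence SNP markers were mined, and most larch genes were shown to follow a "diversifying" mode of evolution.
3. Comparison with a microRNA database identified 11 larch miRNA sequences, 7 from L. principis-rupprechtii and 4 from L. kaempferi, with lengths concentrated at 21 bp to 22 bp. Alignment with miRNA sequences from other plants showed that most of these miRNAs are strongly conserved. In total, 37 target gene sequences were retrieved for L. principis-rupprechtii and 30 for L. kaempferi, half of which had functional annotation.
4. Expression profiling identified genes differentially expressed among the cross combinations and classified their expression patterns, confirming the role of "non-additive expression", and particularly "over-dominant expression", in the formation of heterosis. Differentially expressed genes were mostly enriched in the "extracellular region"; in molecular functions such as "terpene synthase activity", "(carbon-oxygen) lyase activity", "(cat)ion binding", "hydrolase activity", "transferase activity", "oxidoreductase activity", and "beta-glucosidase activity"; and in biological processes or metabolic pathways such as "carbohydrate metabolism", "phenylpropanoid metabolism", "plant cell wall biosynthesis", "biosynthesis of secondary metabolites", "protein processing", and "starch and sucrose metabolism".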
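The SSR mining step summarized above can be illustrated with a minimal Python sketch. The thesis does not state which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration; the sketch finds perfect di- to hexanucleotide repeats with a regular expression.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the thesis).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures one motif; \1{m,} requires at least m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" or "AAA").
            divisors = [k for k in range(1, motif_len) if motif_len % k == 0]
            if any(motif == motif[:k] * (motif_len // k) for k in divisors):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCGTA" + "AGC" * 5 + "TT"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Positions are 0-based. A real pipeline would also handle compound and imperfect repeats and would pass flanking sequence to a primer design tool, which this sketch leaves out.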

【Abstract】 Genomic resources play an important role in forest genetic improvement. Yet the genus Larix is poorly characterized at the molecular level, with little sequence information available in public databases, which has hindered the application of genomics-assisted breeding approaches. The most pressing need for larch genomics-assisted breeding is a large set of Unigenes and molecular markers, which will provide critical information to help identify target genes for functional analysis, association studies, and integrated breeding. Heterosis, a common and important biological phenomenon, is widely applied in larch breeding, yet little consensus has been reached about its genetic basis. In this study, we first used the 454 GS FLX Titanium pyrosequencer to produce larch Unigenes and mined molecular markers and miRNAs. We then screened for genome-wide differentially expressed genes among Larix principis-rupprechtii, L. kaempferi, and their hybrids using digital gene expression technology. The main results were as follows:
(1) Normalized cDNA collections from multiple tissues and genotypes were used to sample large numbers of expressed genes for L. principis-rupprechtii and L. kaempferi. We obtained over 1,300,000 sequencing reads (mean length: 345 bp) for L. principis-rupprechtii and 1,240,000 sequencing reads (mean length: 330 bp) for L. kaempferi. De novo assembly yielded 75,597 (mean length: 466 bp) and 80,246 (mean length: 436 bp) Unigenes, respectively; more than 4,000 Unigenes were longer than 1 kb and contained a CDS. Based on sequence similarity with known proteins, these sequences represent approximately 26,000 unique genes and cover a broad range of Gene Ontology categories, and 1,123 Unigenes were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Similar results were achieved for L. kaempferi. 
Our data provide the most comprehensive sequence resource and an extensive collection of potential genetic markers currently available for L. principis-rupprechtii and L. kaempferi, permitting Unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the Unigene set has significantly increased the number of publicly available molecular genetic markers as tools for improving these species.
(2) We located and characterized 1,473 simple sequence repeats (SSRs) and 16,436 single nucleotide polymorphisms (SNPs) as potential molecular markers in our assembled and annotated sequences for L. principis-rupprechtii, and 1,498 SSRs and 16,437 SNPs for L. kaempferi. Most larch genes follow a diversifying evolutionary pattern.
(3) To identify putative novel microRNAs belonging to evolutionarily conserved families, the larch transcriptome was compared with known microRNA hairpin sequences in miRBase. A total of 22 significant local alignments between Unigenes and hairpin sequences were identified. After analysis of secondary hairpin structures, 11 conserved miRNAs comprising 6 miRNA families from L. principis-rupprechtii and L. kaempferi were identified in silico, and 37 and 30 target mRNAs from L. principis-rupprechtii and L. kaempferi, respectively, were predicted with the psRNATarget program. These results lay a solid foundation for further study of the regulatory roles of miRNAs in development, growth, and responses to environmental stresses in larch.
(4) Genome-wide differentially expressed genes were screened in L. principis-rupprechtii, L. kaempferi, and their hybrids using digital gene expression technology, and gene expression modes were categorized. We found that non-additive gene expression, especially the over-dominance mode, played an important role in heterosis manifestation. 
Differentially expressed genes are over-represented in extracellular region, terpene synthase activity, lyase activity, ion binding, hydrolase activity, transferase activity, oxidoreductase activity, beta-glucosidase activity, carbohydrate metabolic process, phenylpropanoid metabolic process, plant-type cell wall biogenesis, biosynthesis of secondary metabolites, protein processing, and starch and sucrose metabolism. Function and pathway analysis of heterosis-associated genes will lay a foundation for elucidating the relationship between differential gene expression and heterosis.
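The categorization of expression modes mentioned in result (4) can be sketched as follows. The abstract does not describe the criteria actually used, so the function name `classify_expression`, the tolerance `tol`, and the category labels are assumptions made for illustration. The sketch compares a hybrid's expression level with the mid-parent value and with each parent.

```python
def classify_expression(p1, p2, hybrid, tol=0.2):
    """Classify hybrid expression relative to two parental expression levels.

    tol is a hypothetical relative tolerance for calling two levels equal;
    a real analysis would use a statistical test on replicated counts.
    """
    low, high = min(p1, p2), max(p1, p2)
    mid_parent = (p1 + p2) / 2

    def close(a, b):
        # Treat a and b as equal if they differ by at most tol * |b|.
        return abs(a - b) <= tol * max(abs(b), 1e-9)

    if close(hybrid, mid_parent):
        return "additive"
    if hybrid > high and not close(hybrid, high):
        return "over-dominance"
    if hybrid < low and not close(hybrid, low):
        return "under-dominance"
    if close(hybrid, high):
        return "high-parent dominance"
    if close(hybrid, low):
        return "low-parent dominance"
    return "partial dominance"


if __name__ == "__main__":
    for levels in [(10, 20, 15), (10, 20, 30), (10, 20, 5), (10, 20, 20)]:
        print(levels, classify_expression(*levels))
```

Every category other than "additive" counts as non-additive in this scheme. The checks run in a fixed order, so when the two parents are nearly equal the "additive" label takes precedence.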
