Construction and Application of Expressed Sequence Tag Systems for Medicinal Plants Based on High-Throughput Sequencing

【Author】 Li Ying

【Supervisor】 Chen Shilin

【Author information】 Peking Union Medical College, Pharmacognosy, 2010, Ph.D.

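【Abstract (Chinese)】 Expressed sequence tag (EST) technology is an effective method for isolating and cloning novel genes and for studying gene expression profiles. In this study we used EST analysis to establish a methodological framework for transcriptome research in medicinal plants. We then used ESTs obtained on the next-generation high-throughput sequencing platform 454 GS FLX Titanium to discover genes encoding key enzymes in the biosynthesis of bioactive compounds. Taking Glycyrrhiza uralensis, Salvia miltiorrhiza, Huperzia serrata and Phlegmariurus carinatus as examples, we analyzed how high-throughput sequencing and transcriptome analysis can be applied to medicinal plants.

Glycyrrhiza uralensis Fisch. ex DC. is one of the most widely used medicinal plants in the world and is also widely used as a food and tobacco additive. Because genomic and transcriptomic data for G. uralensis are lacking, the glycyrrhizin biosynthetic pathway is still unclear. We sequenced the transcriptome of the vegetative organs of G. uralensis on the 454 GS FLX Titanium platform and constructed an EST library, obtaining 59,219 ESTs with an average length of 409 bp. The 454 ESTs were merged with 50,666 G. uralensis ESTs from GenBank and assembled into 27,229 unigenes (11,694 contigs and 15,535 singletons). These unigenes were annotated against public databases (SwissProt, KEGG, TAIR, Nr and Nt; threshold E ≤ 1e-5). A total of 20,229 sequences were annotated, corresponding to roughly 10,000 independent transcripts. Synthesis of the glycyrrhizin skeleton involves 18 enzymes, and candidate genes for 16 of them were found in the EST library. We also found 125 cytochrome P450 candidate genes and 172 glycosyltransferase candidate genes. The P450 candidates were grouped into 32 classes by CYP family, and the glycosyltransferase candidates into 45 classes by GO functional classification. Finally, organ-specific expression analysis by real-time PCR identified the genes most likely to participate in glycyrrhizin synthesis: 3 cytochrome P450s and 6 glycosyltransferases.

Salvia miltiorrhiza Bge. is a commonly used traditional Chinese medicine of the genus Salvia (family Labiatae); its roots and rhizomes are used medicinally. Modern chemical and pharmacological studies show that it contains two classes of bioactive compounds: the lipophilic tanshinones and the hydrophilic salvianolic acids. Genomic and transcriptomic studies of S. miltiorrhiza are still scarce. We sequenced the root transcriptome of two-year-old S. miltiorrhiza plants on the 454 GS FLX Titanium platform to study its gene expression profile and mine its functional genes. We obtained 46,722 ESTs with an average length of 414 bp, comparable to Sanger reads. Merging these with the S. miltiorrhiza ESTs in GenBank yielded 18,235 unigenes, of which 13,980 were newly discovered by 454 sequencing. Database similarity searches showed that 73.0% of the unigenes (13,308) share some degree of homology with known genes from other organisms. BLAST and Gene Ontology analyses identified 27 sequences likely involved in tanshinone biosynthesis (encoding 15 key enzymes), 29 sequences involved in salvianolic acid biosynthesis (encoding 11 key enzymes), 70 cytochrome P450 sequences and 577 transcription factor sequences.

Huperzia serrata and Phlegmariurus carinatus, both members of the Huperziaceae, contain the lycopodium alkaloid huperzine A (Hup A). Hup A is the main active ingredient of new drugs for Alzheimer's disease and has good application prospects. Despite the considerable medicinal value of Huperziaceae plants, studies of their genomes and transcriptomes are extremely limited, which severely constrains new drug development. The 454-EST data comprised 140,930 ESTs for H. serrata and 79,920 for P. carinatus, assembled into 36,763 and 31,812 transcripts, respectively. Of these, 115 H. serrata and 98 P. carinatus transcripts were annotated as related to the biosynthesis of alkaloids, terpenoids, flavones/flavonoids and other compounds. Sixty-three CYP450s were expressed in both species. Real-time PCR comparison of the CYP450-encoding H. serrata transcripts between roots and leaves showed that 20 transcripts were highly expressed in leaves but expressed at lower levels in roots. This matches the organ-specific distribution of Hup A in H. serrata, whose content is highest in leaves and lower in roots, so we propose these transcripts as candidate genes for lycopodium alkaloid biosynthesis. In addition, 2,729 SSR loci were found in H. serrata and 1,573 in P. carinatus.

As an important tool for functional genomics of medicinal plants, 454 high-throughput sequencing can play a major role in discovering functional genes in G. uralensis, S. miltiorrhiza, H. serrata and P. carinatus. This study obtained large numbers of ESTs from these four important medicinal plants, characterized their transcriptomes, and identified many genes that may be involved in secondary metabolite biosynthesis, regulation of growth and development, and responses to the environment. These results provide a rich gene resource for identifying the functional genes behind active-ingredient biosynthesis, and thereby lay a theoretical foundation for producing glycyrrhizin, tanshinones, salvianolic acids and lycopodium alkaloids by biotechnology.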
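The real-time PCR screens above compare transcript levels between organs (for H. serrata, leaves versus roots). The abstract does not say how expression was quantified. A common choice is the comparative Ct (2^-ΔΔCt) method, and the Python sketch below shows it under stated assumptions: one reference gene, near-100% amplification efficiency, root as the calibrator, and a hypothetical 2-fold cut-off for calling a transcript "leaf-enriched". The function names and Ct values are illustrative and do not come from the thesis.

```python
"""Relative expression (leaf vs. root) with the comparative Ct (2^-ΔΔCt) method.

Assumptions (not stated in the thesis): a single reference gene,
~100% primer efficiency, and the root sample used as the calibrator.
"""

from dataclasses import dataclass


@dataclass
class CtPair:
    """Mean Ct values of one sample for a target gene and the reference gene."""
    target: float
    reference: float

    @property
    def delta_ct(self) -> float:
        # ΔCt = Ct(target) - Ct(reference)
        return self.target - self.reference


def fold_change(sample: CtPair, calibrator: CtPair) -> float:
    """Return 2^-ΔΔCt, i.e. expression in `sample` relative to `calibrator`."""
    delta_delta_ct = sample.delta_ct - calibrator.delta_ct
    return 2.0 ** (-delta_delta_ct)


def leaf_enriched(candidates: dict[str, tuple[CtPair, CtPair]],
                  min_fold: float = 2.0) -> list[str]:
    """IDs of transcripts whose leaf/root fold change is at least `min_fold`.

    `candidates` maps a transcript ID to its (leaf, root) Ct pairs; the
    2-fold default cut-off is a hypothetical choice for illustration.
    """
    return sorted(
        name
        for name, (leaf, root) in candidates.items()
        if fold_change(leaf, root) >= min_fold
    )


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    data = {
        "CYP_a": (CtPair(22.0, 18.0), CtPair(25.0, 18.0)),  # ΔΔCt = -3 -> 8-fold
        "CYP_b": (CtPair(24.0, 18.0), CtPair(24.0, 18.0)),  # ΔΔCt = 0 -> 1-fold
    }
    print(leaf_enriched(data))  # ['CYP_a']
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be more appropriate than the plain 2^-ΔΔCt formula.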

【Abstract】 EST analysis is a cost-effective and rapid tool for isolating novel genes and characterizing gene expression profiles. We combined cDNA library construction with massively parallel pyrosequencing on the Roche 454 GS FLX Titanium platform to generate EST datasets of medicinal plants for transcriptome analysis. EST libraries of Glycyrrhiza uralensis, Salvia miltiorrhiza, Huperzia serrata and Phlegmariurus carinatus were constructed in this study.

Glycyrrhiza uralensis is one of the most popular medicinal plants in the world and is also widely used to flavor food and tobacco. Owing to limited genomic and transcriptomic data, the biosynthetic pathway of glycyrrhizin, the major bioactive compound in G. uralensis, is currently unclear. We used the 454 GS FLX platform with Titanium reagents to produce a substantial EST dataset from the vegetative organs of G. uralensis. A total of 59,219 ESTs with an average read length of 409 bp were generated. These 454 ESTs were combined with the 50,666 G. uralensis ESTs in GenBank, and the combined set was assembled into 27,229 unique sequences (11,694 contigs and 15,535 singletons). A total of 20,437 unique gene elements, representing approximately 10,000 independent transcripts, were annotated by BLAST searches (E-value ≤ 1e-5) against the SwissProt, KEGG, TAIR, Nr and Nt databases, and the assembled sequences were assigned gene names and Gene Ontology (GO) terms. Among genes related to glycyrrhizin metabolism, genes encoding 16 of the 18 enzymes of the glycyrrhizin skeleton synthesis pathway were found. To identify novel cytochrome P450 and glycosyltransferase genes that may be involved in glycyrrhizin metabolism, we searched the unigenes and found 125 homologous to cytochrome P450s and 172 homologous to glycosyltransferases.
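The annotation step above counts a unigene as annotated when at least one database search returns a hit with E-value ≤ 1e-5. A minimal Python sketch of that filter follows. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) for each database; the thesis does not say which BLAST programs or output formats were used, and the database labels and toy hits are hypothetical.

```python
"""Collect unigenes with at least one significant BLAST hit (E <= 1e-5).

Assumes BLAST+ tabular output (-outfmt 6, default 12 columns:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore). Database labels and inputs below are illustrative.
"""

from typing import Iterable

EVALUE_CUTOFF = 1e-5


def best_hits(lines: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> dict[str, tuple[str, float]]:
    """Map each query ID to its highest-bitscore subject among hits passing `cutoff`."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # not significant under the chosen threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotated_unigenes(results: dict[str, Iterable[str]]) -> set[str]:
    """Union of unigenes with a significant hit in any searched database.

    A unigene annotated in several databases is counted once.
    """
    annotated: set[str] = set()
    for lines in results.values():
        annotated.update(best_hits(lines))
    return annotated


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    swissprot = [
        "u1\tP12345\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t300",
        "u2\tQ67890\t30.0\t50\t35\t2\t1\t150\t1\t50\t0.01\t40",
    ]
    nt = ["u3\tgi|1\t95.0\t300\t15\t0\t1\t300\t1\t300\t2e-100\t500"]
    print(sorted(annotated_unigenes({"SwissProt": swissprot, "Nt": nt})))  # ['u1', 'u3']
```

Taking the union rather than summing per-database counts avoids double-counting unigenes that hit several databases.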
The cytochrome P450 candidates were classified into 32 CYP families, and the glycosyltransferase candidates into 45 categories by GO analysis. Finally, organ-specific expression analysis by real-time PCR identified 3 cytochrome P450s and 6 glycosyltransferases as the candidates most likely to be involved in glycyrrhizin biosynthesis.

To investigate gene expression in Salvia miltiorrhiza and discover its functional genes, we used the 454 GS FLX platform with Titanium reagents to produce a substantial EST dataset from S. miltiorrhiza roots. A total of 46,722 ESTs with an average read length of 414 bp were generated. The 454 ESTs were combined with the S. miltiorrhiza ESTs from GenBank and assembled into 18,235 unigenes, of which 13,980 were novel unigenes identified by 454 sequencing. In total, 73% of these unigenes (13,308) were annotated by BLAST searches (E-value ≤ 1e-5) against the SwissProt, KEGG, TAIR, Nr and Nt databases. We found 27 unigenes (encoding 15 enzymes) involved in tanshinone biosynthesis and 29 unigenes (encoding 11 enzymes) involved in phenolic acid biosynthesis, as well as 70 putative cytochrome P450 genes and 577 putative transcription factor genes.

Plants of the family Huperziaceae, which comprises the two genera Huperzia and Phlegmariurus, produce various types of lycopodium alkaloids. Huperzine A (Hup A), a lycodine-type alkaloid, has been developed as an anti-Alzheimer's disease drug candidate with good market prospects. Despite the medical importance of this family, few genomic or transcriptomic data are available for its members, which has seriously hampered the development and use of new drugs. From the H. serrata and P. carinatus 454-EST datasets, 36,763 and 31,812 unigenes were assembled from 140,930 and 79,920 reads, respectively. Approximately 115 H. serrata and 98 P. carinatus unigenes associated with the biosynthesis of triterpenoids, alkaloids and flavones/flavonoids were identified in the 454-EST datasets. A total of 63 unigenes encoding cytochrome P450s (CYP450s) were present in both H. serrata and P. carinatus. In particular, comparison of the H. serrata and P. carinatus 454-ESTs together with real-time PCR analysis identified 20 H. serrata CYP450 candidate genes that are more abundant in leaves than in roots and might be involved in lycopodium alkaloid biosynthesis. The expression profiles of these candidates were consistent with the organ-specific accumulation of Hup A, which is higher in H. serrata leaves and lower in roots. In addition, 2,729 and 1,573 potential SSR (microsatellite) loci were identified from the H. serrata and P. carinatus 454-ESTs, respectively.

The 454-EST resource enabled the first large-scale EST projects and transcriptome analyses for G. uralensis, S. miltiorrhiza, H. serrata and P. carinatus. We identified many candidate genes involved in the biosynthesis of bioactive compounds and in developmental regulation. These results provide an essential resource for understanding secondary metabolite biosynthesis and lay the foundation for producing glycyrrhizin, tanshinones, salvianolic acids and Hup A using biotechnology.
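The SSR counts above come from scanning EST sequences for short tandem repeats. The thesis does not name the tool or the criteria it used. The Python sketch below assumes MISA-style minimum repeat numbers (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs) and reports perfect repeats only; compound and imperfect SSRs are not merged or detected. All names are hypothetical.

```python
"""Find perfect microsatellites (SSRs) in EST/unigene sequences.

Assumed thresholds (MISA-style, not stated in the thesis): motif length
1-6 bp, minimum repeats 10 (mono), 6 (di) and 5 (tri- to hexanucleotide).
Compound SSRs and imperfect repeats are not handled.
"""

from typing import NamedTuple

MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    start: int   # 0-based start position in the sequence
    motif: str
    repeats: int

    @property
    def end(self) -> int:
        return self.start + len(self.motif) * self.repeats


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[SSR]:
    """Return perfect SSRs in `seq`, sorted by start position."""
    seq = seq.upper()
    found: list[SSR] = []
    for k, needed in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Count consecutive copies of the motif starting at position i.
            repeats = 1
            while seq[i + repeats * k:i + (repeats + 1) * k] == motif:
                repeats += 1
            if repeats >= needed and set(motif) <= set("ACGT") and is_primitive(motif):
                found.append(SSR(i, motif, repeats))
                i += repeats * k  # resume scanning after this repeat tract
            else:
                i += 1
    return sorted(found)


if __name__ == "__main__":
    est = "GG" + "AT" * 7 + "CC" + "A" * 12 + "G"  # toy sequence
    for ssr in find_ssrs(est):
        print(ssr.start, ssr.motif, ssr.repeats)
    # 2 AT 7
    # 18 A 12
```

The primitivity check keeps a run such as `AAAAAAAAAAAA` from also being reported as an `AA` dinucleotide repeat, and scanning advances one base at a time until a qualifying tract is found, so repeats that start inside a rejected run are not skipped.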
