
Whole Genome Sequencing and Chloroplast Genome Structure Analysis in Panax

Original title: 人参基因组测序和叶绿体基因组结构研究

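【Author】 Yin Jinlong (殷金龙)

【Advisor】 Qu Xiaobo (曲晓波)

【Author information】 Changchun University of Chinese Medicine (长春中医药大学), Chinese Materia Medica, 2013, PhD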

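【Abstract, translated from the Chinese】 Objective: To carry out whole-genome sequencing, assembly, and preliminary bioinformatic analysis of the valuable medicinal plant ginseng, to survey the microsatellite (SSR) loci in its genome, and to develop SSR primers, in order to provide genome-level background for research on the molecular biology, functional genomics, and assisted breeding of ginseng.

Methods: Second-generation high-throughput sequencing (Illumina HiSeq 2000) was used to sequence the whole genomes of five ginseng types: wild mountain ginseng (野山参), Damaya (大马牙), Ermaya (二马牙), Korean ginseng (高丽参), and American ginseng (西洋参). The reads were corrected and given a preliminary assembly. The MISA program was used to locate microsatellite sequences in the ginseng genome, and their distribution was analyzed statistically. Each SSR locus was then extracted together with its upstream and downstream flanking sequence, and these SSR regions were compared by homology across the five types. For loci that were polymorphic among the five types, SSR primers were designed with Primer3; primers designed this way are expected to show a high proportion of polymorphism. The whole-genome sequencing reads were also aligned against all available chloroplast genomes and filtered, and the reads homologous to known chloroplast sequences were assembled with the aim of obtaining the complete ginseng chloroplast genome sequence.

Results: The genomes of the five types were sequenced at roughly 66x to 134x coverage, with an average read length of 100 bp. After preliminary assembly of the corrected reads, Damaya yielded 2,324,338 contigs longer than 200 bp and 320,628 contigs longer than 500 bp; the longest contig was 15,438 bp, and there were 524,051 scaffolds with a total length of 3,036,436,415 bp. Contigs longer than 200 bp were used to search for and count SSRs in the five genomes. In the wild mountain ginseng genome, 311,992 SSR loci were found in total; dinucleotide repeats were the most numerous, followed by mononucleotide and trinucleotide repeats. Primers were designed for the SSR loci (67 in total) that varied among the five types. Reads homologous to chloroplast sequences were selected from the ginseng sequencing data and assembled with Velvet, CAP3, and GapCloser, giving a Damaya chloroplast genome sequence with a total length of 147,309 bp. Annotation of this sequence identified 154 genes, including photosynthesis-related genes, tRNA genes, rRNA genes, and ribosomal protein genes. SNP detection in the chloroplast genome sequence yielded 58 candidate single-nucleotide differences that could be used to distinguish ginseng varieties, 23 of which lie in gene coding regions. In the mitochondrial homologous sequence regions, 3,250 potential SNP loci were detected, and sequencing verification was carried out.

Conclusion: This study used a second-generation sequencing platform for large-scale sequencing of the ginseng genome, with raw data covering the genome 66x to 134x. After preliminary assembly, SSR loci were surveyed at the whole-genome level, and SSR primers were designed from the loci that were polymorphic among the five types. In addition, whole-genome sequencing data were used in a novel way to screen for and assemble the ginseng chloroplast genome sequence, which is quick and convenient, and SNP loci could be detected in chloroplast genes and mitochondrial homologous sequences, making efficient use of the sequencing data. The study provides genomic background for later functional genomics research and molecular breeding of ginseng.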
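The assembly figures in the abstract (contigs above 200 bp, contigs above 500 bp, longest contig, total length) are simple length summaries over a FASTA file. The following is a minimal Python sketch of how such a summary could be computed. The function names `fasta_lengths` and `contig_summary` are hypothetical, and treating "above 200 bp" as "at least 200 bp" is an assumption; the abstract does not say which assembler report produced its numbers.

```python
def fasta_lengths(lines):
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def contig_summary(lengths, thresholds=(200, 500)):
    """Count contigs at or above each threshold; report longest and total length."""
    # Assumption: thresholds are inclusive (>=); the abstract does not specify.
    summary = {f">={t}bp": sum(1 for n in lengths if n >= t) for t in thresholds}
    summary["longest"] = max(lengths, default=0)
    summary["total_bp"] = sum(lengths)
    return summary


# Toy input with three contigs of 240 bp, 550 bp (split over two lines) and 100 bp.
toy = [">c1", "ACGT" * 60, ">c2", "A" * 300, "C" * 250, ">c3", "ACGT" * 25]
print(contig_summary(fasta_lengths(toy)))
```

On a real assembly, the lines would come from an open file handle rather than a list.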

【Abstract】 Purpose: To provide a sound basis for studies of the molecular biology, functional genomics, and assisted breeding of a valuable medicinal plant, ginseng (Panax ginseng), whole-genome sequencing was performed on this perennial herb. After assembly and preliminary bioinformatic analysis of the sequencing reads, microsatellite (SSR) sites in the genome were surveyed statistically, and SSR primers were designed for the polymorphic sites that vary among five species of ginseng.

Method: Using second-generation high-throughput sequencing technology (Illumina HiSeq 2000), the whole genomes of five ginseng species (wild ginseng, Damaya, Ermaya, Gaolishen, and Xiyangshen) were sequenced. The output reads were corrected with k-mer-based correction software and then assembled using the SOAPdenovo pipeline and the GapCloser software. SSR loci were searched for with MISA, a program written in Perl, and statistics on the distribution of the SSR loci are reported. SSR loci were retrieved together with their upstream and downstream sequences and compared for homology among the five species. The polymorphic SSR regions were then selected for SSR primer design with Primer3, which makes primer design highly efficient. The whole-genome sequencing reads were also mapped to all known chloroplast genomes; homologous reads were extracted and assembled separately to complete the chloroplast genome sequences of ginseng.

Result: The genomes of the five ginseng species were sequenced at 66x to 134x coverage, with an average read length of 100 bp. After preliminary assembly of the reads, the genome of Damaya ginseng contained 2,324,338 contigs longer than 200 bp and 320,628 contigs longer than 500 bp. The longest contig was 15,438 bp. There were 524,051 scaffolds, with a total length of 3,036,436,415 bp. The contigs longer than 200 bp were selected for the SSR search. In the genome of wild ginseng, 311,992 SSR loci were found in total; most were dinucleotide repeats, followed by mononucleotide repeats and trinucleotide repeats. The SSR regions that varied among the five ginseng species (67 loci) were selected for SSR primer design. The sequencing reads homologous to known chloroplast sequences were retrieved and assembled with Velvet, CAP3, and GapCloser to complete the chloroplast genome of Damaya ginseng. The chloroplast genome was 147,309 bp in total length. The Damaya chloroplast genome was annotated using DOGMA, and the results showed that it contained 154 genes, including photosynthesis-related genes, tRNA genes, rRNA genes, and ribosomal protein genes. Single nucleotide polymorphism (SNP) detection in the chloroplast genome of Damaya ginseng identified 58 SNP loci, 23 of which were located in gene regions. In the mitochondrial homologous sequence regions, 3,250 potential SNP loci were identified. Both chloroplast and mitochondrial SNPs were verified by sequencing.

Conclusion: In this study, a second-generation high-throughput sequencing platform was used for ginseng genome sequencing, generating 98 G to 220 G of raw data and reaching 33x to 67x coverage of the ginseng genome. After preliminary assembly, SSR loci were surveyed across the whole genome, and SSR primers were designed from the loci that were polymorphic among the five species of ginseng. In addition, the whole-genome sequencing data were screened and used to assemble the chloroplast genome with a novel and rapid method. SNPs in chloroplast and mitochondrial regions were also detected. This way of obtaining the chloroplast genome is convenient and efficient, and it makes full use of the whole-genome sequencing data. The study provides important genomic background for subsequent work on the functional genomics and molecular breeding of ginseng.
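The SSR survey described above amounts to scanning contigs for short motifs repeated in tandem and tallying them by motif length. The sketch below illustrates that idea in Python with regular expressions. It is not MISA and does not reproduce the settings used in the thesis: the function `find_ssrs` is hypothetical, the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, and only perfect (uninterrupted) repeats are reported.

```python
import re
from collections import Counter

# Minimum number of tandem copies per motif length (1 = mononucleotide, ...).
# Illustrative thresholds only; the abstract does not state the values used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each locus is classified by its shortest repeating unit.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy contig: an A run of 12, (AT) x 7 and (GAT) x 5, separated by short flanks.
contig = "GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCC" + "GAT" * 5 + "TTC"
ssrs = find_ssrs(contig)
by_type = Counter(len(motif) for _, motif, _ in ssrs)
print(ssrs)
print(by_type)
```

Tallying `by_type` over all contigs of a genome gives the kind of per-type counts (mono-, di-, trinucleotide) summarized in the abstract. Comparing the same locus across samples for different copy numbers is the additional step needed to flag polymorphic loci for primer design.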
