Mining and Application of Molecular Markers from EST Database and Transcriptome Sequencing in Tea Plant (Camellia sinensis)

【Author】 Wang Liyuan (王丽鸳)

【Supervisor】 Cheng Hao (成浩)

【Author information】 Chinese Academy of Agricultural Sciences, Tea Science, 2011, Ph.D.

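【Abstract (Chinese version)】 Tea plant (Camellia sinensis) is a species for which genetic research and genomic information are relatively scarce, and the number of markers that can be used effectively in tea is currently very limited. This study not only used the existing public EST databases to develop and apply SSR and SNP markers in tea, but also obtained a large number of tea flower transcriptome sequences by high-throughput RNA-seq and used them for large-scale mining of SSR markers. The main results are as follows:
(1) The 12,757 tea EST sequences publicly available at NCBI were clustered and a tea unigene database was built. The redundancy rate of the tea ESTs was about 68.2 %. The distribution characteristics of tea EST-SSRs were determined, 206 SSR primer pairs were designed, and 59 polymorphic SSR primer pairs were selected.
(2) The developed SSR primers were used to study the sampling strategy for assessing the genetic diversity of tea landraces and the genetic differentiation of the Xihu Longjing population. The mean number of alleles (Na) was the most suitable sampling parameter for genetic diversity: with Na as the parameter and 5 alleles per SSR locus, at least 24 individual plants were needed to capture more than 90 % of the total genetic variation. The Longjing population showed a high level of genetic diversity, with a mean polymorphism information content (PIC) of 0.4382; 62.5 % of the loci were moderately polymorphic and 33.3 % highly polymorphic. Hardy-Weinberg equilibrium tests showed that 66.7 % of the SSR loci deviated from Hardy-Weinberg equilibrium. Analysis of molecular variance (AMOVA) showed low genetic differentiation among the five Xihu Longjing populations.
(3) A preliminary EST-SNP development pipeline for tea was established and the distribution of SNPs in tea ESTs was characterized. The SNP frequency in tea coding regions is about 0.58 %, roughly one SNP every 200 bp; the heterozygosity of tea genomic DNA was further estimated at about 0.38 %, roughly one heterozygous site every 300 bases. 818 candidate SNPs were found in 237 multigene clusters; 25 SNP primer pairs were designed and checked by DNA sequencing, and 75 % of the EST-SNP candidates were confirmed as polymorphic.
(4) Next-generation high-throughput sequencing of the tea flower transcriptome yielded 75,331 transcript sequences with an average length of 402 bp, an average sequencing depth of 23.45 and an average coverage of 0.895. Analysis of the distribution of gene expression levels (RPKM values) showed that the tea flower transcriptome is dominated by genes with low and medium expression. Alignment against four protein databases (NR, Swiss-Prot, KEGG and COG) annotated 50,975 tea flower unigenes.
(5) High-throughput mining of SSR loci in the tea flower transcriptome found 10,290 SSR-containing sequences carrying 12,582 SSRs in total, an SSR occurrence frequency of 16.66 %. 340 repeat motif types were found in the tea flower transcriptome sequences, with dinucleotide repeats the most abundant. The lengths of the microsatellites follow a skewed distribution: short SSRs with a repeat length below 15 bp are the most numerous, and longer SSRs above 30 bp make up only a small proportion.
(6) 2,633 SSR primer pairs were designed automatically in batch, a success rate of 42.85 %.
This study is significant for molecular marker-assisted breeding and functional gene discovery in tea.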
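Item (1) clusters ESTs into unigenes and reports a redundancy rate. The abstract does not name the clustering method, so the sketch below swaps in a deliberately simple technique: ESTs that share an exact k-mer are joined transitively with union-find, and the redundancy rate is assumed to be (ESTs - unigenes) / ESTs. `cluster_ests`, `redundancy_rate` and the default `k` are hypothetical; real EST pipelines use alignment-based assemblers and also consider reverse complements, which this sketch ignores.

```python
from typing import Dict, List


def cluster_ests(seqs: List[str], k: int = 40) -> List[List[int]]:
    """Group EST indices whose sequences share at least one exact k-mer (transitively).

    Hypothetical simplification: forward strand only, no alignment or assembly.
    """
    parent = list(range(len(seqs)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen: Dict[str, int] = {}  # k-mer -> index of the first EST containing it
    for idx, seq in enumerate(seqs):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])  # merge the two clusters
            else:
                first_seen[kmer] = idx

    groups: Dict[int, List[int]] = {}
    for idx in range(len(seqs)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())


def redundancy_rate(n_ests: int, n_unigenes: int) -> float:
    """Assumed definition: (ESTs - unigenes) / ESTs."""
    return 1 - n_unigenes / n_ests


ests = ["AAAACCCC", "CCCCGGGG", "TTTTTTTT", "GGGGAAAA"]
clusters = cluster_ests(ests, k=4)
print(clusters, redundancy_rate(len(ests), len(clusters)))  # [[0, 1, 3], [2]] 0.5
```

In the toy input, the first, second and fourth ESTs are chained through shared 4-mers (CCCC, GGGG, AAAA), so four ESTs collapse into two unigenes and half of the ESTs count as redundant.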
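Both the EST-SSR survey in item (1) and the transcriptome-wide survey in item (5) start by scanning sequences for perfect microsatellites. The following is a MISA-style scanner written for illustration only. The minimum repeat counts in `MIN_REPEATS` (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are assumed values, not thresholds reported in the thesis. Compound and imperfect SSRs are not handled.

```python
from typing import Dict, List, Tuple

# Assumed MISA-style minimum repeat counts per motif length (not taken from the thesis).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
BASES = set("ACGT")


def is_primitive(motif: str) -> bool:
    """False for motifs that are repeats of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def count_repeats(seq: str, start: int, k: int) -> int:
    """Number of tandem copies of seq[start:start + k] beginning at start."""
    motif = seq[start:start + k]
    n, j = 1, start + k
    while seq[j:j + k] == motif:
        n += 1
        j += k
    return n


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return perfect SSRs as (0-based start, motif, repeat count), trying short motifs first."""
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(seq):
        hit = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= BASES or not is_primitive(motif):
                continue
            n = count_repeats(seq, i, k)
            if n >= min_repeats[k]:
                hit = (i, motif, n)
                break
        if hit is None:
            i += 1
        else:
            hits.append(hit)
            i += len(hit[1]) * hit[2]  # resume scanning after the repeat tract
    return hits


print(find_ssrs("GGATATATATATATCC"))  # [(2, 'AT', 6)]
```

Rejecting non-primitive motifs keeps a run such as `AAAAAAAAA` from being reported as a dinucleotide `AA` repeat when it is too short to qualify as a mononucleotide SSR.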
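Item (2) asks how many plants must be sampled before a landrace's allelic variation is adequately represented. The abstract does not describe the resampling procedure, so this is only a hypothetical sketch of one standard way to pose the question. For each SSR locus it uses the hypergeometric expression 1 - C(N - c, n) / C(N, n) for the probability that a random sample of n out of N genotyped plants includes at least one of the c carriers of an allele. It then averages the expected allele share over loci and reports the smallest n that reaches a target such as 90 %. The function names and toy genotypes are illustrative.

```python
from collections import Counter
from math import comb
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]  # one diploid SSR genotype, e.g. ("182", "190")


def expected_allele_share(locus: Sequence[Genotype], n: int) -> float:
    """Expected fraction of a locus's alleles captured by n plants drawn without replacement."""
    N = len(locus)
    carriers = Counter(a for genotype in locus for a in set(genotype))  # allele -> carrier count
    total = comb(N, n)
    # P(allele seen) = 1 - P(no carrier drawn); math.comb returns 0 when N - c < n.
    expected = sum(1 - comb(N - c, n) / total for c in carriers.values())
    return expected / len(carriers)


def min_sample_size(loci: List[Sequence[Genotype]], target: float = 0.9) -> Optional[int]:
    """Smallest n whose allele share, averaged over loci, reaches target (None if never)."""
    N = min(len(locus) for locus in loci)  # assumes all loci were scored on the same plants
    for n in range(1, N + 1):
        mean_share = sum(expected_allele_share(locus, n) for locus in loci) / len(loci)
        if mean_share >= target:
            return n
    return None


locus = [("A", "A"), ("A", "B"), ("B", "C"), ("A", "A")]
print(expected_allele_share(locus, 1), min_sample_size([locus], 0.9))  # 0.5 3
```

Using the exact hypergeometric expectation instead of random subsampling keeps the curve deterministic; rare alleles (small c) are what push the required sample size up.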
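The PIC values and polymorphism classes in item (2) can be computed from allele frequencies at each locus. The following sketch assumes the widely used Botstein et al. (1980) formula, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², together with its usual cut-offs (PIC > 0.5 highly, 0.25 to 0.5 moderately, < 0.25 slightly polymorphic). The abstract does not state which formula or thresholds the thesis applied.

```python
from itertools import combinations
from typing import Sequence


def pic(allele_counts: Sequence[float]) -> float:
    """Polymorphism information content from allele counts (or frequencies) at one locus.

    Assumes the Botstein et al. (1980) formula:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    homozygosity = sum(pi ** 2 for pi in p)
    pair_term = sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(p, 2))
    return 1 - homozygosity - pair_term


def pic_class(value: float) -> str:
    """Assumed cut-offs: > 0.5 high, 0.25-0.5 medium, < 0.25 low."""
    if value > 0.5:
        return "high"
    if value >= 0.25:
        return "medium"
    return "low"


for counts in ([1, 1], [1, 1, 1, 1], [9, 1]):
    value = pic(counts)
    print(counts, round(value, 4), pic_class(value))
```

Two equally common alleles give PIC = 0.375 (medium); four equally common alleles give 0.703125 (high), which is why loci with more, evenly distributed alleles are the most informative.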
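Item (2) also reports Hardy-Weinberg equilibrium tests. SSR loci are usually multi-allelic and are often tested with exact tests, so the chi-square sketch below is a deliberate simplification to a biallelic locus. It compares observed genotype counts with the p², 2pq, q² expectations. Because the test has one degree of freedom, the p-value follows from the identity P(χ²₁ > x) = erfc(√(x/2)). The function name is hypothetical.

```python
import math
from typing import Tuple


def hwe_chi2_biallelic(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions at a biallelic locus.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi2_biallelic(25, 50, 25))  # (0.0, 1.0): exactly at equilibrium
print(hwe_chi2_biallelic(50, 0, 50))   # chi-square = 100.0, p far below 0.05
```

A locus with no heterozygotes at all (the second call) is the extreme case of the heterozygote deficit that commonly drives deviations from equilibrium in such tests.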
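Item (3) derives candidate SNPs from clusters containing several ESTs. A common heuristic, assumed here and not taken from the thesis, is to accept an alignment column as a candidate SNP only when at least two different bases are each supported by at least two ESTs. This filters out many single-read sequencing errors. The sketch also converts a SNP frequency into an average spacing; for example, a frequency of 0.58 % corresponds to 1 / 0.0058 ≈ 172 bp between SNPs on average. `call_est_snps`, `min_depth` and `min_allele_count` are hypothetical names and defaults.

```python
from collections import Counter
from typing import List, Tuple

BASES = {"A", "C", "G", "T"}


def call_est_snps(alignment: List[str], min_depth: int = 4,
                  min_allele_count: int = 2) -> Tuple[List[int], float]:
    """Candidate SNP columns in a gapped multi-EST alignment, plus SNP frequency.

    All aligned sequences must have the same length. A column is evaluated only if at
    least min_depth ESTs have a base there; it is a candidate SNP if two or more bases
    each occur in at least min_allele_count ESTs.
    """
    snps: List[int] = []
    evaluated = 0
    for col in range(len(alignment[0])):
        bases = [seq[col].upper() for seq in alignment if seq[col].upper() in BASES]
        if len(bases) < min_depth:
            continue  # too shallow: gaps or ambiguous bases dominate this column
        evaluated += 1
        supported = [b for b, count in Counter(bases).items() if count >= min_allele_count]
        if len(supported) >= 2:
            snps.append(col)
    frequency = len(snps) / evaluated if evaluated else 0.0
    return snps, frequency


def bp_per_snp(frequency: float) -> float:
    """Average distance between SNPs implied by a per-base SNP frequency."""
    return 1 / frequency


cluster = ["ACGTACGT",
           "ACGTACGT",
           "ACTTACGA",
           "ACTTACG-"]
print(call_est_snps(cluster))     # ([2], 0.14285714285714285)
print(round(bp_per_snp(0.0058)))  # 172
```

In the toy cluster, column 2 (G in two ESTs, T in two) is called, while the last column is skipped because only three ESTs cover it and its minority base appears once.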

【Abstract】 Little genetic and genomic information is available for tea plant (Camellia sinensis), and effective DNA markers are especially scarce. In this study, the existing public EST database was used to develop SSR and SNP markers. In addition, the transcriptome of tea flowers was obtained by high-throughput RNA-seq and mined for SSR sites and markers. The results are summarized as follows:
(1) By clustering the 12,757 tea ESTs downloaded from NCBI, a tea unigene database containing 4,000 unigenes was built. The redundancy rate of the tea ESTs was approximately 68.2 %, and the distribution characteristics of EST-SSRs were determined. 206 SSR primer pairs were designed with Primer 5, of which 59 were polymorphic.
(2) As applications of these SSR primers, the sampling strategy for assessing the genetic diversity of tea landraces and the genetic diversity and differentiation of the Longjing tea landrace were studied. The number of alleles (Na) was the most suitable genetic diversity parameter for sampling tea landraces: when an SSR locus had 5 alleles, at least 24 individual plants were needed to capture 90 % of the total genetic diversity. Genetic diversity within the Longjing landrace was high, with an average polymorphism information content (PIC) of 0.4382; 33.3 % of the SSR loci were classified as highly polymorphic and 62.5 % as moderately polymorphic. The Hardy-Weinberg equilibrium (HWE) test showed that 66.7 % of the SSR loci deviated from HWE. AMOVA showed low genetic differentiation among the five populations of the Longjing tea landrace.
(3) A preliminary EST-SNP development pipeline for tea was established and the SNP distribution was characterized. The SNP frequency in tea coding regions was approximately 0.58 %, i.e. on average about one SNP per 200 bp of tea EST. Furthermore, the heterozygosity of the tea genome was estimated at 0.38 %, on average about one heterozygous site per 300 bp. 818 candidate SNPs were identified in 237 multigene clusters; 25 SNP primer pairs were then designed, and 75 % of these sites were confirmed as polymorphic by DNA sequencing.
(4) The flower transcriptome of Camellia sinensis was analyzed by high-throughput Illumina RNA-seq, yielding 75,531 unigenes. The average sequencing depth and coverage were 23.45 and 0.895, respectively. The distribution of RPKM values across all unigenes showed that genes with low and medium expression levels dominate the expression pattern of tea flowers. Sequence similarity searches against four public databases (NR, COGs of NCBI, InterPro, KEGG) annotated 55,088 unigenes.
(5) High-throughput mining of SSR sites in the flower transcriptome found 12,582 SSRs in 10,290 unigenes, an SSR occurrence frequency of 16.66 %. 340 SSR motif types were found, and dinucleotide repeats were the most abundant (44.99 %). The SSR length distribution deviated strongly from a normal distribution: short SSRs below 15 bp were the most numerous, and SSRs longer than 30 bp made up only a small proportion.
(6) 2,633 SSR primer pairs were designed automatically; primers were successfully designed for 42.85 % of the SSR sites.
These methods proved efficient for functional gene discovery and useful for molecular marker-assisted breeding of tea.
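Item (4) characterizes expression levels by RPKM. Assuming the standard definition of Mortazavi et al. (2008), reads per kilobase of transcript per million mapped reads is RPKM = 10⁹ · C / (N · L), where C is the number of reads mapped to a unigene, N the total number of mapped reads and L the unigene length in bp. The helper below illustrates only that formula, using hypothetical read counts; it says nothing about how the thesis binned unigenes into low, medium and high expression.

```python
def rpkm(read_count: int, total_mapped_reads: int, length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    # Equivalent to (read_count / (length_bp / 1e3)) / (total_mapped_reads / 1e6).
    return read_count * 1e9 / (total_mapped_reads * length_bp)


# Hypothetical unigenes: (mapped reads, length in bp); one million mapped reads in total.
unigenes = {"unigene_1": (100, 1000), "unigene_2": (10, 500), "unigene_3": (500, 2500)}
for name, (reads, length) in unigenes.items():
    print(name, round(rpkm(reads, 1_000_000, length), 2))
```

Normalizing by both length and library size is what makes RPKM values comparable across unigenes of different lengths, so a distribution of RPKM values can be read as a distribution of expression levels.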
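The statistics in item (5) can be tallied from per-unigene SSR calls. Here the occurrence frequency is assumed to be the number of SSRs per 100 unigenes; under that assumption the English abstract's figures give 12,582 / 75,531 × 100 ≈ 16.66 %. Motif types are counted as distinct raw motifs (the thesis may have merged complementary or rotated motifs, which the abstract does not state). The length bins (< 15 bp, 15 to 30 bp, > 30 bp) follow the abstract's wording. All names and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

MOTIF_CLASS = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def length_bin(length_bp: int) -> str:
    """Bin an SSR by total repeat length, using the cut-offs named in the abstract."""
    if length_bp < 15:
        return "<15 bp"
    if length_bp <= 30:
        return "15-30 bp"
    return ">30 bp"


def summarize_ssrs(ssrs_by_unigene: Dict[str, List[Tuple[str, int]]], n_unigenes: int) -> dict:
    """ssrs_by_unigene maps unigene id -> list of (motif, repeat count) SSR calls."""
    all_ssrs = [ssr for calls in ssrs_by_unigene.values() for ssr in calls]
    return {
        "n_ssrs": len(all_ssrs),
        "n_unigenes_with_ssr": sum(1 for calls in ssrs_by_unigene.values() if calls),
        "frequency_pct": 100 * len(all_ssrs) / n_unigenes,  # assumed: SSRs per 100 unigenes
        "n_motif_types": len({motif for motif, _ in all_ssrs}),  # raw motifs, not merged
        "motif_classes": Counter(MOTIF_CLASS[len(motif)] for motif, _ in all_ssrs),
        "length_bins": Counter(length_bin(len(motif) * reps) for motif, reps in all_ssrs),
    }


toy = {"u1": [("AT", 6), ("AG", 8)], "u2": [("AAT", 11)], "u3": []}
print(summarize_ssrs(toy, n_unigenes=10))
print(round(100 * 12582 / 75531, 2))  # 16.66
```

Counting SSRs and SSR-containing unigenes separately matters because one unigene can carry several SSRs, as with 12,582 SSRs spread over 10,290 unigenes.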

【Key words】 Camellia sinensis (tea plant); EST; Bioinformatics; Molecular markers; RNA-Seq (transcriptome sequencing)