
Development, Evaluation and Application of New Nonredundancy EST-SSR Markers from Gossypium Based on Public Database

【Author】 Wang Wei (王为)

【Advisor】 Wang Kunbo (王坤波)

【Author information】 Chinese Academy of Agricultural Sciences, Crop Germplasm Resources, 2012, PhD

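【Abstract, translated from the Chinese】 In recent years, the exponential growth of public databases, nucleic acid databases in particular, together with the revolution in bioinformatics technology, has driven rapid progress in genomics and proteomics. How to use these data sensibly and efficiently in genomics research is a pressing problem. Redundancy is an important issue in molecular marker development, yet it has rarely been reported on. To date no suitable software can check both primers of a pair for redundancy at the same time, which leads to duplicated development work and wasted time and cost. To study cotton EST resources in an integrated and systematic way, this thesis develops non-redundant functional markers and carries out related application studies, building up technical resources for the massive volume of information produced by genome and transcriptome sequencing. Analysis of the 393,753 cotton EST sequences available in public databases yielded 349,815 non-redundant ESTs. Using SSRmine, a program developed in-house, 11,372 SSR loci were identified in 10,507 ESTs; the EST-SSR frequency was 3%, with one SSR on average every 21 kb. Primers were then designed from those non-redundant ESTs that had not previously been exploited in Gossypium. SSRD, a second in-house program, was applied in six steps (SSR primer sequence download, preprocessing, and so on) to remove primers derived from partially homologous sequences within the set itself and primers similar to the SSR primers released online by the cotton CMD (http://www.cottonmarker.org/). This gave 1,000 non-redundant primer pairs, named CICRXXX. One hundred of these pairs were evaluated on representative accessions of 12 cotton species for polymorphism information content (PIC) and transferability, and were preliminarily mapped in an upland × upland cotton population. Of the 100 SSR primer pairs, 56 amplified stable, clear bands in all 12 accessions; 35 pairs were polymorphic, a polymorphism rate of 35%. PIC ranged from 0.097 to 0.888, with a mean of 0.482. Transferability across the 12 accessions was 100% for the 1 primer pair derived from G. barbadense, 81% for the 25 pairs derived from G. arboreum, and 80.1% for the 74 pairs derived from G. hirsutum. The 1,000 new primer pairs were then used in three application studies: genetic mapping in a G. hirsutum × G. barbadense BC1 population, genetic diversity assessment of 67 wild cotton accessions, and functional annotation and KEGG pathway analysis of the ESTs underlying the new markers. The results were as follows. 1. The 380 polymorphic SSR primer pairs screened from the 1,000 new EST-SSR pairs all amplified stable, clear bands in the 67 wild cotton accessions, detecting 660 fragments in total, an average of 1.73 per primer pair. PIC ranged from 0.026 to 0.824, with a mean of 0.384; the effective number of alleles (Ne) ranged from 1.024 to 5.698, with a mean of 2.64; gene diversity (H′) averaged 4.38. UPGMA cluster analysis divided the 67 accessions into eight groups at a genetic similarity coefficient of 0.8, broadly in agreement with Wendel (2010). Accessions of the same species from different places of origin clustered according to genome group, and the clustering showed no significant relationship with geographic origin. 2. The laboratory of Professor Yuan Youlu at the Cotton Research Institute, Chinese Academy of Agricultural Sciences, has constructed a genetic map of more than 4,000 cM from a BC1 population of CCRI 36 (中棉所36) × Hai 1 (海1) (unpublished). Of the 1,000 CICR markers, 132 were placed on this map at 136 loci, 63 in the A subgenome and 73 in the D subgenome. The loci covered all 26 cotton chromosomes; chromosome 19 carried the most, with 17 markers. Forty-two loci showed segregation distortion. 3. The ESTs underlying the 1,000 CICR markers were functionally annotated. At level 2, annotated sequences fell into three categories: 976 were assigned to cellular component, 597 to molecular function, and 1,126 to biological process. Of the ESTs, 239 (about 23.9%) were associated with metabolic pathways, most often carbohydrate and energy metabolism, followed by amino acid metabolism. The preliminary evaluation and the application studies indicate that the newly developed non-redundant EST-SSR markers perform acceptably, and that the redundancy detection and evaluation method, built here for the first time, is reasonably feasible and can support genomics and related applications.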
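The SSR mining step can be illustrated with a short Python sketch. This is not the SSRmine program, whose search criteria the abstract does not give. The minimum repeat counts in `MIN_REPEATS`, the function names `find_ssrs` and `ssr_frequency`, and the example sequence are all assumptions chosen for illustration.

```python
import re

# ASSUMED thresholds (not from the thesis): motif length -> minimum repeat count.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least n - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "AGAG";
            # those are mononucleotide runs or are reported under the shorter motif.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


def ssr_frequency(n_est_with_ssr, n_est_total):
    """Fraction of ESTs that contain at least one SSR."""
    return n_est_with_ssr / n_est_total


if __name__ == "__main__":
    example = "GCATGC" + "AG" * 8 + "CATGCA" + "CTT" * 5 + "GCATG"
    print(find_ssrs(example))
    # Counts reported in the abstract: 10,507 SSR-containing ESTs out of 349,815.
    print(round(ssr_frequency(10507, 349815) * 100, 1), "%")
```

With the counts reported in the abstract, `ssr_frequency(10507, 349815)` comes to about 0.030, which matches the stated 3% frequency.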

【Abstract】 In recent years, the exponential growth of public databases, especially nucleic acid databases, and the revolution in bioinformatics technology have driven rapid progress in genomics and proteomics. How to use these data sensibly and efficiently in genomics research is an urgent problem. Redundancy has become an important problem in molecular marker development, yet there are few reports on it. Until now no suitable software has been able to analyse the redundancy of both primers of a pair at the same time, which causes duplicated research and development and wastes time and money. To study cotton EST (Expressed Sequence Tag) resources in an integrated way, this thesis developed non-redundant functional markers and carried out related application studies, accumulating technical resources for the abundant information produced by genome and transcriptome sequencing. Clustal X was used to analyse the redundancy of the 393,753 Gossypium ESTs available in public databases. Mining the resulting 349,815 non-redundant ESTs with SSRmine, software developed by ourselves, identified a total of 11,372 SSR (Simple Sequence Repeat) loci in 10,507 ESTs. The frequency of ESTs containing SSRs was 3%, with an average of one SSR per 21 kb of EST sequence. One thousand new non-redundant EST-SSR primer pairs were developed from those redundancy-filtered ESTs that had not yet been exploited in Gossypium. SSRD, software also developed by ourselves, was used to obtain non-similar primers, designated CICR (China Institute of Cotton Research) XXX, in six steps: SSR primer sequence download, pretreatment, Blastn, extraction of the numbers of primers with a similarity score above 81%, extraction of redundant primer pairs, and listing of redundant primers on one line. These steps removed homologous sequences within the set itself as well as primers similar to those released in CMD (Cotton Marker Database) for different cotton species.
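The idea of checking both primers of a pair for redundancy can be sketched in Python as follows. This is not the SSRD program. The thesis scored similarity with Blastn, whereas this sketch swaps in the `difflib.SequenceMatcher` ratio from the Python standard library as a simple stand-in. The function names, the rule that both primers must exceed the threshold, and the example primer sequences are assumptions; only the 81% cutoff comes from the abstract.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]; an ASSUMED stand-in for a Blastn score."""
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def is_redundant(pair, known_pairs, threshold=0.81):
    """True if both primers of `pair` resemble both primers of a known pair."""
    fwd, rev = pair
    for k_fwd, k_rev in known_pairs:
        same = (similarity(fwd, k_fwd) > threshold
                and similarity(rev, k_rev) > threshold)
        # Also test the swapped orientation (forward vs. known reverse).
        swapped = (similarity(fwd, k_rev) > threshold
                   and similarity(rev, k_fwd) > threshold)
        if same or swapped:
            return True
    return False


def filter_nonredundant(candidates, known_pairs, threshold=0.81):
    """Keep candidates that match neither the known pairs nor earlier keepers."""
    kept = []
    for pair in candidates:
        if not is_redundant(pair, list(known_pairs) + kept, threshold):
            kept.append(pair)
    return kept


if __name__ == "__main__":
    # Hypothetical primer sequences, for illustration only.
    known = [("ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG")]
    new = [
        ("ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG"),  # already known
        ("GGGGGGGGGGCCCCCCCCCC", "ATATATATATATATATATAT"),  # novel
        ("GGGGGGGGGGCCCCCCCCCC", "ATATATATATATATATATAT"),  # self-duplicate
    ]
    print(filter_nonredundant(new, known))
```

Comparing each candidate against the pairs already kept, as well as against the external set, mirrors the two kinds of redundancy named in the abstract: homology within the new set and similarity to primers already released in CMD.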
Among them, 100 primer pairs were evaluated for polymorphism information content (PIC) and transferability using twelve cotton species, comprising seven representative diploid species and five tetraploid species, and were preliminarily mapped in an F2 population of G. hirsutum L. × G. hirsutum. A total of 56 of the 100 SSR primer pairs amplified stable, clear bands in the 12 accessions; 35 of these 56 pairs were polymorphic, a primer polymorphism ratio of 35% of the 100 pairs tested. The PIC of these primers ranged from 0.097 to 0.888, with an average of 0.482. Transferability among the twelve cotton species was 100% for the one EST-SSR primer pair from Gossypium barbadense L., 81% for the 25 primer pairs from G. arboreum, and 80.1% for the 74 primer pairs from G. hirsutum. These results showed that the new non-redundant EST-SSR markers perform well and that the method is feasible. The one thousand new primer pairs were then used in application studies: genetic mapping with a BC1 population of G. hirsutum × G. barbadense, assessment of genetic diversity in 67 wild cotton accessions, and functional annotation and KEGG metabolic pathway analysis of the corresponding ESTs. The results were as follows. 1. A total of 380 of the 1000 SSR primer pairs produced stable, clear polymorphic bands when used to amplify the 67 wild cotton accessions. Six hundred and sixty DNA fragments were obtained across all accessions, an average of 1.73 per primer pair. The PIC of these primers ranged from 0.026 to 0.824, with an average of 0.384; the effective number of alleles (Ne) varied from 1.024 to 5.698, with an average of 2.64; and the Shannon-Weaver diversity index (H′) averaged 4.38. UPGMA cluster analysis classified the accessions into eight groups at a genetic similarity coefficient of 0.8. The clustering result was broadly in accord with that of Wendel (2010).
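The three diversity statistics reported here can be computed from allele frequencies at a locus. The sketch below uses the commonly cited formulas: the Botstein et al. form of PIC, Ne = 1/Σp², and H′ = −Σ p ln p. The abstract does not say which variants the thesis used, so treat these as assumptions. The function names and the example frequencies are hypothetical.

```python
import math
from itertools import combinations


def pic(freqs):
    """Polymorphism information content (Botstein et al. form, ASSUMED)."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross


def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p_i^2)."""
    return 1 / sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon-Weaver index, H' = -sum(p_i * ln p_i), natural log ASSUMED."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    # Hypothetical locus with two equally frequent alleles.
    freqs = [0.5, 0.5]
    print(pic(freqs), effective_alleles(freqs), shannon_index(freqs))
```

For two equally frequent alleles this gives PIC = 0.375 and Ne = 2. A monomorphic locus gives PIC = 0 and Ne = 1, which is why Ne has a lower bound of 1.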
Meanwhile, the clustering of accessions of the same cotton species from different places was related to genome group and showed no significant correlation with geographical origin. 2. A genetic map with a total genetic distance of over 4000 cM had been constructed in the laboratory of Professor Yuan Youlu at the Cotton Research Institute, Chinese Academy of Agricultural Sciences, based on linkage analysis of a BC1 population (135 individuals) of ZMS36 × Hai1 with screened SSR primers (unpublished data). One hundred and thirty-two CICR markers were mapped onto this genetic map at 136 loci (63 in the A subgenome and 73 in the D subgenome), covering all 26 chromosomes; chromosome 19 carried the most markers (17). Forty-two loci related to CICR markers in the map showed segregation distortion. 3. The ESTs corresponding to the 1000 CICR markers were functionally annotated and, at level 2, classified into three types: cellular component, molecular function, and biological process. Among them, 976 ESTs belonged to cellular component, 597 to molecular function, and 1126 to biological process. Two hundred and thirty-nine ESTs (23.9%) were associated with metabolic pathways. Carbohydrate and energy metabolism was the most common, followed by amino acid metabolism. The preliminary evaluation and the application studies showed that the new non-redundant EST-SSR markers perform well, and that the redundancy identification and evaluation methods are feasible and can be used in related genomics applications.
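Segregation distortion at a marker locus in a backcross population is usually tested against the expected 1:1 genotype ratio with a chi-square test. The abstract does not state the test or the significance level the thesis used, so the sketch below is an assumption on both counts. The function names and genotype counts are hypothetical. The total of 135 follows the population size given above, read as 135 individuals.

```python
import math


def chi_square_1to1(n_a, n_b):
    """Chi-square statistic and p-value (df = 1) against a 1:1 expectation."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def is_distorted(n_a, n_b, alpha=0.05):
    """True if the locus departs from 1:1 at the ASSUMED significance level."""
    return chi_square_1to1(n_a, n_b)[1] < alpha


if __name__ == "__main__":
    # Hypothetical genotype counts at two loci in a 135-plant BC1 population.
    for counts in [(85, 50), (70, 65)]:
        chi2, p = chi_square_1to1(*counts)
        print(counts, round(chi2, 3), round(p, 4), is_distorted(*counts))
```

Using `math.erfc` keeps the example free of third-party dependencies. With more than one degree of freedom, for example a 1:2:1 ratio in an F2 population, a general chi-square distribution function would be needed instead.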
