Screening SSR and SNP Markers from BAC-end Sequences and Their Preliminary Application in Zhikong Scallop (Chlamys farreri)

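【Author】 Zhang Xiuying

【Supervisor】 Xiang Jianhai

【Author information】 Graduate University of Chinese Academy of Sciences (Institute of Oceanology), Marine Biology, 2012, Master's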

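【Summary】 Zhikong scallop (Chlamys farreri) is an important commercially farmed shellfish in China, and breeding populations with stress resistance and fast growth is the basis for the sustainable development of Zhikong scallop aquaculture. The rapidly developing technology of molecular marker-assisted breeding provides strong support for the rapid selection of superior varieties. Using Zhikong scallop as the main research material, this study investigated the development, detection and preliminary application of SSR and SNP molecular markers from Zhikong scallop BAC-end sequences.

From the two Zhikong scallop BAC libraries constructed in our laboratory, 10,237 BAC clones were randomly picked for end sequencing, and the resulting sequences were analyzed bioinformatically. After base-calling and removal of contaminating sequences such as vector and E. coli genome sequences, 17,447 BAC-end sequences (BESs) were obtained (cut-off value = Q20), with an average read length of 446 bp and a total sequenced length of 7,773,272 bp, covering 0.63% of the Zhikong scallop genome. Of these, 14,628 BESs (83.84%) came from 7,314 BAC clones for which both ends were sequenced successfully. The analysis showed an (A+T) content of 63.45% and a (G+C) content of 36.55%, so the AT content is markedly higher than the GC content and the Zhikong scallop genome is AT-rich. Tandem Repeats Finder analysis showed that 8,550 BESs contained tandem repeats, 49.0% of the sequences obtained, with 17,785 tandem repeats in total. Among simple sequence repeats (SSRs, repeat units of 1-6 bp) of ≥12 bp, hexanucleotide repeats predominated, followed by penta- and tetranucleotide repeats, with trinucleotide repeats the least common. RepeatMasker analysis found a large number of retroelements, of which LTR/Gypsy and LINE/CR1 were the most abundant, accounting for 1.87% and 1.22% of the genome sequence, respectively. BLAST comparison of the BESs against the Nr, Nt and EST databases matched 2,083, 1,375 and 1,901 BESs, respectively, of which 446 (2.56%) matched Zhikong scallop-related gene sequences. Comparative genomic analysis against 10 invertebrates and 2 vertebrates with completed genome sequences suggested that Zhikong scallop may be most closely related to the purple sea urchin (Strongylocentrotus purpuratus).

Microsatellite markers were developed from the BESs, and 14 of them were selected to study genetic diversity in two geographic populations, Dalian and Qingdao, and to analyze their genetic structure and level of differentiation. Across the 14 loci, the mean number of alleles (Na) in the two populations was 18.9286 and 26.2143, the mean effective number of alleles (Ne) was 11.7505 and 17.0891, the mean observed heterozygosity (Ho) was 0.5100 and 0.4204, the mean expected heterozygosity (He) was 0.9156 and 0.9450, and the polymorphic information content (PIC) was 0.8940 and 0.9302, indicating a high level of genetic diversity. The unbiased genetic identity between the two populations was 0.4879, the genetic distance was 0.7177, the mean coefficient of gene differentiation (Fst) was 0.0243 and the gene flow (Nm) was 10.0179, indicating weak differentiation between populations, with genetic variation arising mainly among individuals within populations. Hardy-Weinberg equilibrium tests showed widespread heterozygote deficiency in both populations. This study shows that the BES-SSRs developed are highly polymorphic loci that work well for population genetic diversity analysis, and that BESs are an important resource for microsatellite marker development and application.

Based on the BESs and the DNA of the 2005 mapping parents preserved in our laboratory, two strategies were used to develop and detect SNP markers: direct sequencing after PCR amplification, and alignment of genome sequencing reads. In the first strategy, primers were designed in batches from the BESs with BatchPrimer3.0. A total of 370 PCR primer pairs were synthesized, of which 260 amplified effectively in the parents, an amplification rate of 70.27%. The parental PCR products were sequenced separately and the sequences were aligned with Sequencher 5.0 Demo, yielding 342 candidate SNP sites distributed over 112 BAC clones. Of these, 154 SNPs on 76 BAC clones fit the pseudo-testcross strategy. The method also yielded 19 indel markers on 13 BAC clones. After genotyping and linkage analysis, these SNP and indel markers can be used to construct genetic maps. In the second strategy, high-throughput genome sequencing produced approximately 10× coverage of Solexa 100 bp paired-end data for each parent, and ssahaSNP was used to screen for SNPs with the 17,447 BESs as the reference sequence. With the parameters match, identity and map set to 80, 92 and 2, respectively, 222,182 candidate SNP sites and 41,250 candidate indel sites were obtained; when the parameters were raised to 90, 95 and 5, 53,398 candidate SNP sites and 7,092 candidate indel sites were obtained. To validate these SNPs, primers were designed for 32 SNP sites. The amplification rate for the 32 sites was 84.38%, the sequencing success rate was 77.78%, the SNP confirmation rate among the analyzable data was 76.19%, and the proportion of SNPs usable for genetic mapping was 56.25%. The results provide preliminary confirmation that large-scale alignment of genome sequencing reads against BESs as the reference sequence is a feasible strategy for SNP screening, and an important route to large-scale SNP development and to the construction of high-density genetic linkage maps.

The BAC-end SSR and SNP markers were genotyped with the TP-M13 fluorescence detection technique and MALDI-TOF mass spectrometry, respectively, giving genotyping results for 12 SSRs and 39 SNPs. Using these markers together with AFLP markers previously published by our laboratory, female and male genetic linkage maps were constructed. The female map has 149 markers, including 16 SNPs and 4 SSRs; the male map has 201 markers, including 21 SNPs and 3 SSRs. In the female and male maps, 18 and 22 markers, respectively, correspond to contigs of the physical map, achieving a preliminary integration of the physical and genetic maps.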
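The SSR criterion described above (repeat units of 1-6 bp, total length ≥12 bp) can be illustrated with a short sketch. This is not the Tandem Repeats Finder algorithm used in the thesis: it is a simplified regular-expression scan that finds only perfect repeats made of whole motif copies, and the names `find_ssrs`, `is_primitive` and `MIN_LEN` are hypothetical.

```python
import re

MIN_LEN = 12  # minimum total repeat length in bp (threshold quoted in the summary)

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))

def find_ssrs(seq, min_len=MIN_LEN):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs, motif 1-6 bp.

    Assumption: only exact, whole-copy repeats are reported; real tools such as
    Tandem Repeats Finder also tolerate mismatches and indels.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, 7):
        min_copies = -(-min_len // k)  # ceiling division: copies needed to reach min_len
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'ACAC' x3, which is already reported as 'AC' x6.
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)

if __name__ == "__main__":
    example = "TTG" + "AC" * 6 + "GCA" + "AACCGT" * 2 + "C"
    for start, end, motif, copies in find_ssrs(example):
        print(start, end, motif, copies)
```

Grouping the returned tuples by `len(motif)` would give a motif-length tally comparable in spirit to the hexa-, penta-, tetra- and trinucleotide ranking reported in the summary, although counts from this simplified scan would not match those of a dedicated tool.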

【Abstract】 Zhikong scallop (Chlamys farreri) is one of the most important traditionally cultured molluscs in China. Marker-assisted selection (MAS) offers a way both to make better use of the available genetic resources and to increase aquaculture production. This research develops and characterizes BAC-based molecular markers, namely SSRs (simple sequence repeats) and SNPs (single nucleotide polymorphisms), for Zhikong scallop.

To better understand the scallop genome, a total of 10,237 BAC clones were randomly selected from two scallop BAC libraries and sequenced from both ends with the Sanger method. After trimming and quality filtering, 17,447 BAC-end sequences (BESs), including 7,314 paired ends, were obtained, with an average read length of 446 bp. A total of 7,773,272 bp was generated, representing 0.63% of the scallop genome. Based on this survey, the scallop genome was found to be highly AT-rich, with 63.45% AT and 36.55% GC. In total, 17,785 tandem repeats were found in 8,550 BESs. Among the SSRs (1-6 bp repeat units, ≥12 bp in total length), hexanucleotide motifs were the most abundant, followed by penta- and tetranucleotide motifs, and trinucleotide motifs were the least abundant. BLAST searches against the Nr, Nt and EST databases returned hits for 2,083, 1,375 and 1,901 BESs, respectively. Among them, 446 BESs (2.56%) had hits to scallop-related genes. A small number of scallop BESs could be anchored to the genomes of sequenced species. These BESs constitute a major genomic resource for scallop and mollusc genomic research.

Microsatellite markers were developed from the BESs and used to analyze genetic structure and genetic differentiation in two C. farreri populations (Dalian and Qingdao). Fourteen polymorphic SSRs were chosen for amplification and analysis of genetic diversity in the two populations. A total of 395 alleles were obtained at the 14 microsatellite loci, and the number of alleles per locus ranged from 8 to 38 in the two populations. The average number of alleles (Na) was 18.9286 and 26.2143, respectively. The average effective number of alleles (Ne) was 11.7505 and 17.0891, respectively. The mean observed heterozygosity (Ho) was 0.5100 and 0.4204, and the mean expected heterozygosity (He) was 0.9156 and 0.9450. These data suggest that both populations have high genetic diversity. The mean polymorphic information content (PIC) was 0.8940 and 0.9302, both greater than 0.5, indicating that the 14 loci were highly polymorphic. The unbiased genetic identity index was 0.4879, and the genetic distance was 0.7177. The coefficient of gene differentiation (Fst) and gene flow (Nm) between the two populations were 0.0243 and 10.0179, respectively. Genetic differentiation between the two populations was low, and most of the variation came from differences among individuals within populations. The Hardy-Weinberg equilibrium test detected significant deviation, with heterozygote deficiency at all loci. The results show that BESs are an effective resource for developing SSR markers for genetic and genomic research.

Two strategies were applied to develop SNP markers, based on the BESs and on next-generation sequencing of the two mapping parents preserved in our laboratory. In the first strategy, 370 primer pairs were designed with the BatchPrimer3.0 software, of which 260 (70.27%) amplified successfully. The PCR products were sequenced with the Sanger method, and the sequences were aligned with the BioEdit and Sequencher 5.0 Demo software. In total, 342 SNPs were detected in 112 BAC clones, of which 154 SNPs in 76 BAC clones fit the "pseudo-testcross" mapping strategy. This method also yielded 19 indels in 13 BAC clones. After genotyping and linkage analysis, these SNPs and indels are good resources for Zhikong scallop genetic mapping.

In the second strategy, the SNP detection software ssahaSNP was used to develop BAC-based SNPs. The genomes of the two mapping parents were sequenced with next-generation sequencing technology (Illumina HiSeq 2000), yielding 11.67 Gb and 11.78 Gb of clean data, respectively. The reads were then aligned with ssahaSNP for SNP detection. The number of SNPs detected depends on the parameter settings: the stricter the parameters, the fewer SNPs are detected. With the parameters “match”, “identity” and “map” set to 80, 92 and 2, respectively, 222,182 SNPs and 41,250 indels were detected. When the parameters were raised to 90, 95 and 5, respectively, the numbers of SNPs and indels fell to 53,398 and 7,092. To validate these SNPs and estimate the false-positive rate, 32 SNP sites were randomly selected for primer design, PCR amplification and sequence alignment. The results showed that 84.38% of the primer pairs amplified successfully and that the sequencing success rate was 77.78%. Among the SNPs available for alignment analysis, 76.19% were confirmed as true positives. The research demonstrates that it is practical to detect SNPs using ssahaSNP with BESs as the reference sequence, which is vital for large-scale SNP development in Zhikong scallop and for SNP genetic map construction.

A TP-M13 fluorescent labelling system and MALDI-TOF mass spectrometry were used to genotype the SSR and SNP markers. The genetic linkage map of C. farreri was constructed using 12 microsatellite markers, 39 SNP markers, 373 AFLP markers and one sex marker. Linkage mapping was performed using the F1 outbred families; the female and male maps contained 149 and 201 markers, respectively. The number of linkage groups was 18 in the female map and 19 in the male map, which was consistent with the haploid chromosome number of C. farreri. Thirty-three contigs on the physical map were integrated into the linkage map of C. farreri, which is important for whole-genome sequencing, QTL analysis and marker-assisted selection.
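The diversity statistics reported above (Ne, He, PIC, genetic distance, Nm) follow from allele frequencies and a few standard relations. The sketch below assumes the commonly used textbook forms; the abstract does not say which software or exact estimators were used, so the function names and formula choices here are assumptions (for instance, He is computed as the simple 1 − Σp², without a sample-size correction).

```python
import math
from itertools import combinations

def locus_stats(freqs):
    """Per-locus statistics from allele frequencies that sum to 1.

    Assumed formulas: Ne = 1 / sum(p^2), He = 1 - sum(p^2),
    PIC = He - sum over allele pairs of 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity
    ne = 1.0 / homozygosity
    pic = he - sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return {"Ne": ne, "He": he, "PIC": pic}

def nei_distance(identity):
    """Nei's genetic distance from genetic identity I: D = -ln(I)."""
    return -math.log(identity)

def gene_flow(fst):
    """Gene flow estimate from Fst, assuming Nm = (1 - Fst) / (4 * Fst)."""
    return (1.0 - fst) / (4.0 * fst)

if __name__ == "__main__":
    print(locus_stats([0.5, 0.5]))   # two equally frequent alleles
    print(nei_distance(0.4879))      # identity reported in the abstract
    print(gene_flow(0.0243))         # Fst reported in the abstract
```

Under these assumptions, the reported identity of 0.4879 gives a distance of about 0.718 and the reported Fst of 0.0243 gives Nm of about 10.04, close to the published 0.7177 and 10.0179; the small differences are what would be expected if the published values were computed from unrounded inputs.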
