
Analysis of Tandem Repeats in the Genome of the Crab Portunus trituberculatus and Preliminary Screening of Microsatellite Markers

【Author】 Song Laipeng (宋来鹏)

【Advisors】 Liu Ping (刘萍); Liu Zhenhui (刘振辉)

【Author information】 Ocean University of China, Cell Biology, 2008, Master's thesis

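【Abstract (translated from the Chinese)】 A partial genomic library of Portunus trituberculatus was constructed, and 4164 clones with fragment lengths of 500~1500 bp were sequenced. On this basis, the distribution of microsatellites and minisatellites in the genome was analyzed. From the clone sequences containing microsatellites, 30 microsatellite primer pairs were designed and 9 polymorphic primer pairs were identified.
Sequencing of the partial genomic DNA library yielded 622,409 bases of genomic DNA sequence, in which 697 microsatellite repeats (1-6 bp repeat units) were found. Dinucleotide repeats were the most numerous, with 445 (63.84% of all microsatellites), followed by trinucleotide repeats, 152 (21.81%); mononucleotide repeats, 45 (6.46%); tetranucleotide repeats, 31 (4.45%); pentanucleotide repeats, 14 (2.01%); and hexanucleotide repeats, 10 (1.43%). All mononucleotide repeats had the motif A. Among dinucleotide repeats, AG was the most common, followed by AC and AT; among trinucleotide repeats, ACT was the most common, followed by AGG and AAT. The most common tetranucleotide, pentanucleotide, and hexanucleotide motifs were AGAC, AACCT, and AGGGGA, respectively. GC repeats were rare: only one was found (accession number EU113241).
Among the 709 assembled clone sequences of 500~1500 bp, 130 minisatellite sequences were identified, whose total length accounted for 2.55% of the total sequenced length. Minisatellites with a 12 bp repeat unit were the most numerous (10.77%), and overall the number of repeat sequences decreased as the repeat unit became longer (R=-0.663, p<0.01). The 8 bp repeat unit had the widest range of copy numbers (3.9~66.5), followed by the 13 bp unit (2.0~40.6) and the 26 bp unit (2.3~21.0). The three repeat types with the highest mean copy number were the 8 bp (19.96), 25 bp (16.00), and 22 bp (15.85) repeats. Across all repeat units, copy numbers ranged from 2 to 66.5 and were concentrated between 2 and 25, with fewer repeat sequences at higher copy numbers.
The 130 repeat sequences comprised 123 distinct repeat units, so the minisatellites are highly diverse. They were provisionally divided into three classes, according to whether the repeat unit is composed of two, three, or four kinds of base, and then subdivided by ordering the bases in each repeat from most to least abundant. This classification shows that minisatellites in the P. trituberculatus genome are, on the whole, A/T-rich repeats, and it suggests a relationship with microsatellites: some minisatellite repeats may have originated from microsatellite sequences.
The thesis also reviews recent progress on methods for isolating crab microsatellites, primer design, genetic characteristics, and applications in population genetics, pedigree analysis, and the assessment of genetic diversity. It analyzes the causes of null alleles, stutter bands, and upper allelic dropout, and their effects on microsatellite genotype scoring.
Primers were designed from clones in the partial genomic library that contained microsatellites. Among the 709 clone sequences, repeats with complete flanking sequences (longer than 50 bp) were selected; 30 microsatellite primer pairs were designed, of which 9 were polymorphic.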
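The class percentages reported in the abstract can be rechecked from the counts alone. The snippet below is a minimal arithmetic sketch, not part of the thesis; the dictionary keys and variable names are illustrative assumptions, while the counts and reported percentages are taken from the abstract.

```python
# Counts of microsatellites per repeat-unit length, as reported in the abstract.
counts = {"mono": 45, "di": 445, "tri": 152, "tetra": 31, "penta": 14, "hexa": 10}

# Percentages as reported in the abstract.
reported = {"mono": 6.46, "di": 63.84, "tri": 21.81,
            "tetra": 4.45, "penta": 2.01, "hexa": 1.43}

total = sum(counts.values())  # should equal the 697 repeats reported

# Recompute each class share (in percent) from the counts.
shares = {name: 100 * n / total for name, n in counts.items()}

# Largest gap between a recomputed and a reported percentage.
max_diff = max(abs(shares[name] - reported[name]) for name in counts)

for name in counts:
    print(f"{name:>5}: {counts[name]:>3}  recomputed {shares[name]:.3f}%  "
          f"reported {reported[name]:.2f}%")
print("total:", total, " max difference:", round(max_diff, 4))
```

The six counts sum to 697, and each recomputed share agrees with the reported value to within 0.01 percentage points.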

【Abstract】 In this study, a random genomic library of the crab Portunus trituberculatus was constructed, and 4164 clones with lengths between 500 and 1500 bp were sequenced. The distribution and frequency of microsatellites and minisatellites were analyzed in the 709 assembled sequences, and nine polymorphic microsatellite primer pairs were screened from the clones that contained short tandem repeats. All clones were assembled with DNASTAR (Version 5.0), giving about 622,409 bp of DNA sequence in total. Using Tandem Repeats Finder (Version 2.02), 697 microsatellite repeats were found in these sequences. Dinucleotide repeats were the most abundant class, with 445 (63.84%), followed by trinucleotide repeats, 152 (21.81%); mononucleotide repeats, 45 (6.46%); tetranucleotide repeats, 31 (4.45%); pentanucleotide repeats, 14 (2.01%); and hexanucleotide repeats, 10 (1.43%). All 45 mononucleotide repeats had the motif A; no repeat with the motif C was found. Among the dinucleotide repeats, AG was the most common, with 214 (48.09%), followed by AC with 187 (42.02%) and AT with 43 (9.66%). Eight trinucleotide motifs were found: ACT (42), AGG (35), AAT (28), ACC (21), AAG (9), ATC (7), AAC (7), and AGC (3). AGAC, AACCT, and AGGGGA were the most common tetranucleotide, pentanucleotide, and hexanucleotide motifs, respectively.
One GC dinucleotide repeat was found and submitted to GenBank under accession number EU113241. The scarcity of GC repeats is also discussed, and two possible explanations are offered: methylation of C in CpG islands, which leads to C-to-T mutation, and the difficulty of sequencing GC repeat sequences. Copy numbers were distributed as follows: mononucleotide repeats mainly between 28~40 and 68~76 (80.00%); dinucleotide repeats mainly between 12 and 36 (64.04%); trinucleotide repeats mainly between 8 and 24 (57.90%); and tetra-, penta-, and hexanucleotide repeats together mainly between 4 and 12. Overall, the microsatellite repeat sequences were mainly between 24 and 72 bp long. On this basis, it is suggested that nucleotide mutations at microsatellite loci have accumulated over a long period of evolution and that these loci are likely to be highly polymorphic. Microsatellites should therefore be practical for studying the genome of Portunus trituberculatus and applicable to a variety of fields, including population differentiation, kinship analysis, linkage analysis, and evolutionary and ecological studies. This study provides a basis for microsatellite research in Portunus trituberculatus. Using Tandem Repeats Finder (Version 2.02), 130 minisatellites were identified in the crab's genomic DNA sequences. Their cumulative length accounted for 2.55% of the total length of the DNA sequences. Among the minisatellites, repeats with a twelve-nucleotide unit were the most frequent type, accounting for 10.77% of all minisatellites.
The number of sequences decreased as the length of the repeat unit increased (R=-0.663, p<0.01). Eight-nucleotide repeats had the widest range of copy number (3.9~66.5), followed by thirteen-nucleotide repeats (2.0~40.6) and twenty-six-nucleotide repeats (2.3~21.0). The three repeat types with the highest mean copy number were the eight-nucleotide (19.96), twenty-five-nucleotide (16.00), and twenty-two-nucleotide (15.85) repeats. Overall, copy number ranged from 2 to 66.5 and mostly fell between 2 and 25, and the number of minisatellites decreased as copy number increased. The 130 minisatellite sequences comprised 123 distinct repeat units, which makes them difficult to classify. We propose three classes, according to whether the repeat unit is composed of two, three, or four kinds of nucleotide. Each class can be further divided into subtypes according to nucleotide composition, with the nucleotides ordered from most to least abundant. Overall, the minisatellite sequences in Portunus trituberculatus are A/T rich. The genesis and evolution of minisatellite repeats are also discussed, and we suggest that some minisatellite repeats may derive from microsatellite repeats.
Minisatellites should likewise be practical for studying the genome of Portunus trituberculatus and applicable to a variety of fields. The thesis also reviews the isolation methods, primer design, development status, and genetic characteristics of microsatellites, and their applications in population studies, pedigree analysis, and the assessment of genetic diversity. It analyzes the causes of null alleles, stutter bands, and upper allelic dropout, and their effects on microsatellite genotyping. Microsatellite primers were designed from short tandem repeat sequences screened from the 709 clones of the genomic library; only microsatellites with more than 50 bp of upstream and downstream flanking sequence were used. Nine primer pairs with high amplification polymorphism were screened from the thirty primer pairs designed.
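As an illustration of the kind of scan described above, the following is a minimal sketch that finds perfect tandem repeats with 1-6 bp units using a regular expression. It is a simplified stand-in, not Tandem Repeats Finder, which also reports imperfect repeats. The function names, the `min_copies` threshold, and the test sequence are assumptions made for illustration; the thesis abstract does not state its detection thresholds. Grouping motifs by the lexicographically smallest rotation of the unit or its reverse complement is likewise an assumed convention.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_microsatellites(seq, min_copies=4, max_unit=6):
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier tries the shortest unit first, so 'AGAGAGAG'
    is reported with unit 'AG' rather than 'AGAG'.
    min_copies is an illustrative threshold, not a value from the thesis.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

def canonical_motif(unit):
    """Smallest string among all rotations of the unit and of its
    reverse complement, so that AG, GA, CT and TC share one class."""
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    rotations = []
    for s in (unit, revcomp):
        rotations.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(rotations)

# Hypothetical test sequence: an (AG)6 and an (ACT)5 repeat with short flanks.
example = "TTGC" + "AG" * 6 + "GGC" + "ACT" * 5 + "G"
hits = find_microsatellites(example)

# Tally repeats by unit length, as in the class counts of the abstract.
by_unit_length = Counter(len(unit) for _, unit, _ in hits)
print(hits)
print(dict(by_unit_length))
```

A real analysis would also need to handle imperfect repeats and sequence ambiguity codes, which this sketch ignores.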
