Study of Inter-(sub)genotype Recombinants in Human Full-length Hepatitis B Viruses
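【Author】 Deng Xiaoyan (邓小燕);
【Supervisor】 Huang Ailong (黄爱龙);
【Author information】 Chongqing Medical University, Biopharmaceuticals and Biomedical Materials, 2012, PhD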
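【Chinese Abstract】 Human hepatitis B virus (HBV) is the representative species of the genus Orthohepadnavirus in the family Hepadnaviridae and is the smallest known human DNA virus. Recombination between different strains is an important source of HBV diversity. It may alter the biological properties and virulence of the virus and even expand its host range, which poses major challenges for basic research on hepatitis B and for its clinical treatment. By analyzing the large body of full-length HBV genome data in public databases worldwide, this study identified all possible (sub)genotype recombinants and denotes each recombinant form uniformly as "genotype of the parental backbone sequence (uppercase)/genotype of the parental insert sequence (lowercase)" (e.g., B/c, D/a and C/a). It summarizes detailed recombination information, including hot and cold recombination regions and the recombination characteristics and tendencies of different genotypes. Based on the known epidemic trends of the strains, it proposes, for the first time in HBV recombination research, the terms circulating recombinant form and sporadic recombinant form to describe recombinant strains. It also presents the first comprehensive analysis of the subgenotype composition of the parental backbone and parental insert sequences of each recombinant form and, on this basis, attempts to trace the possible evolutionary history of the B/c recombinant strains that are widespread in Asia. The main findings are as follows.
All full-length human HBV genomes available in GenBank as of October 2010 were downloaded and processed uniformly, yielding 3560 full-length HBV genome sequences of 3082-3384 bp for analysis, and a local mini-database, Seq-HBV, was built with Access 2007.
A genotyping method combining subgenotype double representatives with phylogenetic trees was established to assign the (sub)genotype of all 3560 full-length sequences precisely. The (sub)genotype could be determined for 3450 sequences, belonging to (sub)genotypes A1-A7, B1-B8, C1-C10, D1-D8, E, F1-F4, G, H, I1-I2 and J; for the remaining 110 sequences, belonging to genotypes A, B, C and D, the genotype could be determined but the subgenotype could not. Genotype C had the most sequences (37.1%), followed by genotype B (22.9%), genotype D (14.6%), genotype A (13%), genotype F (2.3%), genotype I (1%), and genotypes H and G (0.8% each); genotype J was the rarest, with only one sequence found so far. The main subgenotypes of genotype C are C1 and C2: C2 accounts for 81.2% of all genotype C sequences and C1 for 12.5%. Within genotype B, subgenotypes B2 and B1 account for 76.5% and 4.8% of all genotype B sequences, respectively. The main subgenotypes of genotype D are D1, D2 and D3, accounting for 44.2%, 22% and 17.7% of all genotype D strains, respectively. Within genotype A, subgenotypes A2 and A1 account for 51.7% and 32.8% of all genotype A sequences, respectively. On this basis, a consensus sequence was built for each genotype and major subgenotype as a reference for (sub)genotype assignment. Each (sub)genotype has a characteristic geographic distribution, and several genotypes co-circulate within the same region. Based on this large-scale analysis, 78 strictly selected double-representative sequences can serve as reference sequences for HBV (sub)genotyping. The study also clarifies the relationship between recombinant strain clusters and subgenotypes, refines the HBV (sub)genotype classification standard, and proposes seven reference criteria for defining HBV (sub)genotypes.
Exploiting the small, circular HBV genome, a stepping fragment-genotyping method was established to search large numbers of HBV genomes for possible mosaic structures. All mosaic genomes found were analyzed with both the scanning analysis mode and the sequence-similarity analysis mode of SimPlot to detect recombination and locate breakpoints. For each recombination event, the suspected recombinants, parental sequences, reference sequences (genotype/subgenotype consensus sequences and representative sequences of each genotype/subgenotype) and outgroup sequences were pooled, and phylogenetic trees were built separately for the recombinant region and for the full-length sequence. The two trees were then compared: a suspected strain was identified as a recombinant if it clustered with the parental backbone sequences on the full-length genome but with sequences of another (sub)genotype in the recombinant region, with a bootstrap value above 70%. In total, 915 inter-genotype recombinant strains, representing 17 hybrid types and 61 recombinant forms, were identified among human full-length HBV genomes; of these, 2 hybrid types (B/i and D/c), 22 recombinant forms and 648 inter-genotype recombinant strains are reported for the first time. Recombinants within a single genotype were also analyzed for the first time, yielding 5 intra-genotype recombinant strains of 5 hybrid types.
Composition and distribution of recombinant strains. The 918 HBV recombinant strains belong to the following genotypes: A (30), B (743), C (73), D (58), E (3), F (1) and G (5). The 22 newly identified recombinant forms comprise B/c (6 forms), C/b (7 forms), C/d and D/a (2 forms each), and B/i, C/a, D/c, D/e and A/d (1 form each). For all previously reported and newly identified recombinant forms, the subgenotype composition of the parental backbone and insert sequences was analyzed in detail, in an attempt to find clues to the genetic evolution of HBV genotypes and recombinant forms. The five intra-genotype recombinant forms, C1/c2 (2 forms), D5/d1 (2 forms) and D5/d3 (1 form), all originate from Asia. In general, the two parental (sub)genotypes of a recombinant are (sub)genotypes that circulate or occur locally: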
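B/c circulates mainly in Asia, particularly Southeast Asia; A/d mainly in India; C/d mainly in western China and Mongolia; D/e mainly in Africa; and G/a mainly in Europe.
Recombination forms. Based on the available data and on the epidemic trends of strains of each recombinant form, the terms circulating recombinant form (CRF) and sporadic recombinant form (SRF) are used, for the first time, to describe the recombinant forms of HBV strains. If a cluster of sequences shares the same recombinant form and comes from different countries or regions, the form has been preserved during the natural evolution of HBV and tends to spread further; we classify it as a circulating recombinant form. If one or a few sequences carry their own specific recombinant form and come from the same specimen, the form arose sporadically and may be selected for or eliminated during evolution, and its transmissibility is still uncertain; we classify it as a sporadic recombinant form. All 61 inter-genotype recombinant forms were assigned to 7 circulating recombinant forms, 6 potential circulating recombinant forms and 48 sporadic recombinant forms. Strains of circulating recombinant forms can now be studied comprehensively at the basic and clinical levels, whereas for strains of sporadic recombinant forms a tracking and monitoring system could be set up using the free GenBank database network and related research institutions.
Recombination tendencies. Genotypes A, B, C, D, E and G are relatively active: each can take part in recombination as either the parental backbone or the parental insert. Recombination occurs frequently between genotypes B and C, A and D, C and D, A and E, E and D, G and C, and G and A, but for each pair only one hybrid type tends to form genetically stable strains, e.g., B/c, A/d, C/d, A/e, D/e, C/g and G/a. No recombinants have yet been found between genotypes B and D, B and E, or C and E. Genotypes A and C more readily act as the parental insert when recombining with other genotypes, giving forms such as D/a, E/a, B/a, C/a, G/a and B/c, D/c, G/c, F/c. Although genotype G has few strains in absolute terms, it takes part in recombination frequently and deserves attention as a genotype driving HBV evolution. Genotype H is recombination-inert: it neither recombines with other genotypes nor accepts inserts from other genomes.
Recombination hot spots and cold regions. The region with the highest recombination frequency extends from the 3' end of the S gene to the 3' end of the P gene and covers the entire X gene, followed by the S and C regions. Breakpoints and regions differ depending on how each genotype takes part in recombination; insertion breakpoints generally lie at the ends of genes, and the length of the recombinant region varies. The short region around nt 2500-3000 is a cold region where recombination events are rare, except in the C/b form. The available data indicate that recombination hot-spot regions are not the main regions that give rise to genetically stable strains. This study analyzed all HBV (sub)genotype recombinants in public databases worldwide and summarized detailed information on each recombinant form, providing a basis for further exploring the patterns of HBV recombination and the genetic evolution of (sub)genotypes and recombinant forms. This is of considerable significance for basic HBV research and clinical practice.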
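The two-tree identification criterion described above can be written as a small decision function. This is a minimal sketch, assuming each suspected strain's position in the full-length tree and in the recombinant-region tree has already been reduced to the (sub)genotype of the clade it joins plus that clade's bootstrap support. The `TreePlacement` type and `is_recombinant` function are hypothetical, and applying the 70% cutoff inclusively to both trees is an interpretation of the text, not a rule it states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreePlacement:
    """Where a suspected recombinant falls in one phylogenetic tree (hypothetical type)."""
    genotype: str     # (sub)genotype label of the clade the strain clusters with
    bootstrap: float  # bootstrap support of that clade, in percent


def is_recombinant(full_genome: TreePlacement,
                   region: TreePlacement,
                   backbone: str,
                   cutoff: float = 70.0) -> bool:
    """Two-tree criterion: the strain joins the backbone clade on the full genome,
    joins a different clade in the recombinant region, and both placements are
    supported at or above `cutoff` percent. Labels are compared verbatim."""
    well_supported = full_genome.bootstrap >= cutoff and region.bootstrap >= cutoff
    return (well_supported
            and full_genome.genotype == backbone
            and region.genotype != backbone)


if __name__ == "__main__":
    full = TreePlacement(genotype="B", bootstrap=99.0)
    mosaic = TreePlacement(genotype="C", bootstrap=85.0)
    print(is_recombinant(full, mosaic, backbone="B"))  # True
```

In the usage example, a strain that joins the genotype B clade on the full-length tree but the genotype C clade in its mosaic region would be reported with backbone B and insert c, i.e. as form B/c.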
【Abstract】 Human hepatitis B virus (HBV) is the representative species of the genus Orthohepadnavirus in the family Hepadnaviridae and is the smallest known human DNA virus. Recombination is one of the main causes of HBV variation: it may affect pathogenicity and transmissibility, may even expand the host range of the virus, and may change the treatment and prognosis of patients. This dissertation reports extensive analyses of the complete HBV genomes currently available in GenBank, carried out to identify all possible recombinants. It establishes a standardized notation for recombinants, "genotype of backbone genome (uppercase)/genotype of insert genome (lowercase)" (e.g., B/c, D/a and C/a); summarizes detailed information on recombinants, including hot and cold spots of recombination breakpoints, the characteristics of recombinants and the recombination tendencies of different genotypes; proposes, for the first time, the terms "circulating recombinant form" and "sporadic recombinant form" to classify recombinants by their epidemic characteristics; comprehensively analyzes the subgenotypes of both parental genomes; and attempts to trace the evolutionary history of the B/c hybrid strains. The main results are as follows.
All complete HBV genomes deposited in GenBank by October 2010 were downloaded and standardized, yielding 3560 complete HBV genomes of 3082 bp to 3384 bp, which were stored in a local database, Seq-HBV, built with Access 2007.
A novel strategy combining double subgenotype representatives with phylogenetic trees was developed to classify the 3560 complete genomes. The results show that 3450 genomes belong to (sub)genotypes A1-A7, B1-B8, C1-C10, D1-D8, E, F1-F4, G, H, I1-I2 and J, and that 110 genomes belong to genotypes A, B, C and D with undetermined subgenotype.
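The "backbone (uppercase)/insert (lowercase)" notation is regular enough to parse mechanically, which helps when tallying recombinant forms across thousands of records. The sketch below is hypothetical: `RecombinantForm` and `parse_form` are illustrative names, not part of the dissertation, and the pattern assumes each side of the slash is a genotype letter A-J optionally followed by a subgenotype number, as in B/c or D5/d3.

```python
import re
from typing import NamedTuple

# Backbone: uppercase genotype letter A-J plus optional subgenotype number;
# insert: the same, but with a lowercase letter (assumed label grammar).
_FORM = re.compile(r"^([A-J])(\d*)/([a-j])(\d*)$")


class RecombinantForm(NamedTuple):
    backbone: str  # parental backbone (sub)genotype, stored uppercase, e.g. "B" or "D5"
    insert: str    # parental insert (sub)genotype, stored uppercase, e.g. "C" or "D3"

    def label(self) -> str:
        """Render in the dissertation's notation, e.g. ('B', 'C') -> 'B/c'."""
        return f"{self.backbone.upper()}/{self.insert.lower()}"

    def is_intra_genotype(self) -> bool:
        """True when both parents share a genotype letter, as in C1/c2."""
        return self.backbone[0].upper() == self.insert[0].upper()


def parse_form(text: str) -> RecombinantForm:
    """Parse a label such as 'B/c' or 'D5/d3'; raise ValueError otherwise."""
    match = _FORM.match(text.strip())
    if match is None:
        raise ValueError(f"not a recombinant label: {text!r}")
    b_letter, b_sub, i_letter, i_sub = match.groups()
    return RecombinantForm(b_letter + b_sub, i_letter.upper() + i_sub)


if __name__ == "__main__":
    print(parse_form("B/c").label())   # B/c
    print(parse_form("D5/d3"))         # RecombinantForm(backbone='D5', insert='D3')
    print(parse_form("C1/c2").is_intra_genotype())  # True
```

Storing both parents in uppercase and lowering the insert only when rendering keeps comparisons between forms simple.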
The largest genotype group was genotype C (37.1%), followed by genotype B (22.9%), genotype D (14.6%), genotype A (13%), genotype F (2.3%), genotype I (1%), and genotypes H and G (0.8% each); the rarest was genotype J, with only one genome. C2 and C1 are the main subgenotypes of genotype C, accounting for 81.2% and 12.5% of it, respectively; subgenotypes B2 and B1 account for 76.5% and 4.8% of genotype B, respectively; subgenotypes D1, D2 and D3 account for 44.2%, 22% and 17.7% of genotype D, respectively; and subgenotypes A2 and A1 account for 51.7% and 32.8% of genotype A, respectively. (Sub)genotypes show distinct geographic distributions, while several genotypes co-circulate within a single region. Based on an extensive analysis of accumulated HBV genome sequence data, 78 double-representative genomes can serve as reference genomes for (sub)genotypes, and we propose seven criteria as minimal requirements for defining genotypes and subgenotypes.
A novel "stepping fragment-genotyping" strategy was established to find mosaic HBV genomes among a large number of genomes. The recombination breakpoints of the potential recombinants were then located precisely with SimPlot. Two phylogenetic trees, one based on the complete genome and one on the mosaic fragment, were inferred to determine the genotype of each mosaic fragment, using a 70% bootstrap cutoff. In total, 918 genomes were identified as inter-genotype recombinants, comprising 17 hybrid types and 61 inter-genotype recombinant forms; two hybrid types (B/i and D/c), 22 recombinant forms and 648 recombinants are reported for the first time.
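The stepping fragment-genotyping strategy relies on the circular genome: fixed-size fragments are stepped all the way around it, each fragment is genotyped, and a genome whose fragments disagree is flagged as a possible mosaic for SimPlot and phylogenetic follow-up. The abstract gives neither the window size and step nor how each fragment is genotyped, so this minimal sketch assumes pre-aligned sequences and assigns each fragment to the nearest reference (for example a genotype consensus sequence) by p-distance. That nearest-reference rule is a swapped-in simplification, and all function names and parameters are hypothetical.

```python
from __future__ import annotations

from typing import Iterator


def circular_windows(length: int, size: int, step: int) -> Iterator[tuple[int, list[int]]]:
    """Yield (start, positions) for fixed-size windows stepped around a circular genome.

    Positions are 0-based and wrap past the last nucleotide back to position 0,
    so no fragment is lost at the arbitrary origin of the circular genome.
    """
    if not 0 < size <= length or step <= 0:
        raise ValueError("need 0 < size <= length and step > 0")
    for start in range(0, length, step):
        yield start, [(start + k) % length for k in range(size)]


def p_distance(a: str, b: str, positions: list[int]) -> float:
    """Fraction of the given aligned sites at which sequences a and b differ."""
    return sum(a[p] != b[p] for p in positions) / len(positions)


def step_genotype(query: str, references: dict[str, str],
                  size: int, step: int) -> list[tuple[int, str]]:
    """Assign each window of `query` to the genotype of its closest reference.

    Assumes `query` and all references are already aligned to one coordinate
    system of equal length (no gap or indel handling); ties go to the first
    reference in dictionary order.
    """
    calls = []
    for start, positions in circular_windows(len(query), size, step):
        best = min(references, key=lambda g: p_distance(query, references[g], positions))
        calls.append((start, best))
    return calls


def is_mosaic(calls: list[tuple[int, str]]) -> bool:
    """Flag a genome whose windows are assigned to more than one genotype."""
    return len({genotype for _, genotype in calls}) > 1


if __name__ == "__main__":
    refs = {"B": "A" * 20, "C": "C" * 20}   # toy "references", not real HBV data
    query = "A" * 10 + "C" * 10             # first half B-like, second half C-like
    calls = step_genotype(query, refs, size=5, step=5)
    print(calls)             # [(0, 'B'), (5, 'B'), (10, 'C'), (15, 'C')]
    print(is_mosaic(calls))  # True
```

Wrapping the windows with a modulo keeps fragments that span the numbering origin intact, which a linear scan of the genome would split.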
Five intra-genotype recombinants were also identified for the first time.
Composition and geographical distribution of recombinants. The 918 recombinants belong to different genotypes: genotype A (30 cases), B (743 cases), C (74 cases), D (63 cases), E (3 cases), F (1 case) and G (5 cases). The 22 new forms comprise B/c (6 forms), C/b (7 forms), C/d and D/a (2 forms each), and B/i, C/a, D/c, D/e and A/d (1 form each). Based on extensive analysis, we assigned the subgenotypes of both parental genomes of each recombinant in an attempt to find clues to the evolution of the genotypes and recombinants. The five intra-genotype recombinants all come from Asia: C1/c2 (2 forms), D5/d1 (2 forms) and D5/d3 (1 form). In general, the two parental genotypes of a recombinant are prevalent locally: B/c strains are prevalent in Southeast Asia, A/d strains in India, C/d strains in western China and Mongolia, and G/a strains in Europe.
Recombination forms. This is the first attempt to classify recombinants into circulating recombinant forms (CRF) and sporadic recombinant forms (SRF) according to the epidemic characteristics of the strains. If recombinant strains share the same form and come from different countries or regions, indicating that the form is spreading in a population, we classify them as a circulating recombinant form. If a form is found only in one or a few genome sequences from the same specimen, it is a chance event with no evidence of further spread among people, and we classify it as a sporadic recombinant form.
Tendency of recombination. Genotypes A, B, C, D, E and G recombine actively. Genotypes B and C, A and D, C and D, A and E, E and D, G and C, and G and A recombine with each other frequently, but for each pair only one hybrid type tends to be inherited stably, such as B/c, A/d, C/d, A/e, D/e, C/g and G/a. No recombination events have been found between genotypes B and D, B and E, or C and E.
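The CRF/SRF rule stated above can be applied mechanically once each recombinant strain is annotated with its form, its country or region of origin, and its source specimen. This minimal sketch implements only the two criteria given in the text. The `Strain` record and `classify_forms` function are hypothetical, and forms that meet neither criterion are left unclassified rather than guessed at, because the criteria for the "potential" CRFs reported in the Chinese abstract are not stated.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Strain(NamedTuple):
    form: str      # recombinant form label, e.g. "B/c"
    region: str    # country or region where the strain was sampled
    specimen: str  # identifier of the specimen the sequence came from (hypothetical field)


def classify_forms(strains: Iterable[Strain]) -> dict[str, str]:
    """Label each recombinant form as "CRF", "SRF" or "unclassified".

    CRF: strains of the form come from more than one country or region.
    SRF: every sequence of the form comes from a single specimen.
    Anything else (e.g. several specimens from one region) stays unclassified.
    """
    regions: dict[str, set[str]] = defaultdict(set)
    specimens: dict[str, set[str]] = defaultdict(set)
    for strain in strains:
        regions[strain.form].add(strain.region)
        specimens[strain.form].add(strain.specimen)
    labels = {}
    for form in regions:
        if len(regions[form]) > 1:
            labels[form] = "CRF"
        elif len(specimens[form]) == 1:
            labels[form] = "SRF"
        else:
            labels[form] = "unclassified"
    return labels


if __name__ == "__main__":
    records = [  # invented records for illustration only
        Strain("B/c", "region-1", "s1"),
        Strain("B/c", "region-2", "s2"),
        Strain("D/c", "region-3", "s3"),
        Strain("D/c", "region-3", "s3"),
        Strain("C/a", "region-4", "s4"),
        Strain("C/a", "region-4", "s5"),
    ]
    print(classify_forms(records))
    # {'B/c': 'CRF', 'D/c': 'SRF', 'C/a': 'unclassified'}
```

Keeping an explicit "unclassified" bucket makes it visible which forms would need the additional, unstated criteria.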
Genotypes A and C more readily act as the insert genome, in forms including D/a, E/a, B/a, C/a, G/a and B/c, D/c, G/c, F/c. Genotype G recombines frequently with other genotypes, whereas genotype H has not been found in any recombination event.
Hot and cold spots of recombination breakpoints. The most frequently recombining region extends from the 3' end of the S gene to the 3' end of the P gene and contains the whole X gene, followed by the S gene and C gene regions. Different genotypes participate in recombination in various ways and differ in their recombination breakpoints. The region of nt 2500-3000 rarely undergoes recombination, except in the C/b form. The available data indicate that recombination in hot-spot regions does not necessarily produce strains with high genetic stability.
This dissertation analyzes all possible recombinants among the complete HBV genomes currently available in GenBank. The results should help explore the genetic and evolutionary mechanisms of (sub)genotypes and hybrids, and such a study is of great value for basic research and clinical practice.
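Locating hot and cold regions amounts to tallying how often each genome position falls inside a recombinant insert. The sketch below assumes each identified recombinant has been reduced to one insert region, given as 1-based inclusive nucleotide coordinates, with start greater than end meaning the region wraps through the origin of the circular genome. Bin size, genome length and all function names are hypothetical; real HBV genome lengths differ between genotypes (3082-3384 bp in this dataset), so positions would first need to be mapped to a common numbering.

```python
from __future__ import annotations


def coverage(regions: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Count how many recombinant insert regions cover each nucleotide.

    Coordinates are 1-based and inclusive; a region whose start is greater
    than its end is treated as wrapping through the origin of the circular genome.
    """
    counts = [0] * genome_length
    for start, end in regions:
        if not (1 <= start <= genome_length and 1 <= end <= genome_length):
            raise ValueError(f"region {start}-{end} lies outside 1..{genome_length}")
        span = end - start + 1 if start <= end else genome_length - start + 1 + end
        for k in range(span):
            counts[(start - 1 + k) % genome_length] += 1
    return counts


def bin_totals(counts: list[int], bin_size: int) -> dict[tuple[int, int], int]:
    """Sum per-nucleotide coverage in consecutive bins keyed by (first_nt, last_nt)."""
    totals = {}
    for i in range(0, len(counts), bin_size):
        chunk = counts[i:i + bin_size]
        totals[(i + 1, i + len(chunk))] = sum(chunk)
    return totals


def hot_and_cold(totals: dict[tuple[int, int], int]) -> tuple[list, list]:
    """Return the bins with the highest and with the lowest total coverage."""
    top, bottom = max(totals.values()), min(totals.values())
    hot = [b for b, v in totals.items() if v == top]
    cold = [b for b, v in totals.items() if v == bottom]
    return hot, cold


if __name__ == "__main__":
    # Toy 20-nt "genome"; the last region wraps from nt 18 through nt 3.
    cov = coverage([(1, 10), (8, 12), (18, 3)], genome_length=20)
    totals = bin_totals(cov, bin_size=5)
    print(totals)                # {(1, 5): 8, (6, 10): 8, (11, 15): 2, (16, 20): 3}
    print(hot_and_cold(totals))  # ([(1, 5), (6, 10)], [(11, 15)])
```

Summing coverage per bin is one simple choice; counting distinct recombinants per bin or normalizing by the number of strains per genotype would be reasonable alternatives.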
【Key words】 Hepatitis B virus; Genotype; Subgenotype; Recombination; Circulating recombinant form; Sporadic recombinant form;