中国结核分枝杆菌群体基因组的分析
Genome-wide Analysis of the Populations of Mycobacterium Tuberculosis from China
【Author】 Lin Nan (林楠);
【Author information】 Fujian Agriculture and Forestry University, Microbiology, 2014, PhD
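【Abstract (translated from the Chinese original)】 Mycobacterium tuberculosis is the causative agent of tuberculosis (TB). Since the 1950s, the steady discovery of effective anti-tuberculosis drugs has brought the ancient TB epidemic under partial control, but neglect of TB in many countries, co-infection with HIV/AIDS, and misuse of antibiotics have led to its resurgence. Drug-resistant M. tuberculosis is increasingly common in China, India, and elsewhere, and lineage 2, the lineage prevalent in China, is regarded as a modern type with stronger drug resistance and higher virulence. TB epidemiology, drug resistance, and lineage 2 have therefore become research hotspots. Our laboratory collected clinical isolates, including drug-resistant ones, from across China, sequenced the whole genomes of 161 M. tuberculosis isolates with high-throughput next-generation sequencing, and analyzed them bioinformatically to investigate, at the genome level, the evolution, lineage characteristics, and drug resistance-associated regions of M. tuberculosis. The main results are as follows: 1. A phylogenetic tree built together with 22 Mycobacterium tuberculosis complex (MTBC) strains of known lineage from the NCBI database assigned the 161 Chinese clinical isolates to lineage 2 (122 isolates), lineage 3 (2 isolates), and lineage 4 (37 isolates). The population-structure analysis focused on the two main circulating lineages, lineage 2 and lineage 4. Comparing the number of mutated sites per isolate across groups showed that the difference between lineages was more statistically significant than the difference between resistant and susceptible isolates; that is, the lineages differ markedly in their number of mutated sites. 2. Based on inferred ancestral sequences, 295 sites were identified as lineage 2-specific and 167 as lineage 4-specific. The ratio of G:C>T:A transversions to G:C>A:T transitions indicated that a larger share of lineage 2-specific SNPs than of lineage 4-specific SNPs is attributable to oxidative damage. Because oxidative damage has been reported to be associated with drug resistance in M. tuberculosis, the susceptibility of the lineage 2 genome to oxidative-damage mutations offers a possible clue to why lineage 2 mutates and acquires resistance readily. In addition, compared with lineage 4-specific nonsynonymous SNPs (nsSNPs), lineage 2-specific nsSNPs include a number of nonsynonymous mutations affecting the function of proteins involved in replication, recombination and repair, and in transcription (for example recD, Rv0922, dinG, uvrC, and nth). Such mutations may be a mechanism underlying the higher mutability and multidrug resistance of lineage 2 relative to lineage 4. These lineage-specific sites can therefore serve as molecular markers for future high-resolution genotyping and are also of value for studying the propensity for drug resistance, higher mutation rate, and virulence of lineage 2. 3. Inferring the origin and evolutionary history of M. tuberculosis from the phylogenetic tree, we found that the diversification of the MTBC is closely linked to the migration of modern humans. The common ancestor of the MTBC likewise left Africa about 70,000 years ago, and the complex then diverged successively into the African branch, lineages 5-6 (about 60,000 years ago); the Indian Ocean rim branch, lineage 1 (about 58,000 years ago); the Euro-American branch, lineage 4 (about 37,000 years ago); and finally, about 34,000 years ago, the Central Asian branch, lineage 3, and the East Asian branch, lineage 2. 4. Examining the possible historical transmission routes of lineage 2 and lineage 4 within China, we infer that sublineages L2.1, L2.2, L2.3, L4.1, and L4.2 all expanded and diverged about 10,000 years ago (the Neolithic). Of these, L2.1, the relatively ancient sublineage of lineage 2, probably followed the early "southern route" of East Asian populations, emerging in southern China and spreading from south to north. L2.2, L2.3, L4.1, and L4.2 are more likely associated with the "northern route" of East Asian migration, entering China from Central Asia or Siberia and then spreading. In particular, the common ancestor of L2.3, the "youngest" sublineage, dates to 5,200 to 10,000 years ago; probably owing to its era (the five thousand years of Chinese civilization) or its own evolutionary advantages (higher virulence and adaptability), L2.3 has become the predominant sublineage in China today. 5. In the drug resistance association analysis of the clinical isolates, we first removed SNPs that were phylogenetically related or synonymous according to the phylogenetic tree. Using a Poisson distribution (P<0.05) and a normal distribution (P<0.01), we then screened the remaining mutated genes and intergenic regions for those with many mutated sites and a high proportion of mutations in resistant isolates, obtaining 85 candidate drug resistance-associated genes and 32 candidate intergenic regions, including 14 genes and 4 intergenic regions previously reported to be associated with resistance. Because drugs are routinely combined in clinical practice, preliminary statistics showed that drug-to-drug correlation caused the frequently mutated known resistance genes to appear associated with all eight drugs, which confounds the association analysis. 6. Further correlation analysis of the previously reported resistance genes (odds ratios and Fisher's exact test) showed that mutations in essential genes favor particular sites, such as codons 531 and 526 of rpoB and codon 315 of katG. Mutations probably concentrate at these sites because they have little effect on the function of the essential gene and impose a low fitness cost on the strain. By contrast, in the nonessential genes ethA and pncA, mutations in resistant isolates are scattered and varied, with no obvious specific hotspot. Mutations within the reported genes inhA and furA were only weakly associated with isoniazid resistance, whereas resistance to ethionamide, which is structurally similar to isoniazid, was strongly associated with mutations in inhA and ethA. The first-line drug streptomycin and the second-line drugs capreomycin and kanamycin all act on protein synthesis, yet their resistance genes differ: streptomycin-resistant isolates mainly carry rpsL mutations, whereas mutations at position 1400 of rrs, which encodes the 16S rRNA, more readily confer resistance to capreomycin and kanamycin. embA and gyrB were far less strongly associated with resistance to ethambutol and ofloxacin, respectively, than embB and gyrA. 7. Exploring the relationships between candidate resistance-associated genes and drugs showed that the genetic basis of resistance is more complex than previously thought. Analysis with the STITCH database indicated that, besides mutations in the known direct drug target genes, the candidates include genes indirectly involved in compensation for resistance and genes in cell wall-related pathways (the fadD, pks, and mmpL families, among others). dN/dS calculated by category showed that the candidate genes are under strong positive selection in resistant isolates (mean dN/dS > 1.0, whereas the genome-wide mean dN/dS in susceptible isolates was only 0.430). The 85 candidate genes include 24 genes essential for M. tuberculosis growth, which probably also reflects long-term drug exposure and strong positive selection pressure. The candidate intergenic regions may regulate gene expression through mutations in sRNAs or in novel promoter regions. In summary, by integrating sequencing results from clinical isolates across China, this study identified lineage 2- and lineage 4-specific SNPs and a set of candidate resistance-associated regions, illustrated the long-term co-evolution and co-migration of M. tuberculosis with humans and its historical transmission routes in China, and uncovered possible new resistance mechanisms. It should help predict future transmission patterns of TB and inform strategies for prevention and control, surveillance, diagnosis, and treatment, and it lays a foundation for multi-omics research on M. tuberculosis.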
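Result 2 rests on the ratio of G:C>T:A transversions to G:C>A:T transitions among lineage-specific SNPs. The thesis record does not give the code used, so the following is only a minimal sketch of how such a ratio could be computed. It assumes each SNP is available as an (ancestral base, derived base) pair; the function names `spectrum_class` and `oxidative_ratio` and the example SNP lists are hypothetical and carry no data from the study.

```python
from collections import Counter

# Base-pairing map, used to fold a substitution and its reverse-strand
# equivalent (for example C>A and G>T) into one strand-neutral class.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def spectrum_class(ancestral: str, derived: str) -> str:
    """Return a strand-neutral label such as 'G:C>T:A' for one SNP."""
    a, d = ancestral.upper(), derived.upper()
    if a == d or a not in COMPLEMENT or d not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ancestral}>{derived}")
    # Always describe the change from the G or A member of the base pair.
    if a in ("C", "T"):
        a, d = COMPLEMENT[a], COMPLEMENT[d]
    return f"{a}:{COMPLEMENT[a]}>{d}:{COMPLEMENT[d]}"


def oxidative_ratio(snps) -> float:
    """Ratio of G:C>T:A transversions to G:C>A:T transitions."""
    counts = Counter(spectrum_class(a, d) for a, d in snps)
    transitions = counts["G:C>A:T"]
    if transitions == 0:
        raise ValueError("no G:C>A:T transitions; ratio undefined")
    return counts["G:C>T:A"] / transitions


# Hypothetical toy input, NOT data from the thesis.
toy_lineage2 = [("G", "T"), ("C", "A"), ("G", "A"), ("C", "T"), ("G", "T")]
toy_lineage4 = [("G", "A"), ("C", "T"), ("G", "T"), ("A", "G")]
print(oxidative_ratio(toy_lineage2), oxidative_ratio(toy_lineage4))
```

A higher ratio in one set of SNPs than in the other is the kind of signal the abstract interprets as a larger contribution of oxidative damage; whether a given difference is meaningful would still need a statistical test on the real counts.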
【Abstract】 Mycobacterium tuberculosis (MTB) is the causative agent of tuberculosis (TB). Since the 1950s, effective anti-TB drugs have been developed, bringing the TB epidemic under partial control. However, neglect of collaborative TB/HIV activities, the misuse of antibiotics, and other factors have led to a resurgence of TB. Drug-resistant MTB is increasingly common in China, India, and elsewhere, and lineage 2, which is prevalent in China, is regarded as a successful population that appears to be more virulent and more drug resistant. As a result, the epidemiology of TB, the drug resistance of MTB, and the features of lineage 2 have drawn growing attention. In this study we collected 161 clinical MTB isolates, including drug-resistant isolates, from China and sequenced them with next-generation sequencing. We carried out a genome-wide analysis of the evolution, population structure, and drug resistance of MTB. The main results are as follows: 1. A phylogenetic tree of 183 Mycobacterium tuberculosis complex (MTBC) isolates (161 from our data and 22 of known lineage from the NCBI database) assigned the 161 clinical isolates to lineage 2 (122 isolates), lineage 3 (2 isolates), and lineage 4 (37 isolates). The number of mutations per isolate differed significantly between lineage 2 and lineage 4, but not among drug-susceptible (DS), multidrug-resistant, and extensively drug-resistant isolates. 2. Based on the inferred ancestral sequences, we identified 295 lineage 2-specific SNPs and 167 lineage 4-specific SNPs. The ratio of G:C>T:A transversions to G:C>A:T transitions indicated that a larger share of the mutations in lineage 2 is probably due to oxidative damage.
Because oxidative damage is thought to be associated with drug resistance in MTB, this mutation pattern may help explain why lineage 2 is prone to mutation and drug resistance. In addition, gene function enrichment analysis showed that lineage 4-specific mutated genes were significantly under-represented in the COG categories for transcription and for replication, recombination and repair, whereas lineage 2-specific mutations in these two categories, in genes such as recD, Rv0922, dinG, uvrC, and nth, affected protein function. This may explain why lineage 2 mutates readily and becomes multidrug resistant. These lineage-specific SNPs can therefore serve as molecular markers for future high-resolution genotyping and also offer clues to the phenotypic differences between lineage 2 and lineage 4. 3. Phylogenetic and evolutionary analysis of our isolates helped us trace the common ancestor of the MTBC. We infer that the divergence of the MTBC was associated with human migration. The oldest branch, lineages 5-6, split off about 60,000 years ago, when modern humans dispersed from Africa. The Indian Ocean rim branch (lineage 1) emerged about 58,000 years ago, the Euro-American branch (lineage 4) about 37,000 years ago, and finally, about 34,000 years ago, MTB split into the Central Asian branch (lineage 3) and the East Asian branch (lineage 2). 4. Focusing on the transmission history of lineage 2 and lineage 4 in our data, we inferred that sublineages L2.1, L2.2, L2.3, L4.1, and L4.2 all originated and diverged about 10,000 years ago (around the Neolithic Age). L2.1, a relatively ancient sublineage of lineage 2, was mainly distributed in southeastern China and is therefore thought to have spread from southern China along the early "southern route" of human migration in East Asia.
In contrast, the relatively modern sublineages (L2.2, L2.3, L4.1, and L4.2) were more likely associated with the "northern route", entering China from Central Asia or Siberia and then spreading. In particular, the common ancestor of the most recent sublineage, L2.3, dates to about 5,200 to 10,000 years ago; probably because it arose during the 5,000 years of Chinese civilization or carried evolutionary advantages (higher virulence and adaptability), L2.3 became the prevalent sublineage in China and has even begun to spread worldwide. 5. In the drug resistance association analysis, we first discarded phylogenetically related and synonymous SNPs. We then identified 85 genes and 32 intergenic regions (IGRs) as associated with drug resistance: they contained a higher density of nonsynonymous or IGR SNPs (Poisson distribution, P<0.05) and were mutated more frequently in drug-resistant isolates than in drug-sensitive ones (normal distribution quantile, P<0.01). These candidate regions included well-known resistance-related regions (14 genes and 4 IGRs). Because combination therapy is routine in clinical practice, the initial statistical analysis showed the frequently mutated known regions, such as rpoB, rpoC, katG, rpsL, rrs, embB, and ethA, to be associated with all eight drugs simultaneously. 6. Further correlation analysis of the well-known regions, using odds ratios (OR) and Fisher's exact test, revealed two main mutation patterns. Essential genes of MTB showed a bias toward particular sites: rpoB had two frequently mutated codons, 531 and 526, and katG was frequently mutated at codon 315. These hotspots probably reflect a low fitness cost, so that the mutations do not seriously impair the essential gene. In contrast, nonessential genes such as ethA and pncA showed scattered and varied mutations with no specific hotspot.
The reported genes furA and inhA were only weakly correlated with isoniazid resistance; however, resistance to ethionamide, which is structurally similar to isoniazid, was more strongly associated with inhA. Although streptomycin (a first-line drug) and capreomycin and kanamycin (second-line drugs) all target protein synthesis, streptomycin resistance depended mainly on rpsL mutations, whereas capreomycin and kanamycin resistance relied on mutations at nucleotide 1400 of rrs. The associations of embA with ethambutol resistance and of gyrB with ofloxacin resistance were not as strong as those of embB and gyrA. 7. The relationship between genome-wide mutations and the mechanisms of drug resistance was more complex than originally thought. According to the STITCH database, besides the many well-known target genes, the candidates included genes with compensatory mutations and genes in cell wall-related pathways (the fadD, pks, and mmpL families, among others). The candidate drug resistance-related genes were under strong positive selection. Probably as a result of long-term drug pressure, the candidate genes identified here also included 24 essential genes. Mutations in predicted sRNAs and promoter regions within intergenic regions are potential novel resistance mechanisms that deserve greater attention. In conclusion, based on whole-genome sequencing of a population of clinical Mycobacterium tuberculosis isolates from China, we identified a number of lineage 2-specific SNPs, lineage 4-specific SNPs, and a group of candidate drug resistance-associated regions. The results illustrate the close relationship between the transmission of TB and the history of human migration in China, indicate that the genetic basis of drug resistance is more complex than previously anticipated, and provide a strong foundation for elucidating unknown drug resistance mechanisms.
Furthermore, this study may help predict future patterns of tuberculosis epidemics and design rational strategies for the diagnosis, treatment, and prevention of tuberculosis, and may also promote multi-omics research on Mycobacterium tuberculosis.
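Result 6 relates mutations in known resistance genes to drug phenotypes using odds ratios and Fisher's exact test. As an illustration only, the sketch below computes both for a single 2x2 table using the Python standard library. The table layout (rows: resistant or susceptible isolates; columns: mutated or wild-type gene), the function names, and the counts are assumptions made for this example and are not taken from the thesis.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: resistant, mutated      b: resistant, wild type
    c: susceptible, mutated    d: susceptible, wild type
    """
    if b * c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table whose probability does not exceed the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts, NOT data from the thesis: 8 of 10 resistant isolates
# and 1 of 6 susceptible isolates carry a mutation in the gene of interest.
print(odds_ratio(8, 2, 1, 5), fisher_exact_two_sided(8, 2, 1, 5))
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead of a hand-written test, and testing many genes against eight drugs would call for multiple-testing correction, which this sketch omits.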
【Key words】 Genome-wide analysis; Mycobacterium tuberculosis; Single nucleotide polymorphism; Drug resistance; Origin and evolution;