Sequencing and Bioinformatics Analysis of the Complete Genome of Paramyxovirus Tianjin Strain

【Author】 Shi Liying (石立莹)

【Supervisors】 Li Xiaomian (李晓眠); Li Mei (李梅)

【Author information】 Tianjin Medical University, Pathogen Biology, 2007, PhD

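【Chinese Abstract】 Paramyxoviruses are enveloped negative-strand RNA viruses and include important human and animal pathogens such as Nipah virus, Hendra virus, and Menangle virus. In 1999 we isolated a virus from the lung tissue of marmosets, a primate species, that had died of respiratory infection. A series of studies provisionally identified it as a paramyxovirus, which we tentatively named paramyxovirus Tianjin strain. During pathogen identification, we amplified a 375 bp fragment of the strain by RT-PCR and submitted the sequence to the online BLAST server for alignment searching. The fragment showed relatively high homology with the HN gene of Sendai virus, but the deduced amino acid sequence differed considerably. More importantly, Sendai virus is a lethal pathogen of rodents. Yet during the epidemic of respiratory disease among common cotton-eared marmosets, none of the laboratory rodents housed in the same animal center became ill. Serological surveys showed the following:
- all staff of the animal center had antibodies to this virus;
- the antibody-positive rate in the general population was as high as 35%;
- the rate among children with acute respiratory infection was 19.28%.
The virus is therefore closely associated with humans. It may be a Sendai virus variant that has crossed the species barrier, giving rise to a strain with low pathogenicity for rodents but high pathogenicity for primates and even humans. Alternatively, it may be a previously unrecognized paramyxovirus that causes respiratory infection in primates and humans. To clarify its origin, its phylogenetic position, and its relationship to human disease, we studied the virus in depth at the molecular level.
In the first part of the work, we sequenced the complete genome of paramyxovirus Tianjin strain using a primer-walking strategy. Based on the complete Sendai virus genome sequences published in GenBank, we designed 13 primer pairs. Using total viral RNA extracted from chick embryo allantoic fluid as template, we amplified 13 overlapping fragments covering the whole genome. The genome termini were amplified by 3’ RACE and 5’ RACE. After cloning, sequencing, and assembly, we obtained the complete genome sequence of the Tianjin strain. The genome comprises 15,384 nucleotides, exactly the length of known Sendai virus genomes, and obeys the “rule of six”. From the 3’ end to the 5’ end it contains the 3’ UTR, the coding regions of the structural genes NP, P, M, F, HN, and L, and the 5’ UTR. The P gene also contains an overlapping open reading frame and a conserved RNA editing site, 3’-UUUUUCCC-5’. Besides the six virus-specific structural proteins NP, P, M, F, HN, and L, the genome is therefore predicted to encode nonstructural proteins such as C, V, and W. Each structural gene is flanked by conserved gene start (S) and gene end (E) signals, and each E signal is joined to the next S signal by a conserved trinucleotide intergenic region, GAA or GGG. Compared with known Sendai viruses, however, the Tianjin strain genome differs in two notable respects. First, the conserved S sequence between the HN and L genes carries a substitution (A8532G). Second, the stop codon of the L gene carries a substitution (A15240C), which makes the predicted Tianjin strain L protein one amino acid longer than that of known Sendai viruses. Coincidentally, Sendai virus strain BB1 carries the same substitution at this position. These substitutions may give the Tianjin strain distinctive biological properties.
In the second part of the work, we carried out a comprehensive bioinformatic analysis of the complete Tianjin strain genome. To determine its taxonomic position within the family Paramyxoviridae, we performed a phylogenetic analysis of its complete genome together with those of 25 viruses of the family. The tree placed the Tianjin strain in one clade with Sendai virus, human parainfluenza virus types 1 and 3, and bovine parainfluenza virus type 3, closest to Sendai virus. This indicates that the Tianjin strain belongs to the genus Respirovirus, subfamily Paramyxovirinae, family Paramyxoviridae. We then compared it with 14 complete Sendai virus genome sequences from GenBank. In that tree the Tianjin strain did not fall into any of the three existing clades but formed a separate clade, and homology analysis supported this result. The Tianjin strain should therefore be classified as a fourth genotype of Sendai virus.
Comparison of the amino acid sequences deduced from the structural genes showed that the P protein is the most variable. Although only 568 aa long, it contains 29 unique substitutions and 35 substitutions shared with BB1, and it shares only 78.7%~91.9% identity with known Sendai viruses. Other Sendai virus strains show the same pattern, suggesting that the P protein may be a Sendai virus-specific antigen. Of the six proteins, the L protein has the highest identity (96.0%~98.0%), followed by the M protein (93.1%~97.1%); both are therefore relatively conserved. Phylogenetic trees built from the amino acid sequences of the six structural proteins largely agree with the complete-genome tree, with only minor differences; the M protein tree differs most. In the M protein tree, the Tianjin strain is closest to the clade represented by strains Ohita and Hamamatsu, with the highest identity (97.1%), compared with 96.6% for BB1. In the trees for the other proteins, the Tianjin strain is consistently closest to strain BB1 and shares the highest identity with it.
Analysis of the genome termini showed that the Tianjin strain genome carries a 55 nt leader at the 3’ end and a 57 nt trailer at the 5’ end. Both are highly conserved, sharing 89.1%~98.2% and 93.0%~96.5% identity, respectively, with known Sendai viruses. Moreover, the first 12 nucleotides at the 3’ end are complementary to the last 12 nucleotides at the 5’ end. These features are closely tied to the regulation of viral replication and transcription.
We also analyzed how nucleotide substitutions are distributed in the Tianjin strain genome and what the amino acid substitutions in the predicted structural proteins may mean. The complete genome contains 444 unique nucleotide substitutions and 546 substitutions shared with BB1. Purine-pyrimidine transversions account for 9.7% (43/444) and 10.1% (55/546) of these, respectively. The substitutions lie mainly within the coding regions of the structural proteins, giving rise to many unique amino acid substitutions. They suggest that the Tianjin strain may differ substantially from other Sendai virus strains in biological properties, pathogenicity, immunogenicity, and epidemiology. How these substitutions relate to the biological properties and pathogenicity of the virus remains to be confirmed by constructing recombinant viruses with reverse genetics.
In summary, this study determined the full-length genome sequence of paramyxovirus Tianjin strain. It also produced segmental cDNA clones covering the entire genome that can be stored long-term. These clones provide material for studies of viral gene function and lay the groundwork for constructing a full-length cDNA clone of the Tianjin strain genome. Using bioinformatic methods, we also examined three questions: the taxonomic position of the Tianjin strain, its relationship to known Sendai viruses, and the significance of the substitutions in its structural proteins. We hope this provides a basis for further study of the virus's pathogenic mechanisms.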
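The rule of six, the RNA editing site, and the complementary 12 nt termini can be checked with simple arithmetic and string operations. The Python sketch below is illustrative only and makes these assumptions:
- The genome length (15,384 nt) and the editing-site sequence come from the abstract.
- The toy genome used for the complementarity test is hypothetical, not the Tianjin sequence.
- The helper names (`obeys_rule_of_six`, `find_editing_site`, `termini_complementary`) are our own.
- Only Watson-Crick pairing counts.
- The editing consensus is read in the same 3’→5’ orientation as the site.

```python
import re
from typing import Optional

# Watson-Crick complements only; G-U wobble pairs are deliberately not counted (assumption)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def obeys_rule_of_six(genome_length: int) -> bool:
    """Rule of six: the genome length must be an exact multiple of 6."""
    return genome_length % 6 == 0


def find_editing_site(rna: str) -> Optional[str]:
    """Return the first match to the consensus UU(U/C)UCCC, or None.

    Assumes `rna` is written in the same 3'->5' orientation as the consensus.
    """
    match = re.search(r"UU[UC]UCCC", rna)
    return match.group() if match else None


def reverse_complement(rna: str) -> str:
    """Complement each base, then reverse, giving the antiparallel partner strand."""
    return rna.translate(COMPLEMENT)[::-1]


def termini_complementary(genome: str, k: int = 12) -> bool:
    """True if the first k nt can pair antiparallel with the last k nt of the same strand."""
    return genome[:k] == reverse_complement(genome[-k:])


print(obeys_rule_of_six(15_384), 15_384 // 6)  # Tianjin strain genome length -> True 2564
print(find_editing_site("UUUUUCCC"))  # Tianjin strain editing site, 3'->5' -> UUUUCCC

# Hypothetical toy genome (NOT the Tianjin sequence): 12 nt end, filler, then its reverse complement
end = "ACGGAUCCAUGA"
toy_genome = end + "AUGCAUGCAUGC" + reverse_complement(end)
print(termini_complementary(toy_genome))  # -> True
```

Reversing a strand preserves the complementarity condition. So `termini_complementary` gives the same answer whether the genome string is written 3’→5’ or 5’→3’.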

【Abstract】 Members of the family Paramyxoviridae are pleomorphic enveloped viruses possessing a nonsegmented negative-strand RNA genome. They include Nipah virus, Hendra virus, and Menangle virus, which can cause fatal disease outbreaks in both animals and humans. In 1999, a virus was isolated from the lungs of common cotton-eared marmosets that died during an outbreak of disease in an animal laboratory. Virological and morphological analysis, together with sequencing of part of the HN gene, indicated that the virus responsible for the outbreak belonged to the family Paramyxoviridae. It was tentatively designated paramyxovirus Tianjin strain.
In our previous work, a 375 nt fragment was amplified from viral RNA by RT-PCR and sequenced. Sequence similarity searches were then conducted using the BLAST service at the National Center for Biotechnology Information (NCBI). The fragment showed relatively high homology with the partial HN gene of Sendai viruses, but the deduced protein sequence diverged considerably from theirs. More importantly, Sendai virus usually causes outbreaks of lethal pneumonia in mouse colonies. By contrast, the experimental mice in the same animal laboratory did not develop respiratory disease during the epidemic. In addition, ELISA tests showed that the staff of the animal laboratory had antibodies specific to this virus. They also showed that 14 of 40 people who had never been in contact with the marmosets had antibodies to the virus. We also tested sera from young children with acute respiratory tract infection and found an IgM positive rate of 19.28% against the virus. These results suggest that the virus is closely related to humans and may be a common respiratory virus in humans and marmosets.
To understand the genomic structure and taxonomic position of the strain, and its relationship with Sendai viruses, we determined the complete genome sequence of the Tianjin strain. We then compared it with those of other paramyxoviruses using bioinformatics methods.
In the first part of our study, we determined the complete genome sequence of paramyxovirus Tianjin strain. A total of 13 overlapping cDNA clones covering the entire genome were obtained by primer-walking RT-PCR. The sequences of the 3’ and 5’ termini of the viral genome were amplified by 3’ and 5’ RACE. Sequences compiled from these clones show that the Tianjin strain genome is 15,384 nt long, identical in size to that of Sendai virus. Because 15,384 is a multiple of 6, the Tianjin strain conforms to the ‘rule of six’, which plays an important role in the replication of paramyxoviruses. The genome consists of six genes in the order 3’-NP-P/C-M-F-HN-L-5’. These code, respectively, for the nucleocapsid (NP), phosphoprotein (P), matrix (M), fusion (F), hemagglutinin-neuraminidase (HN), and large (L) proteins. The P gene contains another ORF, which is predicted to encode the C protein. A putative RNA editing site is found 942 nt downstream of the initiation codon (ATG) of the P protein (nt 2785-2793). The sequence of the editing site, 3’-UUUUUCCC-5’, conforms well to the conserved sequence UU(U/C)UCCC found in all members of the subfamily Paramyxovirinae. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of Sendai viruses. Nevertheless, the genomic structure of the Tianjin strain diverges slightly from that of other Sendai viruses, with two significant nucleotide substitutions. One is in the gene start signal sequence between the HN and L ORFs. The other is in the stop codon of the L gene and is predicted to extend the L protein by one amino acid.
These unique substitutions suggest that the Tianjin strain may have some novel features.
In the second part of our study, we analyzed the complete genome sequence and the deduced protein sequences of the Tianjin strain with bioinformatics methods. Phylogenetic analysis of genome sequences from the Tianjin strain and other members of the family Paramyxoviridae shows that the Tianjin strain, SeV, hPIV-1, hPIV-3, and bPIV-3 form one lineage. Within it, the Tianjin strain is most closely related to Sendai virus. This result suggests that the Tianjin strain should be assigned to the genus Respirovirus within the subfamily Paramyxovirinae and is most likely a new genotype of Sendai virus. Accordingly, we compared the complete genome sequence of the Tianjin strain with those of Sendai viruses from GenBank. In the resulting phylogenetic tree, the Tianjin strain forms a new branch and is most closely related to strain BB1 (94.9% nucleotide homology). Strain BB1 was submitted by the Institute of Viral Disease Control and Prevention, Beijing, China.
Sequence comparisons based on the predicted protein sequences indicate that the L protein is the most conserved, sharing 96.0%~98.0% amino acid identity with other SeVs. As reported for other paramyxovirus P proteins, the Tianjin strain P protein is poorly conserved. It shares only 78.7%~91.9% amino acid identity with the known Sendai viruses. In addition, the P protein sequence of the Tianjin strain carries 29 unique substitutions and 35 substitutions shared with strain BB1. These results suggest that the P protein may be a Sendai virus-specific antigen. All of the trees based on protein sequences are similar to the complete-genome tree except the M protein tree. In the M protein tree, the Tianjin strain is more closely related to the branch represented by strains Ohita and Hamamatsu than to strain BB1.
The 3’ and 5’ ends of the Tianjin strain genome comprise the leader and trailer regions. The leader sequence is 55 nt long and the trailer sequence is 57 nt long.
The terminal 12 nt of the leader and trailer are exactly complementary, as in known Sendai viruses. Nucleotide sequence comparisons between the Tianjin strain and Sendai viruses showed leader and trailer homologies of 89.1%~98.2% and 93.0%~96.5%, respectively. The leader and trailer sequences are thus highly conserved among Sendai viruses; these regions have been shown to be associated with transcription and replication of the viral genome.
In addition, we analyzed the distribution of mutations in the genome sequence of the Tianjin strain and the significance of the amino acid substitutions. The complete genome of the Tianjin strain contains 444 unique nucleotide variations and 546 variations shared with strain BB1. These variations are located mainly in the coding regions of the predicted proteins and result in many amino acid substitutions. This suggests that the Tianjin strain may differ significantly from other Sendai viruses in biological, pathological, immunological, or epidemiological characteristics. Establishing the relationship between these substitutions and viral biological properties or pathogenicity will require generating recombinant SeVs that carry the mutations, using reverse genetics.
In summary, we determined the complete genome sequence of the Tianjin strain and established its taxonomic position and its relationship with Sendai viruses. We also examined the significance of amino acid substitutions in the predicted protein sequences. These results may lay the foundation for further research on the Tianjin strain.
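The identity percentages and substitution counts above rest on two simple operations. The first scores identity between aligned sequences. The second classifies each nucleotide substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse).

The Python sketch below makes these assumptions:
- The sequences are already aligned, and gapped columns are excluded from the denominator. The abstract does not say how gaps were handled, so this is one common convention rather than the method actually used.
- The aligned fragments are hypothetical.
- The helper names are our own.

The substitution codes A8532G and A15240C and the counts 43/444 and 55/546 are taken from the Chinese-language abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap (assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def parse_substitution(code: str):
    """Split notation such as 'A8532G' into (reference base, position, new base)."""
    return code[0], int(code[1:-1]), code[-1]


def substitution_type(ref: str, alt: str) -> str:
    """Transition if both bases are purines or both pyrimidines, otherwise transversion."""
    pair = {ref, alt}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


# Hypothetical aligned protein fragments (not real Sendai virus sequences)
query = "MKTLLV-AGR"
print(round(percent_identity(query, "MKTLIVSAGR"), 1))  # 88.9 (8 of 9 ungapped columns match)
print(round(percent_identity(query, "MKSLIV-AGK"), 1))  # 66.7 (6 of 9)

# Substitutions reported for the Tianjin strain
for code in ("A8532G", "A15240C"):
    ref, pos, alt = parse_substitution(code)
    print(code, pos, substitution_type(ref, alt))
# A8532G 8532 transition
# A15240C 15240 transversion

# Transversion fractions reported in the Chinese-language abstract
for transversions, total in ((43, 444), (55, 546)):
    print(f"{transversions}/{total} = {100 * transversions / total:.1f}%")
# 43/444 = 9.7%
# 55/546 = 10.1%
```

Pairwise identity of this kind is only a similarity score. Building the phylogenetic trees described in the abstract needs a separate tree-inference method, which the abstract does not name.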
