
Cloning of the PAP1 and PAP2 Genes and Related Bioinformatics Research

Identification of PAP1 and PAP2 Gene and Their Correlative Bioinformatics Analysis

【Author】 Shu Kunxian (舒坤贤)

【Supervisor】 Wu Lixiang (邬力祥)

【Author information】 Central South University, Physiology, 2006, PhD

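【Abstract, Chinese version】 [Objective] The tumor suppressor protein P53 is a general transcription factor. By activating or repressing the expression of its downstream genes, it plays a key role in the response to cellular stress signals such as oncogene expression, hypoxia and DNA damage. P53 and its downstream genes form a complex regulatory network, and understanding this network matters greatly for understanding the physiological functions of P53, for clinical gene therapy of tumors and for drug discovery. The key to understanding the P53 regulatory network is the identification of p53 downstream genes. This study first used molecular biology methods to clone new p53 downstream genes and make a preliminary study of their functions, and then used bioinformatics methods to predict the p53 downstream genes present in the whole of human genomic DNA, so as to further complete the p53 gene regulatory network.

[Methods] Using a mammalian inducible gene expression system, the Tet-On™ gene expression system, and the human glioma cell line U251 as experimental material, a p53-transfected cell line with inducible p53 expression was established and a cDNA library of p53 overexpression was constructed. New p53 downstream genes were cloned by differential display, sequencing, homology comparison and cDNA library screening. The structure and function of the newly cloned genes were predicted by bioinformatics methods, the binding of their regulatory sequences to P53 protein was studied by gel retardation assay, and the expression pattern of the cloned genes during mouse embryonic development was studied by Northern blot, in situ hybridization and other molecular biology techniques. Next, reported p53 downstream genes and P53 protein binding sequences were collected and analyzed statistically to characterize these regulatory sequences, to derive the features of a conservative consensus sequence, and to revise the consensus sequence features defined by El-Deiry et al. Sequence features were computed with a PWM model, a word-frequency method, a string model and the length of the insert sequence in the consensus sequence defined by El-Deiry et al., and a new model for predicting p53 downstream genes was built by logistic regression analysis. The model was used to predict p53 downstream genes in human genomic DNA. The predictions were classified according to the GO (Gene Ontology) functional classification and compared with the predictions obtained from the conservative consensus sequence and from the consensus sequence.

[Results] The results fall into five parts:
Ⅰ. A p53-transfected cell line with inducible p53 expression was established and named U251-pTet-p53. Under doxycycline induction the exogenous p53 gene is overexpressed in this cell line, whereas in medium without doxycycline it is almost not expressed. Differential display showed that overexpression of exogenous p53 causes differential expression of many genes in the cell, some up-regulated and some down-regulated. All of these differentially expressed genes may be p53 downstream genes. Eleven ESTs showing differential expression were sequenced, and 2 of them represent previously unreported genes.
Ⅱ. A cDNA library of p53 overexpression was constructed. Full-length sequences for the two new ESTs obtained by differential display in part Ⅰ were obtained by screening the cDNA library and were named PAP1 (p53 activated protein 1; GenBank accession number AF497245) and PAP2 (p53 activated protein 2; GenBank accession number AY093673).
Ⅲ. Structure and function of the PAP1 gene:
1. Bioinformatics analysis of the PAP1 gene showed that:
(1). The PAP1 gene maps to human chromosome 16p12-13 and consists of 6 exons and 5 introns.
(2). The PAP1 promoter and the first 3 introns contain many P53 protein binding sites.
(3). The PAP1 cDNA is 2779 bp long. The open reading frame starts at nt 282 and terminates at nt 1130, with a length of 846 bp. The predicted protein has a molecular weight of 32.9 kD, a theoretical isoelectric point (pI) of 5.81 and the molecular formula C1505H2309N385O421S11.
(4). Secondary structure of the PAP1 protein: 40% alpha-helix, 17% beta-sheet and 43% other types of secondary structure. PAP1 is a hydrophilic protein with one transmembrane region, at approximately amino acids 42-79, and no signal peptide.
(5). The PAP1 gene is a member of the immunoglobulin superfamily (IGSF). It is highly homologous among chimpanzee, dog, mouse, chicken, cattle and other species, and is highly conserved in evolution.
2. Molecular biology experiments showed that:
(1). The P53 protein binding site in intron 2, GAGCTTGTCCcccGAtCAAGCCC, can bind P53 protein, indicating that PAP1 is a p53 downstream gene.
(2). PCNA immunohistochemistry and apoptosis detection showed that days 9-10 of mouse embryonic development are a period dominated by cell proliferation, that days 11-14 are a stage in which cell proliferation and apoptosis gradually approach balance, and that developmental progress differs among tissues.
(3). Northern blot showed that expression of the PAP1 gene (in fact IGSF6, the mouse homologue of PAP1) differs among developmental stages of the mouse embryo.
(4). In situ hybridization showed that on days 11-14 the PAP1 gene (in fact IGSF6, the mouse homologue of PAP1) is specifically expressed in lung, kidney, intestine and spinal column tissue, indicating that PAP1 takes part in the development of these major organs. Comparison with the trends of cell proliferation and apoptosis during development suggests that the gene is probably related to apoptosis during embryonic development.
Ⅳ. Bioinformatics analysis of the PAP2 gene showed that:
(1). The PAP2 gene maps to human chromosome 17.
(2). The mRNA is 2007 bp long. The transcriptional regulatory region starts at position 167 bp, the promoter sequence lies at positions 1998-1748 on the reverse strand, and the open reading frame spans 952-1461 bp, with a length of 510 bp.
(3). The encoded protein is 169 aa long, with a molecular weight of 19247.3, a theoretical isoelectric point of 12.56 and the molecular formula C818H1355N317O208S9. No signal peptide or transmembrane helix was found. It is a hydrophilic, non-secretory protein.
(4). The subcellular localization of the PAP2 protein is the nucleus.
(5). Secondary structure of the PAP2 protein: 20.71% alpha-helix, 4.14% beta-sheet and 75.15% other.
Ⅴ. In total, 49 reported p53 downstream genes and 72 P53 protein binding sequences were collected.
1. Statistical analysis:
(1). The sequences largely agree with the consensus sequence features defined by El-Deiry et al., but mismatches occur at the great majority of positions in the decamers, with mismatch rates of 10-20%.
(2). Over the whole decamer, genes with 3 mismatches account for 34.4%, genes with 4 mismatches for 12.7% and genes with 5 mismatches for 6.35%. We therefore consider a permitted mismatch number of 4 for the whole decamer to be appropriate when the consensus sequence model is used to predict p53 downstream genes.
(3). In the consensus sequence, the number of inserted bases is positively correlated with the number of mismatches.
2. Logistic regression model:
(1). Two PWM matrices were used to model the head and tail decamers separately, and cross-validation was used to locate the motif in each reported binding sequence. The features of the located motifs were taken as the objects of logistic regression analysis, and features were selected stepwise with the logistic regression procedure provided by SPSS. The PWM scores of the head and tail decamers were finally chosen as the features of the logistic regression model:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The threshold is set to 0.1076, where hpwmsc and tpwmsc denote the PWM model scores of the head and tail decamers of the motif, respectively.
(2). Reported P53 binding sequences were used as the positive dataset and randomly selected CDS sequences as the negative dataset. Jackknife testing on the positive and negative datasets confirmed the validity of the method, with an average accuracy of 93.91%.
(3). Programs written in Perl were used to analyze and compare P53 binding sites in human genome data with the conservative consensus sequence model we summarized, the revised consensus sequence model and the logistic regression model. The logistic regression model showed better recognition performance. The model is also readily extensible and can easily accommodate new features, so that recognition performance can keep improving.
3. Prediction of p53 downstream genes in human genomic DNA:
(1). The conservative consensus sequence predicted 1693 p53 downstream genes.
(2). The consensus sequence allowing 4 mismatches (string model) predicted 22107 p53 downstream genes.
(3). The logistic regression model predicted 15182 p53 downstream genes.
4. Functional classification of p53 downstream genes based on GO:
(1). Cellular component: p53 downstream genes are concentrated mainly in cell, organelle and protein complex.
(2). Molecular function: the functions of p53 downstream genes are mainly binding, catalytic activity, enzyme regulator activity, signal transducer activity, structural molecule activity, translation regulator activity, transporter activity and unknown molecular function. In translation regulator activity, transporter activity and unknown molecular function, a great many p53 downstream genes have not yet been discovered.
(3). Biological process: the biological processes in which p53 downstream genes take part mainly include cellular process, physiological process, regulation of biological process and response to stimulus. In development and unknown biological process, a great many p53 downstream genes have not yet been discovered.

[Conclusion] The main conclusions are:
(1) A p53-transfected cell line with inducible p53 expression was established and named U251-pTet-p53. In this cell line, overexpression of the exogenous p53 gene can be induced by doxycycline.
(2) A cDNA library of p53 overexpression was constructed.
(3) PAP1 is a newly cloned p53 downstream gene. It maps to human chromosome 16p12-13 and consists of 6 exons and 5 introns. The protein encoded by PAP1 is a member of the immunoglobulin superfamily (IGSF) and is highly conserved in evolution. During mouse embryonic development, PAP1 is specifically expressed in lung, kidney, intestine and spinal column tissue, and it is probably related to apoptosis during the development of these organs.
(4) PAP2 is a newly cloned p53 downstream gene. It maps to human chromosome 17, and the protein it encodes is highly conserved in evolution.
(5) Analysis of reported p53 downstream genes shows that a permitted mismatch number of 4 for the whole decamer is appropriate when the consensus sequence model is used to predict p53 downstream genes.
(6) A logistic regression model for predicting p53 downstream genes was established:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The threshold is set to 0.1076, where hpwmsc and tpwmsc denote the PWM model scores of the head and tail decamers of the motif, respectively. With this model, 15182 p53 downstream genes were predicted in the human genome.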
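The logistic regression model quoted above is simple enough to evaluate directly. The following Python sketch only re-expresses the reported formula and threshold; the function names are hypothetical, the score values in the usage lines are made-up illustrations (the abstract does not state the scale of the PWM scores), and the use of "greater than or equal to" for the threshold follows the "p>or=0.1076" wording of the English abstract.

```python
import math

# Coefficients and threshold as reported in the abstract.
INTERCEPT = -4.655
COEF_HEAD = 0.457    # weight of hpwmsc (PWM score of the head decamer)
COEF_TAIL = 0.421    # weight of tpwmsc (PWM score of the tail decamer)
THRESHOLD = 0.1076   # reported decision threshold on p


def p53_site_probability(hpwmsc: float, tpwmsc: float) -> float:
    """Return p = exp(z) / (1 + exp(z)) with z the reported linear predictor."""
    z = INTERCEPT + COEF_HEAD * hpwmsc + COEF_TAIL * tpwmsc
    # 1 / (1 + exp(-z)) is algebraically identical to exp(z) / (1 + exp(z)).
    return 1.0 / (1.0 + math.exp(-z))


def is_predicted_site(hpwmsc: float, tpwmsc: float,
                      threshold: float = THRESHOLD) -> bool:
    """Apply the reported threshold (p >= 0.1076) to a pair of PWM scores."""
    return p53_site_probability(hpwmsc, tpwmsc) >= threshold


if __name__ == "__main__":
    # Hypothetical score pairs, chosen only to show both outcomes.
    for head, tail in [(0.0, 0.0), (5.0, 5.0)]:
        p = p53_site_probability(head, tail)
        print(head, tail, round(p, 4), is_predicted_site(head, tail))
```

Because the threshold sits well below 0.5, a candidate motif is accepted once the weighted sum 0.457×hpwmsc + 0.421×tpwmsc exceeds roughly 2.54, which is the point where the linear predictor reaches logit(0.1076).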

【Abstract】 [Objective] The tumor suppressor p53 is a transcription factor that plays a critical role in coordinating the response of cells to a diverse range of stress conditions, e.g. oncogenic activation, hypoxia or DNA damage, and mediates its different downstream functions by activating or repressing a large number of target genes. p53 and its downstream genes form a complicated gene network. Understanding the p53 gene regulatory network is very important for understanding the physiological functions of p53, for drug discovery and for gene therapy of cancers. The ultimate challenge in defining the complete p53 gene regulatory network is to identify p53 downstream genes. This study aimed to identify novel p53 downstream genes and characterize their functions by molecular approaches, and to predict p53 downstream genes in the whole of human genomic DNA by bioinformatics methods, in order to study the p53 gene regulatory network further.

[Methods] Using the Tet-On™ Gene Expression System, we established a new system of inducible p53 expression in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox, and constructed a cDNA library from cells overexpressing p53. Novel p53 downstream genes were obtained by DD-PCR, sequencing, BLASTn searches of GenBank and screening of the cDNA library. The structures and functions of the novel genes were predicted by bioinformatics analysis, and their expression characteristics in mouse embryonic development were determined by Northern blot and in situ hybridization. We then collected the p53 downstream genes and the DNA binding sequences for wild-type P53 protein published in PubMed and statistically analyzed the characteristics of the consensus sequences. A model for predicting p53 downstream genes based on logistic regression analysis was proposed, in which candidate features of the primary sequence are calculated with a PWM model, a frequency distribution model, a consensus sequence model and the length of the insert sequence in the motif. We predicted the p53 downstream genes in human genomic DNA with the conservative consensus binding sequence, the consensus binding sequence and the logistic regression model, and then classified them according to GO (Gene Ontology).

[Results] The results are divided into five parts:
Ⅰ. Using the Tet-On™ Gene Expression System, we established a new system of inducible p53 expression, the U251-pTet-p53 cell line, in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox. Comparison of random-primer RT-PCR products showed that exogenous p53 expression leads to differential expression of many genes, some up-regulated and others down-regulated. All of these differentially expressed genes may be p53 downstream genes. Eleven ESTs from the differentially expressed genes were sequenced, and 2 of them had not been reported.
Ⅱ. We constructed a cDNA library from p53-overexpressing cells and screened it for the complete nucleotide sequences of the two novel genes, which were named PAP1 (p53 activated protein 1, GenBank number: AF497245) and PAP2 (p53 activated protein 2, GenBank number: AY093673), respectively.
Ⅲ. The structure and function of PAP1 are as follows:
1. Results of the bioinformatics analysis of the PAP1 gene:
(1). The PAP1 gene has been localized to human chromosome 16p12-13 and has six exons and five introns.
(2). There are many p53 binding sites in the PAP1 promoter and in introns 1-3.
(3). The complete nucleotide sequence of the PAP1 cDNA is 2779 bp long and contains a long open reading frame of 849 bp that starts at the first methionine codon (nt 282) and ends with the stop codon TAA (nt 1130). The predicted protein sequence derived from the open reading frame is a 282-amino acid polypeptide with a calculated molecular mass of 32.9 kD and a theoretical isoelectric point of 5.81. The molecular formula is C1505H2309N385O421S11.
(4). The secondary structure of the PAP1 protein is 40% alpha-helix, 17% beta-pleated sheet and 43% other. PAP1 is a hydrophilic protein, and no signal peptide was found.
(5). The PAP1 gene is a novel member of the immunoglobulin superfamily (IGSF). Alignment of the predicted protein sequences from human, Pan troglodytes, Canis, Mus musculus and Gallus gallus showed that it is highly conserved.
2. Results of the molecular experiments:
(1). The presence of a p53 binding site, GAGCTTGTCCcccGAtCAAGCCC, in intron 2 of the PAP1 gene indicates that it is a p53 downstream gene.
(2). Immunohistochemistry and TUNEL showed that 9-10 dpc is the phase of primitive organ formation in embryonic development, during which cell proliferation is dominant and apoptosis is scarce, and that 11-14 dpc is the phase in which proliferation and apoptosis are maintained in balance.
(3). The PAP1 gene (in fact its homologue, the IGSF6 gene) is possibly involved in mouse embryonic development. An IGSF6-specific transcript was detected by Northern blot in RNA extracted from embryos at 11-14 days post conception. PAP1 expression differs among mouse embryos of different ages.
(4). In situ hybridization on sections of mouse embryos at 11-14 dpc showed the differential presence of PAP1 (in fact its homologue, the IGSF6 gene) in developing lung, kidney, intestine and vertebral column, indicating that PAP1 is possibly involved in mouse embryonic development. Comparison with proliferation and apoptosis in the developing cells suggests a functional involvement in embryonic development, perhaps in apoptosis.
Ⅳ. Results of the bioinformatics analysis of the PAP2 gene:
1. The PAP2 gene has been localized to human chromosome 17.
2. The complete nucleotide sequence of the PAP2 cDNA is 2007 bp long and contains a long open reading frame of 510 bp that starts at the first methionine codon (nt 952) and ends with the stop codon TGA (nt 1461).
3. The predicted protein sequence derived from the open reading frame is a 169-amino acid polypeptide with a calculated molecular mass of 19.2 kD and a theoretical isoelectric point of 12.56. The molecular formula is C818H1355N317O208S9. No signal peptide was found, so it may be a non-secretory protein.
4. The PAP2 protein is localized to the nucleus.
5. The secondary structure of the PAP2 protein is 20.71% alpha-helix, 4.14% beta-pleated sheet and 75.15% other.
Ⅴ. In total, 49 p53 downstream genes and 72 human DNA binding sequences for wild-type p53 published in PubMed were collected.
1. Results of the statistical analysis:
(1). The sequences are consistent with the consensus binding sequence for wild-type p53 defined by El-Deiry et al., but mismatches occur at most positions in the decamers, with mismatch rates of 10-20%.
(2). Across the decamers, three mismatches account for 34.4%, four mismatches for 12.7% and five mismatches for 6.35%. These data indicate that the criterion for computer analysis of p53 downstream genes should allow four mismatches.
2. Results of establishing the logistic regression model:
(1). Two PWM matrices were used to model the two decamers separately, and a cross-validation method was used to confirm the motif in every known binding sequence. The features of those motifs were then taken as the objects of the logistic regression analysis. The optimal features, the PWM scores of the two decamers, were determined from the candidate feature sets through the stepwise selection process offered by SPSS, giving the following model for predicting p53 downstream genes:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The decision region is p ≥ 0.1076, where hpwmsc and tpwmsc are the PWM scores of the head decamer and the tail decamer of the motif, respectively.
(2). The DNA binding sequences for wild-type p53 published in PubMed were used as the positive dataset and randomly picked human gene CDS sequences as the negative dataset. The model was trained and tested on these datasets by the jackknife method, and the average prediction accuracy is 93.91%.
(3). The p53 downstream genes in the human genome were analyzed with the prediction model, implemented in Perl, and the results were compared with those of the consensus sequence model. The comparison indicates that our model is a general algorithm that outperforms the traditional consensus sequence model. Furthermore, the framework of the model is extensible and can accept new features to improve prediction performance.
3. Results of predicting p53 downstream genes in human genomic DNA:
(1). The conservative consensus binding sequence predicted 1693 p53 downstream genes.
(2). The consensus binding sequence (allowing four mismatches) predicted 22107 p53 downstream genes.
(3). The logistic regression model predicted 15182 p53 downstream genes.
4. Results of classifying p53 downstream genes according to GO:
(1). Cellular component: mainly cell, organelle and protein complex.
(2). Molecular function: mainly binding, catalytic activity, enzyme regulator activity, signal transducer activity, structural molecule activity, transcription regulator activity, transporter activity and obsolete molecular function. Many p53 downstream genes in the groups of transcription regulator activity, transporter activity and obsolete molecular function have not yet been identified.
(3). Biological process: mainly cellular process, physiological process, regulation of biological process and response to stimulus. Many p53 downstream genes in the groups of development and obsolete biological process have not yet been identified.

[Conclusion] The main conclusions are:
1. We have established a new system of inducible p53 expression, the U251-pTet-p53 cell line, in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox.
2. A cDNA library was constructed from cells overexpressing p53.
3. PAP1 is a novel p53 downstream gene that has been localized to human chromosome 16p12-13 and has six exons and five introns. The predicted PAP1 protein is a novel member of the immunoglobulin superfamily (IGSF) and is highly conserved. The differential presence of PAP1 in developing lung, kidney, intestine and vertebral column indicates that PAP1 is possibly involved in mouse embryonic development, perhaps in apoptosis.
4. PAP2 is a novel p53 downstream gene that has been localized to human chromosome 17. The predicted PAP2 protein is highly conserved.
5. The statistical analysis shows that the criterion for computer analysis of p53 downstream genes should allow four mismatches.
6. A model for predicting p53 downstream genes based on logistic regression analysis was proposed:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The decision region is p ≥ 0.1076, where hpwmsc and tpwmsc are the PWM scores of the head decamer and the tail decamer of the motif, respectively. With this model, 15182 p53 downstream genes were predicted.
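The mismatch criterion used with the consensus sequence model can be illustrated on the PAP1 intron 2 site quoted in the abstract. The Python sketch below rests on assumptions the abstract does not spell out: that the El-Deiry consensus half-site is RRRCWWGYYY (two decamers separated by a spacer of 0-13 bp), that the lowercase "ccc" in the quoted site is a 3-bp spacer, and that the four permitted mismatches are counted over both decamers together. The function names are hypothetical, and this is not the thesis's Perl implementation.

```python
# IUPAC codes needed for the consensus: R = purine (A/G), W = A/T, Y = pyrimidine (C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "W": "AT", "Y": "CT"}

# Assumption: El-Deiry half-site consensus; the abstract itself does not quote it.
CONSENSUS_DECAMER = "RRRCWWGYYY"


def decamer_mismatches(decamer: str, consensus: str = CONSENSUS_DECAMER) -> int:
    """Count positions where the decamer departs from the IUPAC consensus."""
    decamer = decamer.upper()
    if len(decamer) != len(consensus):
        raise ValueError("decamer and consensus must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(decamer, consensus))


def site_mismatches(site: str, spacer_len: int) -> tuple:
    """Split a site into head decamer, spacer and tail decamer; count mismatches in each decamer."""
    if len(site) != 20 + spacer_len:
        raise ValueError("site length must be 20 plus the spacer length")
    head = site[:10]
    tail = site[10 + spacer_len:]
    return decamer_mismatches(head), decamer_mismatches(tail)


def within_criterion(site: str, spacer_len: int, max_mismatches: int = 4) -> bool:
    """Assumed reading of the criterion: total mismatches over both decamers <= 4."""
    return sum(site_mismatches(site, spacer_len)) <= max_mismatches


if __name__ == "__main__":
    pap1_site = "GAGCTTGTCCcccGAtCAAGCCC"  # site quoted in the abstract
    print(site_mismatches(pap1_site, spacer_len=3))   # (0, 1)
    print(within_criterion(pap1_site, spacer_len=3))  # True
```

Under these assumptions the head decamer matches the consensus exactly and the tail decamer has a single mismatch, at the position written in lowercase ("t") in the quoted sequence.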

  • 【Online publication contributor】 Central South University
  • 【Online publication issue】 2008, Issue 01