
A Systemic Analysis of Electronic Gene Expression Profile and Identification of Differentially Expressed Genes in Colorectal Cancer

【Author】 Lü Bingjian (吕炳建)

【Supervisor】 Lai Maode (来茂德)

【Author information】 Zhejiang University, Pathology and Pathophysiology, 2007, PhD

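【Summary】 Colorectal cancer is a common disease that seriously endangers public health in China, and its incidence has risen markedly in recent years. Although progress has been made on the molecular mechanisms of colorectal cancer initiation and progression, many known and unknown genes remain to be explored. With the completion of large-scale sequencing projects such as the Human Genome Project, the HapMap project and the Cancer Genome Anatomy Project, public databases have accumulated a large amount of genomic sequence information. An expressed sequence tag (EST) is the result of single-pass sequencing of cDNA and is an important means of gene discovery. Mining the public EST database (database of EST, dbEST) can bring new ideas to our research.

In this work, ESTs related to colorectal normal mucosa, inflammatory bowel disease, adenoma and adenocarcinoma were retrieved from dbEST. Using the sequence annotation features of dbEST and UniGene, we developed the bioinformatics software GetUni, which clusters ESTs into UniGene entries, and built four electronic libraries: normal (N), inflammatory bowel disease (IBD), adenoma (A) and adenocarcinoma (T). The libraries contain 4,375, 3,451, 875 and 200,608 UniGenes; 4,108, 2,230, 606 and 18,891 non-redundant UniGenes; and 4,108, 2,201, 592 and 14,879 genes, respectively. The conversion efficiency of GetUni was cross-validated with Xprofiler on the colorectal normal and cancer cDNA libraries of the National Cancer Institute (NCI). Direct comparison of the electronic libraries and Gene Ontology based bioinformatics analysis showed that, apart from bA9F11.1 (Hs.329040), library T includes all genes of libraries N, IBD and A. Each gene in library T has on average 1.27 transcripts, significantly more than in the other three libraries (p<0.01). The enrichment of 50 signaling pathways differed significantly among the libraries (p<0.01): ribosomal protein genes and glycolysis/gluconeogenesis pathway genes were relatively enriched in libraries A and IBD, integrin signaling pathway genes in N and IBD, and seven-transmembrane receptor (rhodopsin family) genes in library T.

Q-PCR showed that the six ribosomal protein genes RPS2, RPS12, RPS27a, RPL7a, RPL5 and RPL10 were expressed at higher levels than in normal tissue in 5, 6, 3, 5, 2 and 3 adenomas and in 21, 18, 17, 14, 21 and 13 colorectal cancers, respectively. However, the expression levels of these genes did not differ significantly among normal, adenoma and adenocarcinoma tissues (p>0.05). Hierarchical clustering divided the adenomas and colorectal cancers into group A, with higher ribosomal protein gene enrichment, and group B, with lower enrichment, containing 25 tumors (7 adenomas) and 23 tumors (1 adenoma), respectively. This suggests that ribosomal protein gene enrichment is significantly higher in adenomas than in colorectal cancers (87.5%, 7/8 adenomas vs 45%, 18/40 colorectal cancers; p=0.020).

On the basis of the bioinformatics analysis of the colorectal electronic expression profile, 95 genes were selected to build a low density array (LDA), which integrates TaqMan quantitative PCR with microfluidic technology, for high-throughput quantitative PCR. Most of these genes are closely related to the differentiation of normal colorectal mucosal epithelium, including four signaling pathways (Wnt, TGFβ/BMP4, Hedgehog and Notch), the Polycomb Group (PcG) gene family and other genes related to cell differentiation and development. Of the 96 genes (including GAPDH), only DAAM1 gave no valid amplification signal. In 26 paired colorectal cancer cases (27 cancers, 7 adenomas and matched normal tissues), the overall LDA positive rate was 97.2% (11,520/11,844). For the same tissue, GAPDH Ct values measured on different LDAs differed by at most 0.952 and at least 0.001, with a mean difference of 0.279±0.254. Spearman correlation and logistic regression analysis showed that the two experiments were highly correlated (p<0.0001, r=0.911, r²=0.829). For the 94 target genes, the ΔCt difference between the two experiments averaged 0.36±0.33. Spearman correlation and logistic regression analysis showed that each gene's mean ΔCt difference between the two experiments was strongly and positively correlated with its mean ΔCt (p<0.0001, r=0.743, r²=0.551).

Between paired normal and adenocarcinoma tissues, LDA identified 79 differentially expressed genes (p<0.05): 4 were upregulated and 75 downregulated in cancer, and 34 changed more than 2-fold. Among Wnt pathway genes, NKD1 and SOX9 were significantly upregulated in colorectal cancer, by 15.15-fold and 1.62-fold relative to normal, whereas APC, DAAM2 and Tcf4 were significantly downregulated, by 2.71-fold, 3.81-fold and 2.18-fold. Compared with normal tissue, the TGFβ/BMP4 pathway genes TGFβ1, SMAD4 and L3MBTL were downregulated in colorectal cancer by 2.41-fold, 2.51-fold and 1.39-fold; the HH pathway genes IHH, DISP1 and DISP2 by 2.03-fold, 1.79-fold and 13.96-fold; and the Notch pathway genes NOTCH2, MAML1, MAML2, MAML3, HES1 and HES2 by 2.22-fold, 1.91-fold, 2.39-fold, 1.96-fold, 1.76-fold and 1.76-fold. In the PcG gene family, EZH2 was upregulated 1.64-fold in colorectal cancer, while EPC1, EPC2, PCGF1, PCGF2, PCGF3, PCGF4, PCGF5 and PCGF6 were downregulated by 2.28-fold, 1.96-fold, 1.93-fold, 2.02-fold, 1.63-fold, 1.49-fold, 2.28-fold and 1.58-fold. LDA of 7 matched normal, adenoma and adenocarcinoma cases found 59 differentially expressed genes (p<0.05). In colorectal adenomas, the expression trends of these genes were largely consistent with those in adenocarcinoma.

SYBR Q-PCR was used to measure 19 genes in 22 paired normal and adenocarcinoma colorectal tissues. Ten genes (SOX9, EPC1, EPC2, CECR1, KLF9, METRNL, NKD1, NUMB, SPRED2 and DISP2) showed statistically significant expression changes in colorectal cancer (p<0.05). For all 19 genes, the direction of change (2^(-ΔΔCt)) agreed between SYBR Q-PCR and LDA. Correlation analysis of the median fold changes of the 19 genes showed that the two methods were highly correlated (p=0.003, r=0.79, r²=0.637).

Finally, we carried out a fairly comprehensive clinicopathological study of SOX9, a gene upregulated in colorectal cancer. DNA sequencing of 20 colorectal cancers and 6 cell lines found no mutations in the SOX9 coding region. Western blotting of 21 paired normal and adenocarcinoma tissues, with 5 accompanying adenomas, showed that SOX9 and β-catenin were upregulated in 16 and 15 adenocarcinomas and in 5 and 2 adenomas, respectively. The upregulation of SOX9 and β-catenin protein in colorectal cancer was statistically significant (mean±SD: SOX9 0.3293±0.3863 in normal tissue vs 0.907±0.6413 in adenocarcinoma, p<0.01; β-catenin 0.3397±0.2921 in normal tissue vs 0.6024±0.498 in adenocarcinoma, p<0.05). Immunohistochemistry showed that SOX9+ cells in normal mucosa were located mainly at the crypt base; the proportion of positive cells was significantly higher in the lower part of normal mucosa (9.6±13.1%) than in the upper part (3.1±5.9%) (p<0.001), matching the distribution of Ki67 staining. In colorectal adenomas, SOX9+ cells were distributed diffusely along the crypt, but positive cells were more frequent at the base than in the upper part (50.7%±25.1% vs 32.9%±30.0%, p<0.001). SOX9+ cells were significantly more frequent in colorectal adenomas (39.8%±30.9%) than in the residual mucosa adjacent to the tumor (24.9%±18.2%). The overall SOX9 expression rate in colorectal cancer was 36.7%±30.3%, significantly higher than in normal mucosa and peritumoral residual mucosa (p<0.001) but not significantly different from that in adenomas (p>0.05). The SOX9 positive rate was 74.5% (70/94) in colorectal adenomas and 60.1% (113/188) in colorectal cancers, both significantly higher than in normal mucosa (10/110, 9.09%) (p<0.001). Defining "-" and "+" as low SOX9 expression and combining "++" and "+++" as high expression, SOX9 high expression was seen in 50 (53.2%) colorectal adenomas and 64 (34.0%) colorectal cancers, and in no normal mucosa. The rate of SOX9 high expression in colorectal adenomas and cancers was significantly higher than in normal mucosa (p<0.001). SOX9 high expression was less common in mucinous adenocarcinoma/signet-ring cell carcinoma than in non-mucinous/non-signet-ring cell carcinoma (p<0.05, χ² test). The five-year survival rate of colorectal cancers with high SOX9 expression was 39.5% (17/43), significantly lower than the 69.5% (66/95) of those with low SOX9 expression (p<0.01). Multivariate Cox analysis found that SOX9 high expression is an independent indicator of poor prognosis in colorectal cancer (p<0.05, RR=1.381, 95% CI: 1.051-1.815).

From these studies we draw the following conclusions. ① Colorectal cancer is highly heterogeneous at the molecular level, alternative splicing plays an important role in colorectal carcinogenesis, and ribosomal protein gene enrichment is an important molecular feature of precancerous lesions such as colorectal adenoma and inflammatory bowel disease. The construction of colorectal electronic libraries therefore helps reveal the overall molecular features of colorectal cancer initiation and progression and deepens our understanding of its mechanisms. ② Activation of the Wnt pathway, which is closely tied to colorectal epithelial cell differentiation, together with inhibition of the TGFβ/BMP, HH and Notch pathways and altered expression of the PcG family, plays an important role in colorectal carcinogenesis, and these pathways form a complex network of interactions. ③ SOX9, a downstream gene of the Wnt pathway, is associated with colorectal epithelial cell differentiation; its expression is increased in colorectal adenoma and adenocarcinoma, and SOX9 high expression is an independent indicator of poor prognosis in colorectal cancer. ④ LDA is a sensitive and reliable high-throughput quantitative PCR method, and dbEST bioinformatics analysis combined with LDA offers a new approach to screening biomarkers of malignant tumors.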
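The EST-to-UniGene clustering and the library counts described in this summary can be illustrated with a minimal sketch. It is not the GetUni software, whose implementation is not described here: the function `build_library`, the lookup tables and the accession names are all hypothetical stand-ins for the dbEST and UniGene annotations. The sketch also assumes that "transcripts per gene" means non-redundant UniGenes divided by genes, an interpretation that happens to reproduce the reported 1.27 for library T (18,891/14,879).

```python
from collections import Counter


def build_library(est_ids, est_to_unigene, unigene_to_gene):
    """Collapse EST accessions into UniGene and gene counts.

    est_to_unigene and unigene_to_gene are hypothetical lookup tables;
    ESTs without a UniGene cluster are skipped.
    """
    hits = [est_to_unigene[e] for e in est_ids if e in est_to_unigene]
    unigene_counts = Counter(hits)  # number of EST hits per UniGene cluster
    genes = {unigene_to_gene[u] for u in unigene_counts if u in unigene_to_gene}
    return {
        "unigene_hits": len(hits),
        "nonredundant_unigenes": len(unigene_counts),
        "genes": len(genes),
        # Assumed definition: non-redundant UniGenes per gene.
        "transcripts_per_gene": len(unigene_counts) / len(genes) if genes else 0.0,
    }


# Toy data (hypothetical identifiers, not real dbEST/UniGene records).
est_to_unigene = {"est_a": "Hs.1", "est_b": "Hs.1", "est_c": "Hs.2", "est_d": "Hs.3"}
unigene_to_gene = {"Hs.1": "GENE1", "Hs.2": "GENE1", "Hs.3": "GENE2"}
library = build_library(
    ["est_a", "est_b", "est_c", "est_d", "est_x"], est_to_unigene, unigene_to_gene
)

# Ratio for library T using the counts reported in the summary.
ratio_t = round(18891 / 14879, 2)
```

In the toy data, `est_x` has no cluster and is dropped, and two UniGene clusters map to the same gene, which is what pushes the ratio above 1.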

【Abstract】 Colorectal cancer is one of the commonest cancers in China, and its incidence has increased rapidly over the last two decades. Great advances have recently been made in understanding the molecular mechanisms underlying colorectal carcinogenesis, but many novel genes associated with cancer initiation and progression remain to be explored. Vast amounts of data have accumulated in public databases with the completion of large-scale sequencing projects such as the Human Genome Project, the haplotype project (HapMap) and the Cancer Genome Anatomy Project (CGAP). An expressed sequence tag (EST), a single-pass sequence of complementary DNA (cDNA), is one of the most important ways to find novel biomarkers for cancer. Mining the database of ESTs (dbEST) in GenBank will shed new light on cancer research.

In this manuscript, we downloaded ESTs relevant to colorectal normal mucosa, inflammatory bowel disease, adenoma and cancer from dbEST. We developed a bioinformatics software package, GetUni, that uses the sequence annotation features of dbEST and UniGene. We then clustered all these ESTs into UniGene entries and constructed 4 electronic gene expression libraries: normal mucosa (N), inflammatory bowel disease (IBD), adenoma (A) and cancer (T). There are 4,375, 3,451, 875 and 200,608 UniGenes; 4,108, 2,230, 606 and 18,891 non-redundant UniGenes; and 4,108, 2,201, 592 and 14,879 genes in libraries N, IBD, A and T, respectively. The efficiency of GetUni was cross-validated by cDNA Xprofiler analysis of the colorectal normal mucosa and cancer libraries of the National Cancer Institute (NCI). Subsequently, the overall features of these libraries were analyzed with GOTM (GOTree Machine). All genes in libraries N, IBD and A were also found in library T, except bA9F11.1 (Hs.329040), which was present only in library A.
Each gene in library T had an average of 1.27 transcripts, significantly more than in the other libraries (p<0.01, χ² test). WebGestalt analysis showed that gene enrichment differed significantly among the libraries in 50 signaling pathways (p<0.01): ribosomal protein genes and genes of the KEGG Glycolysis/Gluconeogenesis pathway were enriched in libraries A and IBD, integrin pathway genes in IBD and N, and 7-transmembrane receptor (rhodopsin family) genes in cancers. Quantitative PCR found elevated expression of RPS2, RPS12, RPS27a, RPL7a, RPL5 and RPL10 in 5, 6, 3, 5, 2 and 3 of 8 adenomas, and in 21, 18, 17, 14, 21 and 13 of 40 colorectal cancers, respectively. However, the expression of these ribosomal protein genes did not differ significantly among normal mucosa, adenoma and cancer (p>0.05). Hierarchical clustering showed 2 distinct groups of colorectal adenomas and cancers: 7 adenomas and 18 cancers in the group with high ribosomal protein gene enrichment, and 1 adenoma and 22 cancers in the group with low enrichment. Enrichment of ribosomal protein genes was significantly more common in colorectal adenomas (7/8, 87.5%) than in cancers (18/40, 45%) (p=0.020).

Next, we selected 95 genes from the electronic gene expression profile of colorectal cancer to construct a low-density array (LDA), a high-throughput PCR approach that combines TaqMan quantitative PCR with microfluidics. The candidate genes focus on cell differentiation and development and include 4 signaling pathways (Wnt, TGFβ/BMP4, Hedgehog and Notch), the Polycomb Group (PcG) family and others. GAPDH was used as the internal control for normalization. Of all 96 genes, only DAAM1 could not be detected by LDA. In total, 97.2% of sample-gene tests [11,520/11,844 (384×96)] amplified successfully.
To evaluate the reproducibility of LDA, we compared the threshold cycles (Ct) of GAPDH between different experiments. For a given sample, the Ct difference between runs ranged from 0.001 to 0.952, with a mean±SD of 0.279±0.254. Spearman correlation and logistic regression analysis showed a significant correlation between different tests in all 60 samples (p<0.0001, r=0.911, r²=0.829). We next analyzed the ΔCt variation between tests for the 94 target genes. The average ΔCt variation was 0.36±0.33, and it was significantly correlated with the mean ΔCt (p<0.0002, r=0.743, r²=0.551).

Using LDA, we identified 79 differentially expressed genes between colorectal cancers and normal mucosa (p<0.05), of which 4 were upregulated and 75 downregulated relative to normal mucosa. Thirty-four genes changed more than 2-fold. In the Wnt pathway, NKD1 and SOX9 were significantly upregulated, with median fold changes of 15.15 and 1.62, respectively, and APC, DAAM2 and Tcf4 were downregulated 2.71-, 2.82- and 2.18-fold in colorectal cancers compared with normal mucosa (p<0.05). Compared with normal tissue, genes in the TGFβ/BMP4, Hedgehog and Notch pathways were significantly downregulated in colorectal cancers, with median fold changes of TGFβ1 2.41, SMAD4 2.51, L3MBTL 1.39, IHH 2.03, DISP1 1.79, DISP2 13.96, NOTCH2 2.22, MAML1 1.91, MAML2 2.39, MAML3 1.96, HES1 1.76 and HES2 1.78 (p<0.05). In the PcG gene family, only EZH2 showed higher expression in colorectal cancers, with a median fold change of 1.64; the others had lower expression, with median fold changes of EPC1 2.28, EPC2 1.96, PCGF1 1.93, PCGF2 2.02, PCGF3 1.63, PCGF4 1.49, PCGF5 2.28 and PCGF6 1.58. In 7 individually matched normal, adenoma and cancer cases, 59 differentially expressed genes were found (p<0.05). Most of these genes showed the same direction of change in adenomas as in cancers.
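The fold changes above are ΔCt-based values normalized to GAPDH, and the abstract refers to the 2^(-ΔΔCt) quantity. A minimal sketch of that comparative-Ct arithmetic follows. The Ct values and the helper names `delta_ct` and `fold_change` are hypothetical illustrations, not data or code from the study.

```python
from statistics import median


def delta_ct(ct_target, ct_reference):
    """ΔCt: target-gene Ct minus reference-gene (here GAPDH) Ct."""
    return ct_target - ct_reference


def fold_change(dct_tumor, dct_normal):
    """Relative expression tumor vs normal as 2^(-ΔΔCt)."""
    return 2 ** -(dct_tumor - dct_normal)


# Hypothetical matched pairs:
# (target Ct tumor, GAPDH Ct tumor, target Ct normal, GAPDH Ct normal)
pairs = [
    (24.0, 18.0, 25.0, 18.0),  # ΔΔCt = -1
    (26.0, 19.0, 26.0, 19.0),  # ΔΔCt = 0
    (23.0, 18.0, 25.0, 18.0),  # ΔΔCt = -2
]

folds = [
    fold_change(delta_ct(tt, rt), delta_ct(tn, rn)) for tt, rt, tn, rn in pairs
]
median_fold = median(folds)  # summary across matched pairs
```

A value above 1 means higher expression in tumor than in matched normal tissue. A reported "2.71-fold downregulation" would correspond to a 2^(-ΔΔCt) of about 1/2.71 under this convention.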
We applied SYBR Green Q-PCR to 19 of the differentially expressed genes identified by LDA in 22 patients with individually matched normal and cancer tissues. Ten genes (SOX9, EPC1, EPC2, CECR1, KLF9, METRNL, NKD1, NUMB, SPRED2 and DISP2) were significantly differentially expressed in colorectal cancers (p<0.05). As expected, all 19 genes changed in the same direction (2^(-ΔΔCt)) in colorectal cancers with the SYBR Green approach as with LDA. The median fold changes of these genes were significantly correlated between the two approaches (p=0.003, r=0.79, r²=0.637).

Finally, we carried out a large clinicopathological survey of SOX9, one of the genes upregulated in colorectal cancer and a novel downstream transcription factor in the Wnt pathway. No significant mutations were found in the coding region of SOX9 in 20 colorectal cancers and 5 colonic cancer cell lines. Western blotting showed SOX9 and β-catenin upregulation in 16 and 15 of 21 colorectal cancers and in 5 and 2 of 5 adenomas, respectively. SOX9 and β-catenin were significantly upregulated in cancers compared with normal mucosa (mean±SD: SOX9 0.3293±0.3863 in normal mucosa, 0.907±0.6413 in cancer, p<0.01; β-catenin 0.3397±0.2921 in normal mucosa, 0.6024±0.498 in cancer, p<0.05). SOX9+ cells were invariably located in the lower part of normal colonic mucosa, mirroring the distribution of Ki67 staining. There were more SOX9+ cells in the lower zone of colonic mucosa (9.6±13.1%) than in the upper zone (3.1±5.9%) (p<0.001). In colorectal adenomas, SOX9+ cells could be seen along the whole dysplastic crypt, but the lower zone still contained more SOX9+ cells than the upper zone (50.7%±25.1% vs 32.9%±30.0%, p<0.001). There were more SOX9+ cells in colorectal adenomas (39.8%±30.9%) than in the peri-adenomatous normal mucosa (24.9%±18.2%) (p<0.01).
The proportion of SOX9+ cells in cancer was 36.7%±30.3%, significantly higher than in normal mucosa and peri-carcinomatous residual normal mucosa (p<0.001) but not significantly different from that in adenomas (p>0.05). When a simple additive scoring system was used to assess immunostaining, the SOX9+ rate in both adenomas (74.5%, 70/94) and cancers (60.1%, 113/188) was higher than in normal mucosa (9.09%, 10/110) (p<0.001). Defining "-" and "+" as "low-expression" and "++" and "+++" as "overexpression", we found that SOX9 overexpression was significantly more common in colorectal adenomas (53.2%, 50/94) and cancers (34.0%, 64/188) than in normal mucosa, where no case showed SOX9 overexpression. SOX9 overexpression was less common in mucin-producing cancer (signet-ring cell cancer and mucinous adenocarcinoma) than in non-mucin-producing cancer (p<0.05). The 5-year overall survival was significantly lower in colorectal cancer patients with SOX9 overexpression (39.5%, 17/43) than in those with low expression (69.5%, 66/95) (p<0.01). Survival analysis with the Cox proportional hazards model indicated that SOX9 overexpression is an independent adverse prognostic factor in colorectal cancer (p<0.05, RR=1.381, 95% CI: 1.051-1.815).

In conclusion, we demonstrated that colorectal cancer is heterogeneous at the molecular level. Alternative splicing is common in cancers, implying a potential role in carcinogenesis. Enrichment of ribosomal protein genes is an important molecular feature of the two most important colorectal precursor lesions, adenoma and inflammatory bowel disease. The electronic gene expression library will therefore help us explore the general molecular features underlying colorectal cancer initiation and progression.
We also suggest that dysregulation of pathways and gene families associated with colonocyte differentiation, namely Wnt activation, inhibition of TGFβ/BMP4, Hedgehog and Notch, and aberrant expression of PcG family genes, has a potential role in colorectal carcinogenesis. The Wnt pathway may play a central role in the intricate network formed by these pathways. In particular, we found that SOX9, a novel transcription factor in the Wnt pathway, is associated with intestinal epithelial differentiation. SOX9 was overexpressed in colorectal adenomas and cancers, and its overexpression was an adverse prognostic factor for colorectal cancer, so it might serve as a gene therapy target in the future. Finally, we demonstrated that LDA is a sensitive and reliable high-throughput quantitative PCR method. Combining dbEST data mining with LDA is a novel methodology for detecting cancer biomarkers.
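As a worked check of the 5-year survival comparison reported in the abstract (17/43 with SOX9 overexpression vs 66/95 with low expression), the sketch below runs an uncorrected Pearson χ² test on the 2×2 table. The abstract does not say which test produced p<0.01 for this comparison, so the choice of test is an assumption, and `chi_square_2x2` is a hypothetical helper rather than the authors' procedure.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns (chi2, p); p uses the chi-square distribution with 1 degree
    of freedom, whose upper tail equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: SOX9 overexpression, SOX9 low expression.
# Columns: alive at 5 years, not alive at 5 years (counts from the abstract).
chi2, p = chi_square_2x2(17, 43 - 17, 66, 95 - 66)

# The survival percentages quoted in the abstract follow from the same counts.
rate_high = round(100 * 17 / 43, 1)
rate_low = round(100 * 66 / 95, 1)
```

Under this assumption the statistic is about 11 with p below 0.01, which is in line with the significance level quoted in the abstract.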

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year/issue】 2007, Issue 03