不同阶段结直肠癌动态转录组与表达调控网络构建的生物信息学分析

Bioinformatic Analysis of the Dynamic Transcriptome and Construction of an Expression Regulation Network in Multi-step Colorectal Cancer

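【Author】 Li Xiayu (李夏雨)

【Supervisor】 Li Guiyuan (李桂源);

【Author information】 Central South University (中南大学), Pathology and Pathophysiology, 2010, doctoral thesis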

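【Abstract (Chinese original, translated)】 [Background of colorectal cancer transcriptomics] Colorectal cancer (CRC) is a common malignant tumor of the digestive tract whose molecular pathogenesis has not been fully elucidated. Like other malignancies, CRC arises and progresses through a complex multi-step, multi-factor process. It results from the combined action of many genes with small additive effects and of environmental factors, and its phenotypic variation shows continuous, quantitative change. Because of its multi-stage progression and multi-gene involvement, CRC has become a "model tumor" in cancer research, and studying the CRC transcriptome and its dynamic changes has become an important strategy for revealing the nature of CRC pathogenesis and its molecular mechanisms. The transcriptome is the total set of transcripts in a cell in a given functional state. It includes mRNA, tRNA, rRNA and the transcripts of non-coding genes such as microRNA (miRNA). Compared with the genome, the transcriptome is highly dynamic: gene expression in the same cell differs markedly across time and environment, and when a cell undergoes different physiological or pathological changes, transcription of the relevant genes changes markedly with it. Changes at the transcriptional level are a concentrated expression of the genetic information in the genome and a prelude to protein translation, which shapes the biological phenotype. Grasping this highly variable dynamic process means grasping a key part of the nature of life phenomena. In this study we therefore proposed the concept of dynamic transcriptomics and took it as the key scientific question for studying the dynamic transcriptome and constructing expression regulation networks across CRC stages. Transcriptomic methods based mainly on gene microarrays allow the genome-wide expression of biological samples to be measured and assessed in a single high-throughput assay. Screening differentially expressed genes with modern bioinformatics methods, and then analyzing their functions and signaling pathways against internationally unified online biological databases, gives researchers a theoretical and experimental basis for further functional experiments. For these purposes, this study used the latest whole-genome microarray (Phalanx One Array™) and microRNA microarray (Exiqon™ miRNA microarray) to perform high-throughput profiling of 20 CRC samples at different stages (TNM stages Ⅰ, Ⅱ, Ⅲ and Ⅳ, 5 per stage) and their paired normal samples. The microarray data were mined and integrated comprehensively and systematically using internationally recognized bioinformatics methods together with network biology theory, yielding a series of results.
[Differential gene analysis of paired tumor and normal tissue at different CRC stages] Using the TwoClassDif (RVM-T test) method, tumor and normal tissue were compared at each of the 4 stages. With P value < 0.05 and FDR < 0.05 as the criteria, 4 sets of statistically significant differentially expressed genes were obtained: 698, 547, 609 and 713 upregulated genes and 140, 235, 143 and 158 downregulated genes in stages Ⅰ, Ⅱ, Ⅲ and Ⅳ, respectively. GO and pathway analyses were performed for each stage. In stages Ⅰ, Ⅱ, Ⅲ and Ⅳ, upregulated genes had 82, 90, 106 and 58 significant GO terms and downregulated genes had 45, 65, 45 and 84. Upregulated genes took part in 7, 9, 19 and 17 significant pathways and downregulated genes in 19, 32, 16 and 26 (P value < 0.05 and FDR < 0.05). The most marked changes were in the PPAR signaling pathway in stage Ⅰ, the focal adhesion pathway in stage Ⅱ, the cytokine-cytokine receptor interaction pathway in stage Ⅲ and the Wnt signaling pathway in stage Ⅳ. Using graph theory with pathways as the unit of study, the significant pathways were linked through the KEGG database of inter-pathway relationships to build a pathway interaction network (Path-net) for each of the 4 stages. Path-net shows the intrinsic connections among the significant pathways at each stage and reflects the main pathways affecting each stage. The main pathways in stage Ⅰ included MAPK signaling pathway and Cytokine-cytokine receptor interaction; in stage Ⅱ, Citrate cycle and Valine, leucine and isoleucine degradation; in stage Ⅲ, Cytokine-cytokine receptor interaction and Apoptosis; in stage Ⅳ, MAPK signaling pathway and Calcium signaling pathway. Across all stages of progression, cancer-related pathways such as MAPK signaling pathway, Cytokine-cytokine receptor interaction and Apoptosis occupied core hub positions and had an important influence on the development of CRC. Based on the KEGG database of gene-gene interactions and on network biology theory, a signal transduction network of the differentially expressed genes (Signal-Net) was computed and constructed. This network reflects, as a whole, the signal transduction relationships among differentially expressed genes at different CRC stages, and key genes in the network such as ITGB1, HSPA9B and PAG were derived from it.
[Multi-class differential gene analysis across CRC stages] Following the time series of CRC progression, gene expression data for normal tissue and stages Ⅰ, Ⅱ, Ⅲ and Ⅳ were treated as 5 time-point groups. Using the MultiClassDif multiple-comparison test (RVM-F test) with P-value < 0.05 and FDR < 0.05, a set of 2858 stage-related multi-class differentially expressed genes was obtained. The Serial Test Cluster (STC) method identified 62 dynamic expression patterns of genes affecting sample change, and with P < 0.05/80 as the criterion, 20 significant dynamic expression patterns were obtained, of which patterns No.67, No.41 and No.80 had clear biological significance. For the genes in the significant patterns, possible interactions were analyzed from the similarity of their expression. The Cluster Coefficient was used to express the density of a gene and its neighboring genes, and Betweenness Centrality to measure a gene's mediating ability. A dynamic gene relationship network that changes with the course of the disease (Dynamic-Gene-Net) was built, from which key genes such as AFURS1, E2F5 and C14ORF104 emerged.
[Multi-class differential miRNA analysis across CRC stages] Likewise, following the time series of CRC progression, miRNA expression data for normal tissue and stages Ⅰ, Ⅱ, Ⅲ and Ⅳ were treated as 5 time-point groups. The MultiClassDif multiple-comparison test (RVM-F test) with P < 0.05 and FDR < 0.05 gave 55 stage-related multi-class differentially expressed miRNAs, 42 of which had entries in the Targetscan database. The Serial Test Cluster (STC) method identified 22 miRNA expression patterns affecting sample change, and with P < 0.05/80 as the criterion, 2 significant dynamic miRNA expression patterns were obtained: No.18 and No.59.
[Integrated analysis of differential miRNA and mRNA across CRC stages] The Targetscan miRNA target prediction database at Sanger was then used to predict target genes for the 42 differential miRNAs, giving 5930 target genes in total. Intersecting the 5930 target genes with the 2858 multi-class differentially expressed genes gave 605 genes, the differential target genes regulated by differential miRNAs, which were subjected to GO and pathway analysis. With P value < 0.01 and FDR < 0.05, 51 significant GO categories involving 307 genes were obtained. Linking the differential miRNAs to the significant GO categories gave the miRNA-GO-Network, which reflects the effect of the miRNAs on gene function. Linking the differential miRNAs to the 307 differential target genes in the significant GO categories gave the miRNA-Gene-Network, which reflects the network of significant target genes regulated by the miRNAs. Both networks showed that miRNAs such as mir-524-5p, mir-429, mir-340 and mir-124 occupy key hub positions. Pathway analysis of the 605 genes, with p < 0.05 and FDR < 0.05, gave 43 significant pathways containing 167 genes. A path-net built from the significant pathways showed that the main signaling pathways were MAPK signaling pathway, Focal adhesion, Adherens junction, Wnt signaling pathway and Pathways in cancer, all of which are related to carcinogenesis. Given the negative regulation of target genes by miRNAs, a negative correlation analysis based on microarray expression values was performed between all target genes of the significant miRNA expression patterns No.18 and No.59 and all differential genes in the 20 significant expression patterns related to CRC progression. A miRNA-Gene-Network built from the negative targeting relationships showed that hsa-miR-429, hsa-miR-490-3p, hsa-mir-18a and hsa-mir-18b were negatively related to multiple differential target genes, including ONECUT2, KCMF1 and TRIM2. Real-time quantitative PCR also confirmed a clear negative correlation between mir-18 and both NAV1 and MDGA1, suggesting an important post-transcriptional regulatory relationship between these miRNAs and their target genes during CRC progression.
[Summary] Studying the changes in, and interactions among, the multiple genes and pathways involved in CRC progression across stages can systematically and comprehensively reveal the dynamic nature of tumor development. miRNAs, as newly discovered post-transcriptional regulators, play an important role in tumor initiation and progression. This study combined whole-genome expression microarrays and miRNA expression microarrays, analyzing differential genes between tumor and paired normal samples at each stage and, following the time series of CRC progression, the multi-class differential expression profiles of genes and miRNAs. Bioinformatics methods and network biology theory were used together to describe and analyze systematically the miRNA-mRNA regulatory network at the transcriptional level. The results are of theoretical and practical value for clarifying the dynamic transcriptomic changes in CRC.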
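The dual cutoff used throughout the abstract (P < 0.05 and FDR < 0.05) can be illustrated with a minimal Python sketch. The abstract does not say how the FDR was estimated, so the Benjamini-Hochberg procedure below is an assumption, and the gene labels and p-values are invented for illustration only; the RVM-T and RVM-F tests themselves are not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(pvalue_by_gene, p_cut=0.05, fdr_cut=0.05):
    """Keep genes that pass both the raw p-value and the FDR cutoff."""
    genes = list(pvalue_by_gene)
    fdr = benjamini_hochberg([pvalue_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if pvalue_by_gene[g] < p_cut and q < fdr_cut]


# Hypothetical p-values, not data from the thesis.
toy_pvalues = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}
significant = select_significant(toy_pvalues)
```

In this toy input, GENE_C and GENE_D pass the raw p-value cutoff but not the FDR cutoff, which is why requiring both conditions gives a shorter list than P < 0.05 alone.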

【Abstract】 [Background] Colorectal cancer (CRC) is a common malignancy of the gastrointestinal tract. The molecular mechanism of CRC tumorigenesis is one of the most important fields in cancer research. CRC carcinogenesis is a complicated, polygenic process. Its malignant phenotypes show quantitative changes that are considered to reflect the accumulation of genetic and environmental alterations. The multi-step and polygenic characteristics of CRC have made it a "model tumor" for fundamental cancer research. The transcriptome is the set of all RNA transcripts, including mRNA, tRNA, rRNA and non-coding RNA such as miRNA, produced in one cell or a population of cells of a given type. Unlike the genome, which is roughly fixed for a given cell type, the transcriptome varies with external environmental conditions and is considered highly dynamic. When a cell is exposed to different physiological or pathological stimuli, its transcriptome changes dramatically. Transcriptomic changes derive from genomic information and precede changes at the proteomic level. Understanding this crucial stage of genomic information processing is essential for revealing the mechanisms of life phenomena, including tumorigenesis. In our study, we proposed the idea of the dynamic transcriptome and applied it to the study of CRC transcriptomics and the establishment of a gene expression regulation network. The gene microarray is one of the most important tools for transcriptomic study and can detect the expression of thousands of genes simultaneously. High-throughput technologies combined with bioinformatics approaches are the best way to screen for potential differentially expressed genes. Online databases with extensive gene function and pathway information support further analysis and provide valuable in silico guidance for further research such as biomarker selection and prognosis prediction.
Based on the above, we applied a whole-genome oligonucleotide gene microarray and a miRNA expression microarray to examine expression in multi-step colorectal cancer and adjacent normal mucosa specimens covering 4 TNM stages (5 replicates per stage). Further bioinformatics analysis based on network biology theory was carried out for thorough data mining and data integration. [Analysis of differentially expressed genes between tumor and adjacent normal tissue at different stages] The TwoClassDif (RVM-T test) method was applied to identify differentially expressed genes in the 4 TNM stages of CRC. With a cutoff of P-value < 0.05 and FDR < 0.05, we obtained 4 sets of significantly differentially expressed genes: 698, 547, 609 and 713 genes were upregulated, whereas 104, 235, 143 and 158 genes were downregulated, in stages Ⅰ, Ⅱ, Ⅲ and Ⅳ, respectively. GO analysis showed 82, 90, 106 and 58 significant GO terms for upregulated genes and 45, 65, 45 and 84 for downregulated genes in the 4 stages, respectively. Pathway analysis showed 7, 9, 19 and 17 significant pathways for upregulated genes and 19, 32, 16 and 26 significant pathways for downregulated genes in the 4 stages, respectively (P-value < 0.05 and FDR < 0.05). The PPAR pathway in stage Ⅰ, the focal adhesion pathway in stage Ⅱ, the cytokine-cytokine receptor interaction pathway in stage Ⅲ and the Wnt signaling pathway in stage Ⅳ were the most significantly altered. Using graph theory and the pathway relationships provided by the KEGG pathway database, we built a path-net showing the interconnections among the significant pathways at each of the 4 stages. The main pathways affected in stage Ⅰ were the MAPK signaling pathway and cytokine-cytokine receptor interaction. The main pathways affected in stage Ⅱ were the citrate cycle and valine, leucine and isoleucine degradation. The main pathways affected in stage Ⅲ were cytokine-cytokine receptor interaction and apoptosis. The main pathways affected in stage Ⅳ were the MAPK signaling pathway and the calcium signaling pathway.
Across the stages, the MAPK signaling pathway, cytokine-cytokine receptor interaction and apoptosis pathways were the most important to the progression of CRC. A gene signal transduction network, the signal-net, was established based on the KEGG database of interactions between gene products and on network biology theory. The signal-net represents the signal transduction relationships among the differentially expressed genes. The network identified key genes such as ITGB1, HSPA9B and PAG, which may play important roles at different stages of CRC carcinogenesis. [Analysis of multi-class differentially expressed genes at different stages] Gene expression data for normal tissue and stages Ⅰ, Ⅱ, Ⅲ and Ⅳ were treated as five time points following the time series of CRC progression. The MultiClassDif method (RVM-F test) was applied to screen for dynamically differentially expressed genes, and 2858 multi-class differentially expressed genes were obtained with a cutoff of p < 0.05 and FDR < 0.05. Serial Test Cluster (STC) analysis was applied to analyze the dynamic expression patterns of these genes. Of 62 patterns, 20 were identified as significant (p < 0.05/80). Among them, patterns No.67, No.41 and No.80 were biologically meaningful. Furthermore, the genes in the 20 significant expression patterns were analyzed according to the similarity of their expression. The clustering coefficient measures the density of connections between a gene and its neighbors, and betweenness centrality measures a gene's ability to mediate between other genes.
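The two network parameters named above can be computed as in the following Python sketch. It is only an illustration under stated assumptions: the graph is a small invented undirected network with placeholder node names (not the Dynamic-Gene-net), betweenness is computed with Brandes' algorithm on unweighted edges and left unnormalized, and the abstract does not specify which variant the authors used.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def clustering_coefficient(adj):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        result[node] = 2 * links / (k * (k - 1))
    return result


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in bc.items()}


# Hypothetical toy network; node names are placeholders, not thesis genes.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
toy_adj = build_adjacency(toy_edges)
toy_cc = clustering_coefficient(toy_adj)
toy_bc = betweenness_centrality(toy_adj)
```

In the toy network, node C bridges the A-B-C triangle and the D-E tail, so it has the highest betweenness even though its clustering coefficient is lower than that of A or B. A hub gene in a co-expression network would stand out in the same way.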
Using the clustering coefficient and betweenness centrality, we built the Dynamic-Gene-net and identified some of the most important genes in the network, such as AFURS1, E2F5 and C14orf104. [Analysis of multi-class differentially expressed miRNAs at different stages] Similarly, we took normal tissue and stages Ⅰ, Ⅱ, Ⅲ and Ⅳ as 5 time points to screen for differentially expressed miRNAs with the MultiClassDif method (RVM-F test). 55 miRNAs were obtained (p < 0.05 and FDR < 0.05), of which 42 had records in the Targetscan database. The STC method was also applied to analyze the dynamic expression patterns of the miRNAs. Of 22 patterns, 2 were significant (p < 0.05/80): No.18 and No.59. [Integrated analysis of miRNA and mRNA at different CRC stages] The Targetscan miRNA target prediction database was used to predict targets of the 42 miRNAs, giving 5930 target genes in total. The intersection of the 5930 target genes and the 2858 multi-class differentially expressed genes contained 605 genes, which were subjected to GO and pathway analysis. GO analysis gave 51 significant GO terms covering 307 genes (p < 0.01 and FDR < 0.05). The miRNAs and GO terms were connected to build the miRNA-GO-network, which shows how the miRNAs relate to the functions of their target genes. The differentially expressed miRNAs and their target genes in significant GO terms were connected to form the miRNA-Gene-network, which shows the relationships between the miRNAs and their target genes. Both networks indicated that mir-524-5p, mir-429, mir-340 and mir-124 are crucial hubs. Pathway analysis of the 605 differentially expressed miRNA target genes showed 43 significant pathways covering 167 genes (p < 0.05 and FDR < 0.05). A path-net of these significant pathways was built and showed that the main pathways were the MAPK signaling pathway, focal adhesion, adherens junction, the Wnt signaling pathway and pathways in cancer.
These pathways are reported to be closely related to carcinogenesis. Based on the negative regulation of target genes by miRNAs, a negative correlation analysis was carried out between the 2 significant miRNA expression patterns and the 20 significant gene expression patterns. A miRNA-Gene-network was also built, and the analysis showed that the expression patterns of mir-429, mir-490-3p, mir-18a and mir-18b were negatively correlated with target genes including ONECUT2, KCMF1 and TRIM2. Real-time PCR confirmed that mir-18 expression was negatively correlated with MDGA1 and NAV1 at the transcript level. These results indicate a potential negative regulatory relationship between these miRNAs and their target genes. [Conclusion] Investigating the polygenic factors and the alterations of multiple pathways involved in multi-step CRC progression can systematically uncover the dynamic mechanism of tumorigenesis. Non-coding miRNAs play an important role in post-transcriptional regulation during cancer initiation and progression. We performed gene and miRNA expression microarray experiments to analyze both the two-class differences between cancer and adjacent normal tissue and the multi-class differences across the time points of tumor progression. Bioinformatics combined with a network biology approach comprehensively depicted the miRNA-target mRNA regulation network at the transcriptomic level. These results are important for understanding the dynamic transcriptome of CRC and will guide further investigation of CRC carcinogenesis.
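The negative correlation analysis between miRNA and target gene expression can be sketched as follows. This is a minimal Python illustration, not the thesis procedure: the abstract does not name the correlation statistic or any threshold, so Pearson correlation and the cutoff `r_cut=-0.8` are assumptions, and the five-point profiles (normal, stage Ⅰ, Ⅱ, Ⅲ, Ⅳ) and the names `TARGET_1` to `TARGET_3` are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def negatively_correlated_targets(mirna_profile, target_profiles, r_cut=-0.8):
    """Return {gene: r} for predicted targets whose r is at or below r_cut."""
    hits = {}
    for gene, profile in target_profiles.items():
        r = pearson(mirna_profile, profile)
        if r <= r_cut:
            hits[gene] = r
    return hits


# Hypothetical expression values at normal, stage I, II, III and IV.
mirna_profile = [1.0, 2.0, 3.0, 4.0, 5.0]       # miRNA rises with stage
target_profiles = {
    "TARGET_1": [5.0, 4.0, 3.0, 2.0, 1.0],      # falls as the miRNA rises
    "TARGET_2": [1.0, 2.0, 3.0, 4.0, 5.0],      # rises with the miRNA
    "TARGET_3": [2.0, 2.5, 1.5, 2.2, 2.1],      # no clear trend
}
hits = negatively_correlated_targets(mirna_profile, target_profiles)
```

Only `TARGET_1` is retained here. With five time points a correlation estimate is fragile, which is consistent with the abstract following the microarray analysis with real-time PCR validation.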

  • 【Online publication contributor】 Central South University (中南大学)
  • 【Online publication issue】 2010, No. 11