Research on Binding Site Prediction and Regulatory Function Analysis of miRNA and Transcription Factor

【Author】 Wang Guohua

【Supervisors】 Wang Xiaolong; Wang Yadong

【Author Information】 Harbin Institute of Technology, Computer Application Technology, 2009, Ph.D.

【Abstract】 With the completion of whole-genome sequencing for humans and hundreds of other organisms, biologists are now planning and carrying out post-genome projects. The focus of genomics research has shifted from revealing all of an organism's genetic information to studying function at the whole-molecular level, and understanding transcriptional and post-transcriptional regulatory mechanisms is a fundamental goal of the post-genomic era. In recent years, transcription factors and miRNAs, as key factors in transcriptional and post-transcriptional regulation, have become an important research area in bioinformatics; miRNA research was named among Science magazine's top ten scientific breakthroughs in both 2002 and 2003. A growing number of bioinformatics researchers are studying the biological functions and regulatory mechanisms of transcription factors and miRNAs. However, existing methods study the regulatory functions of transcription factors and miRNAs separately, ignoring their combined effect on gene expression. This thesis therefore takes gene expression as its starting point to model the regulation exerted by transcription factors and miRNAs, predict their binding sites and regulatory functions, and identify miRNA promoter regions. The main contributions fall into the following four parts.

(1) A binding-site prediction algorithm based on the combinatorial regulation of gene expression by transcription factors and miRNAs. Conventional computational methods that use microarray data to study transcriptional regulation are analyzed and extended. Taking full account of the joint effect of transcription factors and miRNAs on gene expression, a novel algorithm is designed and implemented that identifies potential transcription factor and miRNA binding sites from microarray gene expression data and genomic DNA sequences. The algorithm tests random subsets of all possible motifs of a fixed length in the 5′ regulatory regions and the 3′ untranslated regions (3′ UTRs) and selects the motifs that best fit a combinatorial model of gene expression levels. The transcription factor and miRNA binding sites predicted in mouse fetal alcohol syndrome cells are biologically meaningful, which supports the effectiveness of the algorithm.

(2) A model for analyzing the regulatory functions of transcription factors and miRNAs based on binding-site information. Prediction of transcription factor binding sites and miRNA targets has long been an active research topic, and many mature databases and software tools have been developed. This thesis discusses how to locate transcription factor and miRNA binding sites using relevant biological knowledge and, by combining known transcription factor motifs with known interactions between miRNAs and their targets, designs a binding-site-based method for analyzing their regulatory functions. The method integrates binding-site locations into the relevant functional data, so that the transcription factors and miRNAs responsible for differences in gene expression can be analyzed from the perspective of how they regulate global gene expression patterns. Applied to a prostate carcinoma cell line, the method predicts 5 functional transcription factors and 7 miRNAs that drive prostate cancer progression, and the predictions are supported by various kinds of biological knowledge.

(3) A ChIP-seq-based algorithm for the computational identification of miRNA promoters. Identifying miRNA promoter regions is a difficult problem in the study of miRNA transcriptional regulation. Conventional methods predict miRNA promoters from genomic features. With the advent of next-generation sequencing, ChIP-seq data provide new support for promoter prediction and open a new research direction. Using RNA polymerase II ChIP-seq data, this thesis develops a ChIP-seq-based model of promoter regions, designs a learning algorithm for its parameters, optimizes those parameters on data from the promoters of protein-coding genes, and then predicts promoters in the upstream regions of miRNAs. Applied to RNA polymerase II ChIP-seq data from breast cancer cells, the algorithm predicted 72 miRNA promoters, and the genomic features of these promoter regions were analyzed.

(4) An analysis and discussion of multi-data fusion methods for studying the regulatory functions of transcription factors and miRNAs with several types of array data. With the emergence of diverse high-throughput data, analyzing gene transcriptional regulatory mechanisms by fusing multiple data sources has become increasingly important. Starting from a concrete biological example, the regulatory functions of transcription factors and miRNAs are analyzed using gene expression microarray, miRNA expression microarray, and ChIP-chip data. Multi-data fusion provides strong support for building regulatory networks and for the development of systems biology.
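The motif-selection step in part (1) can be illustrated with a minimal Python sketch. It is a simplification, not the thesis's implementation: it assumes a stepwise linear model (in the spirit of REDUCE-style motif regression) in which each gene's expression value is first regressed on the count of one fixed-length motif in its 5′ regulatory region, and the remaining variation is then regressed on the count of one motif in its 3′ UTR. Candidate motifs are a random subset of all 4^k words, as the abstract describes. The functions `count_motif`, `fit_one`, `best_motif` and `predict_sites` are hypothetical names.

```python
import itertools
import random


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def fit_one(x, y):
    """Least-squares fit of y ~ a + b*x; returns (a, b, fraction of variance explained)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return my, 0.0, 0.0  # constant motif count (or expression) explains nothing
    b = sxy / sxx
    return my - b * mx, b, sxy * sxy / (sxx * syy)


def best_motif(seqs, y, k, n_samples, rng):
    """Score a random subset of all 4**k motifs; return (motif, a, b, r2) with the highest r2."""
    all_motifs = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    candidates = rng.sample(all_motifs, min(n_samples, len(all_motifs)))
    best = None
    for motif in candidates:
        x = [count_motif(s, motif) for s in seqs]
        a, b, r2 = fit_one(x, y)
        if best is None or r2 > best[3]:
            best = (motif, a, b, r2)
    return best


def predict_sites(promoters, utrs, expr, k=6, n_samples=500, seed=0):
    """Pick one 5' regulatory motif, then one 3' UTR motif that explains what is left."""
    rng = random.Random(seed)
    tf = best_motif(promoters, expr, k, n_samples, rng)
    # Subtract the fitted contribution of the 5' motif from each gene's expression value.
    x_tf = [count_motif(s, tf[0]) for s in promoters]
    resid = [e - (tf[1] + tf[2] * x) for e, x in zip(expr, x_tf)]
    mir = best_motif(utrs, resid, k, n_samples, rng)
    return tf, mir
```

On synthetic data where expression rises with each copy of a 5′ motif and falls with each copy of a 3′ UTR motif, the first motif should come back with a positive slope and the second with a negative one, the sign pattern expected of an activating transcription factor site and a repressive miRNA site. A fuller version would fit several motifs jointly and assess their significance.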
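Part (2) does not say which statistic links binding-site locations to differential expression. One standard choice, assumed here, is a hypergeometric (one-sided Fisher) enrichment test: for each transcription factor or miRNA, ask whether its predicted targets are over-represented among the differentially expressed genes. The sketch is illustrative only; `hypergeom_sf` and `rank_regulators` are hypothetical names, and real p-values would need multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n of them targets, N genes drawn as DE)."""
    hi = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, hi + 1)) / comb(M, N)


def rank_regulators(targets, de_genes, universe):
    """Rank regulators by how strongly their predicted targets are enriched among DE genes.

    targets: dict mapping a TF or miRNA name to its predicted target genes
    de_genes: differentially expressed genes; universe: all genes measured on the array
    Returns (regulator, overlap, p_value) tuples, most significant first.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for reg, genes in targets.items():
        t = set(genes) & universe
        overlap = len(t & de)
        p = hypergeom_sf(overlap, len(universe), len(t), len(de))
        results.append((reg, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

Restricting every set to the measured gene universe keeps the test honest: targets that were never on the array cannot count for or against a regulator.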
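For part (3), the abstract describes a promoter model whose parameters are learned from RNA polymerase II ChIP-seq signal at protein-coding promoters and then applied upstream of miRNAs, but it does not give the model's form. The sketch below assumes the simplest possible version: the "model" is a single read-count threshold over a fixed-width sliding window, learned by maximizing classification accuracy on windows at known protein-coding promoters (positives) versus background windows (negatives). It handles only plus-strand miRNAs and sorted read start coordinates; `window_counts`, `learn_threshold`, `predict_promoter` and their parameters are hypothetical.

```python
from bisect import bisect_left


def window_counts(read_starts, start, end, width, step):
    """Count reads (sorted start coordinates) in each window [pos, pos + width) inside [start, end)."""
    counts = []
    pos = start
    while pos + width <= end:
        n = bisect_left(read_starts, pos + width) - bisect_left(read_starts, pos)
        counts.append((pos, n))
        pos += step
    return counts


def learn_threshold(pos_counts, neg_counts):
    """Choose the read-count threshold that best separates known promoter windows from background."""
    best_t, best_acc = None, -1.0
    total = len(pos_counts) + len(neg_counts)
    for t in sorted(set(pos_counts) | set(neg_counts)):
        correct = sum(c >= t for c in pos_counts) + sum(c < t for c in neg_counts)
        if correct / total > best_acc:
            best_t, best_acc = t, correct / total
    return best_t, best_acc


def predict_promoter(read_starts, mirna_start, upstream, width, step, threshold):
    """Report the densest window upstream of a plus-strand miRNA if it passes the threshold."""
    windows = window_counts(read_starts, max(0, mirna_start - upstream), mirna_start, width, step)
    if not windows:
        return None
    pos, n = max(windows, key=lambda w: w[1])
    return (pos, n) if n >= threshold else None
```

Learning the threshold on protein-coding genes, where promoter locations are well annotated, and only then scanning miRNA upstream regions mirrors the train-then-transfer order described in part (3); a realistic model would add more features than raw read counts.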
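Part (4) does not state a fusion rule. A common heuristic, assumed here purely for illustration, keeps a predicted miRNA-target pair only if the miRNA's profile (from miRNA expression arrays) is anti-correlated with the target's mRNA profile (from gene expression arrays) across samples, and flags targets that are also bound by a transcription factor in ChIP-chip data. `pearson`, `fuse_evidence` and the default cutoff of -0.5 are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fuse_evidence(pairs, mirna_expr, mrna_expr, tf_bound, r_cutoff=-0.5):
    """Keep predicted miRNA-target pairs whose profiles are anti-correlated across samples.

    pairs: (miRNA, gene) predictions; mirna_expr / mrna_expr: name -> per-sample values
    tf_bound: genes with a TF ChIP-chip peak in their promoter
    Returns (miRNA, gene, r, also_tf_bound) for each supported pair.
    """
    kept = []
    for mir, gene in pairs:
        if mir not in mirna_expr or gene not in mrna_expr:
            continue  # no array evidence for this pair
        r = pearson(mirna_expr[mir], mrna_expr[gene])
        if r <= r_cutoff:
            kept.append((mir, gene, r, gene in tf_bound))
    return kept
```

Pairs kept by this filter that are also TF-bound are candidates for feed-forward motifs in a combined transcription factor and miRNA regulatory network.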
