Construction of Gene Regulatory Networks Based on Bayesian Approaches

【Author】 Liu Pan

【Advisor】 Deng Wei

【Author Information】 Soochow University, Computer Application Technology, 2013, Master's thesis

【Abstract】 With the development of gene chip technology, the massive gene expression data now available can be combined with computational methods to reconstruct gene regulatory networks. Many models have been applied to this task. Among them, the Bayesian network model has become a powerful and increasingly widely used tool for building gene regulatory networks, owing to its solid theoretical foundation, natural representation of knowledge structure, flexible reasoning ability, and efficient decision-making mechanism.

Research on Bayesian methods for constructing gene regulatory networks has developed in several directions, such as information-theory-based constraint methods, the integration of prior knowledge with gene expression data, and the study of scale-free networks. Methods based on mutual information can take into account the influence of other genes on a given gene, but they reveal only the functional characteristics of genes and not the causal relationships between them. Integrating prior knowledge can overcome the sparsity problem, but comparative experiments on time-series expression data sets are lacking, so no information is available on how sensitive the results are to errors in the prior knowledge.

After summarizing and analyzing the state of research on Bayesian methods for constructing gene regulatory networks, this thesis addresses the problems above. The main work is as follows:

1. Building on PCA-CMI, the path consistency algorithm based on conditional mutual information, the PCA-CMI-NO algorithm is proposed. It uses a topological ordering of the nodes to construct the regulatory network, which resolves the lack of causal direction in the network produced by PCA-CMI. To obtain the ordering, the graph splitting method is improved: the mutual information between gene pairs is first filtered; the subgraphs are then sorted in descending order of Bayesian score; and, following this order, the direction of the edge between a gene pair that appears in several subgraphs is chosen, which determines the order of the nodes in the gene expression data. Finally, the node ordering is applied to the network built by PCA-CMI to obtain a directed network, and conditional mutual information is used to remove edges between independent genes in order to improve network accuracy.

2. The energy function of the Gibbs distribution is used to integrate a single source of biological prior knowledge and is then extended to the integration of multiple sources. Different confidence indicators are used to reduce the impact of inconsistencies between the prior knowledge and the data. The MCMC algorithm and the hill-climbing algorithm are then each applied to time-series expression data to build gene regulatory networks from the different biological sources, which yields information on the sensitivity to errors in the prior knowledge.

3. The first method is evaluated on the 10-gene and 50-gene yeast networks of DREAM3. The second method is evaluated on a regulatory network of 14 genes (including 3 transcription factors) selected from the KEGG database, with one set of prior knowledge taken from the experimental data of Lee and the other from the experimental data of Harbison. An experimental system for constructing gene regulatory networks with Bayesian methods was implemented for each method, and the experiments verify the effectiveness of both methods.
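The edge-removal step in item 1 rests on conditional mutual information (CMI). The following is a minimal sketch of that idea, not the thesis implementation. It assumes Gaussian-distributed expression values, so that CMI can be computed from determinants of covariance matrices; the function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def _det_cov(data, cols):
    """Determinant of the sample covariance of the selected columns.

    Returns 1.0 for an empty selection so that plain mutual
    information falls out as the special case with no conditioning set.
    """
    if len(cols) == 0:
        return 1.0
    cov = np.cov(data[:, cols], rowvar=False)
    return float(np.linalg.det(np.atleast_2d(cov)))


def gaussian_cmi(data, i, j, cond=()):
    """CMI between columns i and j given the columns in cond.

    Assumption (not from the thesis): the variables are jointly Gaussian, so
    I(X;Y|Z) = 0.5 * log(|C(X,Z)| |C(Y,Z)| / (|C(Z)| |C(X,Y,Z)|)).
    """
    cond = list(cond)
    numerator = _det_cov(data, [i] + cond) * _det_cov(data, [j] + cond)
    denominator = _det_cov(data, cond) * _det_cov(data, [i, j] + cond)
    return 0.5 * np.log(numerator / denominator)


def keep_edge(data, i, j, cond, threshold):
    """Keep the edge i-j only if i and j stay dependent given cond."""
    return gaussian_cmi(data, i, j, cond) >= threshold


# Synthetic example: genes 0 and 1 are both driven by gene 2,
# so they are correlated but independent once gene 2 is known.
rng = np.random.default_rng(0)
g2 = rng.normal(size=2000)
g0 = g2 + 0.5 * rng.normal(size=2000)
g1 = g2 + 0.5 * rng.normal(size=2000)
expr = np.column_stack([g0, g1, g2])

mi_01 = gaussian_cmi(expr, 0, 1)          # clearly positive
cmi_01_given_2 = gaussian_cmi(expr, 0, 1, [2])  # close to zero
edge_survives = keep_edge(expr, 0, 1, [2], threshold=0.05)
```

In this toy data set the edge between genes 0 and 1 passes a zero-order mutual information test but is dropped once gene 2 is conditioned on, which is the kind of indirect edge a CMI-based pruning step is meant to remove. The threshold value is arbitrary here.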

  • 【Online Publication Contributor】 Soochow University
  • 【Online Publication Issue】 2013, Issue 11