
Research on Network Characterization and Protein-protein Interaction Network Modeling

【Author】 苏先创 (Su Xianchuang)

【Supervisors】 杨建刚 (Yang Jiangang); 金小刚 (Jin Xiaogang)

【Author Information】 Zhejiang University, Computer Science and Technology, 2011, Ph.D.

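【Abstract (Chinese version)】 With the rise of the Internet and other information technologies, networks increasingly permeate every aspect of our lives. People's growing dependence on networks has sparked a wave of research in the emerging field of network science. Besides familiar man-made networks such as the Internet, the World Wide Web and power grids, the objects of network research also include natural networks such as neuronal networks, gene regulatory networks and protein interaction networks. These networks are not only huge and structurally complex but also constantly changing. Faced with such a complex object, two natural questions arise: what does it look like, and how did it come about? Centering on these two basic questions, this dissertation attempts to depict what a network looks like through network characterization and to infer its formation and evolution mechanisms through modeling. It first proposes a network characterization method and then applies it to network comparison and model evaluation. Taking protein interaction networks as an example, it uses multiple evaluation methods to summarize the strengths and weaknesses of several models, combines and improves them to obtain a more realistic network model, and finally infers the formation and evolution mechanisms of the network. The main work and contributions of this dissertation are as follows:

1. A curve-based network characterization method is proposed. The method captures multi-scale structural information of a network through breadth-first traversal and expresses the network as a curve amenable to quantitative analysis. To address the rigorous and difficult question of whether every network has its own unique curve, a general expression for the curve is first derived and then applied to random networks and to a class of small-world networks. Calculations for these two classes show that, as the number of nodes tends to infinity, each network has its own unique curve: one network structure corresponds to one network curve. Further analysis shows that the curve not only clearly distinguishes networks of different structures and reflects several structural properties of the network in question, but also explains some peculiar phenomena; for example, a homogeneous small-world network appears heterogeneous and scale-free under traceroute sampling.

2. A network comparison method based on the above network curve is proposed. The method treats the network curve as a structural feature extracted from a large-scale network, uses this feature to define a network distance that measures structural differences between networks, and applies this distance to network comparison and classification. Network comparison can help a model find the set of parameters that best fits real data, while network classification can find, among a set of candidate models, the one that best matches a real network. Parameter estimation and model evaluation for protein interaction network models verify the effectiveness and robustness of the comparison and classification method. Moreover, compared with network comparison methods based on small-subgraph statistics, the proposed method is more computationally feasible and also provides structural information that small-subgraph statistics miss, which increases the reliability of the comparison.

3. A protein interaction network model that combines gene duplication and edge rewiring is proposed. In modeling protein interaction networks, gene duplication and edge rewiring are regarded as the main mechanisms shaping network structure. Although both mechanisms are important, the limitations of earlier model evaluation methods have led existing models to emphasize either gene duplication or edge rewiring. This dissertation uses several complementary model evaluation methods to compare three competitive network models. Numerical experiments show that gene duplication with complementary divergence is well suited to reproducing the local structure of the Drosophila (fruit fly) protein interaction network, while edge rewiring biased toward connecting key nodes is well suited to enhancing the overall connectivity of the network. Building on their respective strengths and weaknesses, an improved model combining both mechanisms is proposed, and it is rated favorably by multiple evaluation methods. Experimental results show that gene duplication and edge rewiring play equally important roles in shaping the structure of the Drosophila network; weakening either one makes it impossible to reproduce the Drosophila network faithfully.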
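The abstract does not state how the network curve is computed, so the following is only a minimal sketch of the general idea under our own assumptions. It builds a BFS-based multi-scale profile (the mean fraction of nodes reached within d hops, averaged over a sample of root nodes) and compares two networks with an L1 distance between their profiles. The curve definition, the distance, and every function name here (erdos_renyi, ring_lattice, bfs_layers, bfs_curve, curve_distance) are hypothetical stand-ins, not the author's method.

```python
import random
from collections import deque


def erdos_renyi(n, p, seed=0):
    """G(n, p) random graph stored as an adjacency dict of sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def ring_lattice(n, k):
    """Regular ring: each node linked to its k nearest neighbours per side."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_layers(adj, root):
    """Number of nodes at each hop distance from root (layer 0 is the root)."""
    dist = {root: 0}
    counts = [1]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] == len(counts):
                    counts.append(0)  # first node found in a new layer
                counts[dist[v]] += 1
                queue.append(v)
    return counts


def bfs_curve(adj, roots):
    """Hypothetical curve: mean cumulative fraction of nodes within d hops."""
    n = len(adj)
    profiles = []
    for r in roots:
        total, cum = 0, []
        for c in bfs_layers(adj, r):
            total += c
            cum.append(total / n)
        profiles.append(cum)
    depth = max(len(p) for p in profiles)
    # Pad shorter profiles with their final (saturated) value, then average.
    padded = [p + [p[-1]] * (depth - len(p)) for p in profiles]
    return [sum(col) / len(padded) for col in zip(*padded)]


def curve_distance(a, b):
    """Hypothetical network distance: L1 gap between two padded curves."""
    depth = max(len(a), len(b))
    a = a + [a[-1]] * (depth - len(a))
    b = b + [b[-1]] * (depth - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    n = 200
    roots = list(range(0, n, 10))  # sample 20 BFS roots
    er1 = bfs_curve(erdos_renyi(n, 6 / n, seed=1), roots)
    er2 = bfs_curve(erdos_renyi(n, 6 / n, seed=2), roots)
    ring = bfs_curve(ring_lattice(n, 3), roots)
    print("ER vs ER:  ", round(curve_distance(er1, er2), 3))
    print("ER vs ring:", round(curve_distance(er1, ring), 3))
```

Two random graphs with the same parameters should land much closer to each other than to a ring lattice of the same mean degree, which is the behaviour a structural distance of this kind is meant to capture. Sampling roots instead of running BFS from every node keeps the cost manageable on large networks.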

【Abstract】 With the rise of information technology, networks increasingly permeate many aspects of our daily life. People's growing dependence on networks has drawn researchers from different disciplines to an emerging field: network science. In addition to technological networks, such as the Internet, the World Wide Web and power grids, the networks studied also include many natural networks, such as neuronal networks, gene regulatory networks and protein-protein interaction networks. These networks often involve thousands or millions of vertices and edges, and they keep changing. Faced with such a complex object, two questions arise: what does a network look like, and how was it formed? This dissertation focuses on these two basic questions: it is devoted to portraying the topology of networks and to building models that infer their underlying growth mechanisms. We first propose a network description method and then apply it to network comparison and model assessment. We investigate the strengths and weaknesses of multiple protein-protein interaction network models and obtain a more realistic model, which helps infer the growth and evolution process of the network. The main results of this dissertation are as follows:

1. A curve-shaped description of large networks is proposed. The description explores multi-scale information of a network by means of breadth-first traversal, and it serves as a bridge linking networks of different structures with a set of particular curves. To investigate the hard problem of whether every network has its own unique curve, we apply this curve-shaped description to both random graphs and a class of small-world networks. The analytical results show that each of these networks has a unique curve expression in the limit of large network size. The curve expression captures several network properties at once, such as the size of the giant component and the local clustering. Interestingly, it also shows that a network which is homogeneous and small-world-like appears to have a power-law degree distribution under traceroute sampling.

2. A network comparison method based on the above curve is proposed. This approach takes the curve as a feature extracted from large-scale networks and defines a graph distance to measure the structural differences between networks. The graph distance can then be used for network comparison and classification: comparison helps a network model find the parameters that best fit real data, and classification helps find, among a pile of candidate models, the one that best fits a real network. The effectiveness and robustness of the comparison and classification method are validated through numerical experiments on parameter estimation and model evaluation for a protein-protein interaction network. Additionally, compared with the network classification method based on subgraph census, the proposed method is much faster and captures a complementary aspect of network structure, which enhances the reliability of the comparison results.

3. A comprehensive model considering both gene duplication and link dynamics is proposed for protein-protein interaction networks. In modeling protein-protein interaction networks, gene duplication and link dynamics are regarded as the major mechanisms shaping network structure. Although both mechanisms are important, existing models emphasize only one of the two because good evaluation methods for network models have been lacking. Here we use a variety of complementary evaluation methods to assess three competitive network models. The numerical results show that gene duplication is good at reproducing the local structure of the Drosophila protein interaction network, whereas link dynamics is good at enhancing the overall connectivity of the network. Building on these findings, a comprehensive model incorporating both mechanisms is proposed, and it ranks highly under the evaluation methods. Experimental results show that both gene duplication and link dynamics are critical in shaping the topology of the Drosophila network; weakening either of them makes it impossible to reproduce the intact network.
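The abstract describes the combined model only at the level of its two mechanisms, so the following is a minimal sketch under our own assumptions rather than the dissertation's model. "Complementary divergence" is implemented loosely in the style of duplication-mutation-with-complementarity models (after a node is copied, each inherited link is kept by only one of the two copies with probability q_mod), and "rewiring biased toward key nodes" is read as choosing the new endpoint with probability proportional to degree + 1. All names, parameters and defaults (grow_network, q_mod, q_con, r_rewire, largest_component_fraction) are hypothetical.

```python
import random
from collections import deque


def grow_network(n_final, q_mod=0.6, q_con=0.1, r_rewire=0.3, seed=0):
    """Hypothetical PPI growth model: duplication with complementary
    divergence plus degree-biased edge rewiring. Parameter names and
    default values are illustrative assumptions, not fitted values."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed graph: a single interacting pair
    while len(adj) < n_final:
        # Gene duplication: a new node copies the links of a random node.
        old = rng.choice(list(adj))
        new = len(adj)
        adj[new] = set()
        for nb in list(adj[old]):
            adj[new].add(nb)
            adj[nb].add(new)
            # Complementary divergence: with probability q_mod one of the
            # two redundant links (old-nb or new-nb) is lost at random.
            if rng.random() < q_mod:
                loser = rng.choice((old, new))
                adj[loser].discard(nb)
                adj[nb].discard(loser)
        if rng.random() < q_con:  # optional link between the duplicates
            adj[old].add(new)
            adj[new].add(old)
        # Link dynamics: rewire one endpoint of a random edge toward a node
        # picked with probability proportional to (degree + 1).
        if rng.random() < r_rewire:
            edges = [(u, v) for u in adj for v in adj[u] if u < v]
            if edges:
                u, v = rng.choice(edges)
                nodes = list(adj)
                weights = [len(adj[w]) + 1 for w in nodes]
                target = rng.choices(nodes, weights=weights)[0]
                if target != u and target not in adj[u]:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(target)
                    adj[target].add(u)
    return adj


def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (via BFS)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)


if __name__ == "__main__":
    for r in (0.0, 0.3):
        g = grow_network(500, r_rewire=r, seed=42)
        n_edges = sum(len(s) for s in g.values()) // 2
        print(f"r_rewire={r}: {n_edges} edges, "
              f"giant component {largest_component_fraction(g):.2f}")
```

The demo only prints summary statistics for two rewiring rates; judging which setting better matches a real network such as the Drosophila interaction network would require the kind of complementary model-evaluation methods the dissertation describes. Setting r_rewire to 0 reduces the sketch to a pure duplication model, which makes it easy to switch either mechanism off and compare.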

  • 【Online Publication Submitter】 Zhejiang University
  • 【Online Publication Issue】 2012, No. 07