节点文献

复杂网络系统间相似性识别及其应用

Identification of Similarity between Complex Network Systems and Its Application

【作者】 杜方

【导师】 吴铁军;

【作者基本信息】 浙江大学 , 控制科学与工程, 2010, 博士

【摘要】 将复杂系统描述为由相互作用的个体组成的复杂网络,是理解复杂系统的重要方法。在短短的十年间,复杂网络研究已扩展到信息、控制、物理、生物和社会科学等领域。随着大量系统被描述成复杂网络,以复杂网络系统为对象的网络数据挖掘应运而生,复杂网络系统间的相似性识别问题就是其中一项全新的课题。该课题包括系统间个体相似性识别和系统整体相似性分析两部分内容。其中,个体相似性识别旨在挖掘出关联系统间个体的未知对应关系,它可以归纳为一个通用形式的复杂网络间节点匹配问题,是课题的关键所在。研究复杂网络系统间的相似性识别问题对大规模复杂关联系统的分析与优化设计、多源信息系统集成、跨网络信息搜索、同源蛋白质发现、多语言自动翻译等具有积极的现实意义和理论价值。本文针对复杂网络系统间的相似性识别问题进行了较为系统的研究。全文按建模、求解和应用的主线展开。其中,复杂网络间节点匹配问题的求解部分按节点对应关系(一对一、一对多)和参与匹配的网络数量(两网络、多网络)两个维度上将研究递进展开。本文主要包括如下内容:1.给出了两种关联网络模型,即同源演化模型和协同演化模型。两者均具有可调的相似度和网络结构,前者可自然地扩展为多网络关联模型。为了保证网络的连通性,提出了一种网络连通性检测方法,大大减少了计算量。将单网络节点相似度计算推广到多网络,提出了三种网络间节点相似度函数,并给出了相似度函数的区分度定义,用于定量评估相似度函数的表现优劣。这部分研究一方面为本文的网络间相似性识别问题提供了仿真平台和算法设计依据,另一方面,对如何将其它网络数据挖掘算法推广到多网络也具有重要参考意义。2.针对“已匹配节点对”的不足和节点相似度函数的局部性限制,提出了一种基于相似度传播的节点匹配算法,适合于求解小规模两网络一对一节点匹配问题,但对“已匹配节点对”的分布情况无特殊要求。所引入的节点相似度传播过程使少量的初始节点相似度能够按网络拓扑结构传播到全局,从而能够充分利用“已匹配节点对”信息,特别是在“已匹配节点对”比例较小的情况下,使得匹配算法仍能获得相对满意的匹配精度。3.针对全局节点匹配算法的计算瓶颈提出了一种近似线性的迭代节点匹配算法,适合于求解现实世界大规模两网络一对一节点匹配问题。考虑到该迭代算法一定程度上依赖于“已匹配节点对”的集中分布,提出了一种集中大度值优先策略来优化选择“已匹配节点对”,从而有利于迭代算法前期表现,进而获得较高的匹配精度。实验统计表明,在“已匹配节点对”相对集中的情况下,该迭代节点匹配算法兼具高时效和高精度。4.给出了一对多节点匹配问题的数学描述,并提出了两种一对多节点匹配算法,即基于局部映射的匹配算法和基于集成的匹配算法。前者一定程度上能克服一对一迭代节点匹配算法的“短视”缺点,后者类似于Bagging集成技术具有好的抗噪能力并易于分布式并行化计算。为了集成的需要,提出了一种随机一对一节点匹配算法。以一个实际大规模即时通讯系统(阿里旺旺)为数据来源,提取了两个真实的关联网络,并在其之上进行了迭代一对一节点匹配算法和一对多节点匹配算法的实证研究。研究表明,所提的一对多节点匹配算法能有效地将匹配目标定位到一个较小的范围。5.给出了多网络节点匹配问题的数学描述,并提出了一种基于团簇提取的多网络节点匹配算法。该算法能同时考虑多个网络的结构信息和所有网络间的“已匹配节点对”信息。以四种不同结构的多网络节点匹配实验为例,对所提算法进行了测试和验证,并与参考算法进行了对比,实验统计结果表明了所提算法的有效性。6.从系统相似性的角度出发,分析了系统相似性对复杂关联系统连锁故障效应的影响。分析结果显示,关联系统的相似性有利于避免连锁故障的发生。据此规律,提出了一种基于节点匹配的连锁故障防护方法,该方法在不改变系统结构的约束下,优化关联系统中可调个体间的依赖对应关系,从而提高关联系统的相似性。仿真结果表明,该方法能有效减少连锁故障的发生,增强了系统的健壮性。由于众多关键基础设施(如电力网、通信网)都属于关联系统的范畴,所提算法具有广泛的应用价值。

【Abstract】 Describing a complex system as a network consisting of numerous interacting individuals is an important approach to understand of a complex system. Within a single decade, research on complex networks has been expanded to many areas including information, control, phyisics, biology, social science, and so on. As more and more complex systems have been modeled as networks, data mining in complex networks emerges as an important task. One of the very recently proposed subject with high novelty is the identification of similarity between complex network systems that consists of two parts, i.e., individual level and system level similarity. Identification of individual level similarity is the key problem of this research, whose main purpose is to reveal the existent yet unknown correspondences between nodes from different interacted networks. In fact, it can be represented as the node matching problem between complex networks. Identification of similarity between complex network systems has its practical and theoretical significance in broad areas, such as large-scale interacted system analysis and optimization, multi-source information integration, inter-network searching, homogeneous proteins revealing and machine translation, just to name a few. This thesis systematically investigates the identification of similarity between complex network systems. The research of the thesis is organized in the form of problem modeling, algorithm design, and practice. More carefully, the algorithm design part is splited according to two specifications, i.e., node correspondecy constraint (one-to-one, one-to-many) and number of networks matched (parewise, mutiple). The main contents of this thesis are summarized as follows:1. Two interacted network models are proposed, i.e., homogeneous-evolution model and coevolution model, both of which can be adjusted regarding to interacting strengthness and network structure, and the former can easily generate multiple interacted networks simutaniously. In order to maintain the connectivity of the generated networks, an efficient method has been introduced. Then, by expanding the definition of similarity between nodes of the same network, three kinds of similarities between nodes from different networks are proposed. Moreover, a measurement of differentiability has been introduced to evaluate the effectiveness of the proposed similarity functions. On one hand, this part of research provides simulation platforms and sheds light on algorithm design for the identification of similarity between complex networks. On the other hand, it is also meaningful for generalizing other data mining techniques that tackle only single network to multiple networks.2. A node matching algorithm based on similarity propagation is proposed, which can be used to solve one-to-one node matching problem between small-scale networks and requires no special constrain for distribution of the revealed matching node pairs. The propagation process makes the initial similarity information propagate along the network topology globally, and consequently, information provided by the revealed pairewise matched nodes can be fully exploited. As a result, a relatively acceptable matching precision can be met even there are very limited numbers of revealed pairewise matched nodes.3. A novel iterative node matching algorithm with approximately linear complexity is proposed to sovle the one-to-one node matching problem between large-scale networks in real world. At the same time, a degree based revealed pairwise matched nodes selection strategy has been introduced to improve the matching results in the early period of the iteration and therefore improve the overall matching precision of the iterative matching algorithm. It is revealed that, as long as the revealed pairewise matched nodes are relatively centralized, the algorithm performs well both in time and matching precision.4. One-to-many node matching problem between networks is formulated, and we propose two one-to-many node matching algorithms based on local mapping and ensembling, respectively. The former overcomes the short sightedness of the one-to-one iterative node matching, and the later bears a close resemblance to Bagging ensemble, which is relatively robust to noise and can be easily parallelized. A pair of real-world interacted networks has been extracted from a large scale Instant Messaging system. With this pair of networks, empirical study of the proposed iterative one-to-one and one-to-many node matching algorithms has been carried out, respectively. It is revealed that the proposed one-to-many algorithm can quickly narrow down the searching range.5. The node matching among multiple networks is formulated and we propose an algorithm based on cluster extraction to solve it. The proposed algorithm exploits information provided by all of the networks involved as well as the revealed pairewise matched nodes. Matching experiments carried on four kinds of networks with different structure demonstrates the effectiveness of the proposed algorithm. 6. Identification of similarity between complex network systems has been successfully applied to the analysis of cascading failure in interdependent systems. It is revealed that the similarity between interdependent systems is helpful to avoid the cascading failure. According to this discovery, a node matching based cascading failure protection method is proposed, which optimizes the correspondences between adjustable nodes while keeping the structure of the interdependent networks unchanged. It is revealed that the node matching based design can significantly improves the resistance to cascading failures in interdependent networks. Due to the reason that many real-world critical infrastructures are indeed interdependent, the proposed optimal design method based on node matching has its practical significance.

  • 【网络出版投稿人】 浙江大学
  • 【网络出版年期】2011年 08期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络