节点文献

复杂网络社团结构分析方法研究

Studies on Methods for Analyzing Community Structure in Complex Networks

【作者】 赖大荣

【导师】 卢宏涛;

【作者基本信息】 上海交通大学 , 计算机软件与理论, 2011, 博士

【摘要】 复杂网络研究作为一个新兴的学科方向,极大地吸引了来自不同学科研究人员的广泛关注。对复杂网络的定性和定量特征的研究,有助于揭示复杂网络表示下的不同复杂系统中普遍存在的一般规律,在生物科学、社会科学等诸多学科中具有重要意义。社团结构是复杂网络的一个关键结构规律,因此准确分析复杂网络的社团结构是复杂网络研究中的一个非常重要的课题。在复杂网络社团结构的分析研究中,模块度度量(及其变种)起了很重要的作用,催生了一大类重要的社团结构分析方法。但是这类通过对模块度度量(及其变种)优化来获得网络社团划分的方法存在分辨率问题,从而极大地限制了基于模块度方法的有效性和应用广度。另一方面,目前社团结构分析研究主要集中在无向网络上,系统地直接分析有向网络社团结构的研究还远远不及对无向网络社团结构的分析研究。实际上,当前有向网络社团结构划分的普遍做法是通过简单地忽略有向网络中边的方向,将有向网络当成无向网络分析。这种方法在大多数情况下由于忽略了有向网络中边的方向包含的重要拓扑结构信息,从而无法正确有效地划分网络的社团结构。另外,在有向网络上扩展多数无向网络社团结构的分析方法往往不是简单的工作。本论文从模块度度量的相关问题出发,将无向二元网络上模块度优化时的分辨率问题的分析扩展到无向加权网络中,并在此基础上提出了一种结合网络预处理的增强型模块度优化社团结构分析方法。通过分析无向网络上的模块度优化产生分辨率问题的条件,提出通过有效地获得关于社团之间边的信息来处理模块度优化时可能产生的分辨率问题。基于网络上的动力学过程反映网络的潜在结构这一事实,利用无向网络上的随机游走过程获取关于社团内部边和社团外部边的信息,并通过边的迭代重加权操作使得网络边的权值不断向社团内部集中,而社团之间的边权值越来越小,从而有效地克服模块度优化时的分辨率问题。实验分析表明,这种结合了网络预处理的无向网络社团分析方法极大地增强了基于模块度优化的无向网络社团结构分析方法的有效性和应用广度。针对传统的有向网络社团结构分析方法简单忽略边的方向将使得社团划分不准确的问题,提出了一种基于有向网络嵌入的有向网络社团结构分析方法。该方法通过在有向网络上扩展无向网络的网络嵌入分析,从有向网络嵌入目标函数关联的拉普拉斯矩阵中分离出包含了边的方向信息的对称矩阵作为有向网络对等的无向加权网络的邻接矩阵,并导出一种关于有向网络社团结构划分的新模块度定义。一方面,新模块度度量进一步解释了有向网络社团的另一种物理内涵:同一社团内的结点是最大程度保持了有向网络局部结构的相邻结点集合;另一方面,它建立了有向网络与某种无向网络的联系,这种联系表明有向网络的社团结构可以通过应用已有的无向网络社团结构分析方法可靠并且有效地在所导出的无向网络上分析得到。进一步扩展了基于有向网络嵌入的有向网络社团结构分析方法的分析思路,提出了一种基于边的方向信息抽取的有向网络社团结构分析方法。该方法基于具有社团结构的有向网络上的流运动将更加容易地在社团内部结点之间进行的事实,通过有向网络上的PageRank型随机游走过程有效地抽取有向网络边的方向包含的信息,并采用适当的相似性函数将其转化为边的权值与有向网络中边的互联性集成共同定义有向网络的社团结构。在抽取足够多的关于边的方向信息后,通过忽略边的方向将有向网络的社团划分问题等价为所导出的无向加权网络的社团划分问题,在保持社团结构划分准确性的同时有效地简化了有向网络社团结构划分问题的分析,避免了无向网络分析方法常常难以直接扩展到有向网络社团结构分析中的问题。在分析前述关于无向网络和有向网络社团结构分析方法的可统一性基础上,提出了一种基于网络结点间消息传递的社团结构分析的一致框架。通过随机游走将网络结点表示成向量空间中的点,并以此有效地定义表达结点属于同一社团可能程度的相似性。利用affinity propagation算法的消息传递机制将网络的社团结构划分问题转换成社团领导者的推选过程。在实际网络和标准测试网络上的分析表明,这种消息传递的网络社团结构分析框架能够:(i)有效地处理分辨率问题,即有效地分析那些通过模块度优化会出现分辨率问题的网络的社团结构;(ii)在一个一致的框架中分析有向网络和无向网络的社团结构;(iii)揭示网络中可能存在的多级社团结构。

【Abstract】 As a new emerging discipline, research on complex networks attracts scientists from a varietyof different fields. In fact, studies that can qualitatively and quantitatively characterize complexnetworks will help to unveil the general laws regulating different real systems modeled by complexnetworks, and therefore will be relevant in a number of disciplines (biology, social sciences, etc).Community structure is one of the crucial structural characteristics of complex networks; therefore,accurately analyzing their community structure represents a very relevant topic.In the research related to the analysis of community structure in complex networks, a measurenamed modularity (and variants) plays a crucial role, and from it a large amount of importantapproaches for analyzing community structure have been derived. However, it has been foundthat the optimization of modularity (and of its variants) suffers from resolution limit. This greatlyreduces the effectiveness and the range of applicability of those modularity based methods.Additionally, current studies on community structure analysis are mainly focusing on undi-rected networks, thus analyzing community structure in directed networks represents a yet poorlyunexplored direction in this research area. Indeed, so far, the common approach to communitystructure analysis in directed networks has simply proceeded by discarding the direction of edgesand treating directed networks as undirected ones. Such type of approach frequently fails sincesimply discarding edges’direction removes valuable information and thus results in inaccuratecommunity partitions. Furthermore, very often, the extension of methods for community detectionin undirected networks to those for directed networks is far from trivial.Starting from the issues related to modularity, this thesis extends the analysis of the resolutionlimit from the case of binary undirected networks to that of weighted ones, and thus proposes an en-hanced modularity-based method for community structure analysis via network preprocessing. Byanalyzing the conditions relative to the resolution limit in undirected networks, it can be observedthat the problem can be effectively addressed by acquiring information on inter-community edges.Based on the fact that dynamics events occurring on a network will re?ect its underlying structure,random walks on undirected networks can be used as a surrogate to obtain effective informationon intra-community and inter-community edges. After such information is obtained, iteratively reweighting intra-community and inter-community edges will cause weights to concentrate moreand more into communities, while becoming very small or even vanishing among communities.As a result, the resolution limit problem can be effectively addressed when optimizing modularity.The applications of such a method on real and benchmark networks demonstrate that the effective-ness and the range of applications of using modularity based methods for detecting communitiesin undirected networks is greatly enhanced via network preprocessing.With respect to the problem of inaccurate communities partitioning by discarding the infor-mation contained in edges’direction, a method for analyzing community structure by directednetwork embedding is then proposed. By extending network embedding for undirected networksto directed ones, this method extracts a symmetric matrix from the Laplacian matrix associatedwith the target function for directed network embedding. Such a symmetric matrix can be re-garded as an adjacency matrix for a weighted undirected network that contains the informationon edge direction in the original directed network. Consequently, a new modularity definitionfor directed networks is derived by reformulating the modularity for undirected networks on theinduced adjacency matrix. This new modularity measure provides on one hand an additional phys-ical meaning of communities in directed networks: nodes in the same community are neighboringnodes that preserve as much as possible the local structure of a network. On the other hand, itdiscloses the relationship between the directed network under study and its induced undirectedone. Therefore, methods designed to partition undirected networks, can successfully, safely andefficiently be applied to directed ones, via the definition of the induced undirected network.By further extending this idea, another community analysis method for directed networks isproposed. This method is based on the fact that the ?ow on a directed network with a communitystructure will move more easily among nodes within the same community. It thus employs PageR-ank random walks on directed networks as a proxy of ?ow to effectively extract the informationcontained in edges’direction. By choosing a suitable similarity function, the obtained informationis transformed into edges’weight that will be integrated with edges’connectedness to define com-munity structure in directed networks. After the information contained in the edge directions hasbeen extracted iteratively following this procedure, the edge directions can be safely discarded, andthe community partitioning of a directed network can be obtained via partitioning the undirectedweighted network. This strategy simplifies the problem while retaining the accuracy in analyzingcommunity structure in directed networks.Based on the possible uniformity of the above methods for analyzing community structurein undirected and directed networks, a message passing based unified framework for communitydetection is proposed. In this frame, nodes in a network are expressed in terms of points in avectorial space via random walks on networks. Similarity between nodes can thus be effectivelydefined, and this similarity represents the likelihood of a pair of nodes to be in the same commu- nity. By using message passing mechanism from the affinity propagation method, the communitypartitioning can be regarded as the process of electing community leaders. The applications ofthis message passing based community detection method on real and benchmark networks demon-strate that this framework can effectively: (i) address the problem of the resolution limit, i.e., itcan effectively and correctly partition into communities those networks on which resolution limitwill occur by optimizing modularity; (ii) analyze community structure of undirected networks anddirected networks in a uniform way; (iii) uncover the possible hierarchical community structure innetworks.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络