节点文献

基于目标函数优化的复杂网络社区结构发现

Community Structures Detection in Complex Networks via Objective Function Optimization

【作者】 刘旭

【导师】 易东云; 郑辉;

【作者基本信息】 国防科学技术大学 , 系统分析与集成, 2012, 博士

【摘要】 随着信息技术和人们生产和生活实践的发展,现代社会已经进入了大数据时代。大数据时代所面临的主要挑战是,自然、社会和信息等领域的实际系统规模日益庞大、关系日益复杂,是典型的复杂系统。随着数据生成和数据采集技术的发展,人们对复杂系统的研究逐渐从理论探讨过渡到数据处理方面。从一般系统论的角度来看,复杂网络是复杂系统的简化表示,即将复杂系统表示成其组成元素以及组成元素间的相互作用的集合。通过这种有效的抽象,使得我们可以从数学的角度来研究复杂系统的性质。复杂网络在微观尺度具有局部积聚、马太效应等性质;在宏观尺度具有小世界、无标度等性质。除了上述两种极端情况,复杂网络还具有明显的层次结构,存在着介于宏观和微观之间的介观特性,包括结构特性和动力学特性。社区结构就是复杂网络在介观尺度的关键结构特性之一。通常用模块度来度量社区内部连接的密集程度,它是社区发现的一个重要的目标函数。本文研究复杂网络社区结构发现问题,围绕基于目标函数优化的复杂网络社区结构发现方法,开展了如下四个方面的工作。第一,提出了基于最小夹角的社区结构优化算法。模块度矩阵的谱分解的一个主要不足在于模块度矩阵没有非负性,针对这一问题从模块度的概率解释出发对模块度矩阵的对角线进行了修正使其成为随机变量的样本方差,由此得到的正定矩阵,改进了社区发现问题的向量划分描述。进一步,从向量划分的角度解释了有限分辨率现象的根源,设计了以最小化向量夹角的贪婪算法,该方法比直接优化模块度的方法有更高的异质社区分辨能力。第二,改进了基于局部拓扑相似性的社区结构发现方法。针对现有相似性度量存在的主要问题即对直接连接的低估倾向,提出了基于星形邻域和α-邻域的相似性度量修正了上述缺陷。在此基础上设计了相应的层次聚类算法,改善了社区发现结果。算法采用堆结构存储相似性矩阵,提高了算法速度。第三,提出了有限随机的模块度优化算法。直接优化模块度的贪婪算法容易陷入局部极值,同时模块度存在大量与极大值非常接近的次优解。本文提出的随机搜索算法能够跳出局部极值,多次的执行结果能够探索到模块度的平台区域,体现网络社区结构的多样性。第四,考虑了基于完全子图分析的社区结构发现方法。以完全子图作为基本结构单元,通过建立节点-派系二部图和构造派系图的方式进行无重叠的社区结构发现。通过建立网络的极大完全子图紧覆盖所构成的派系图,以重叠程度评估派系相似性,利用加权模块度来切割派系图,进而形成网络的重叠社区结构。

【Abstract】 Withthedevelopmentofinformationtechnologyandtheproducingandeverydaylifepractices, modern society has entered a typical big-data era. The major challenge in thebig-data time is that we are facing increasingly big and complex systems in nature,societyandtechnologyfields. Withrapiddevelopingofdatageneratingandgatheringtechnology,the research in complex systems gradually swift from theoretical aspects to data drivingapproaches.Inthegeneralsystemtheory,complexnetworkisasimplifiedmodelforcomplexsys-tem which modeling the system as a set of its components and relations betweens them.This effective abstraction enables us to to study the complex systems by mathematics.The microscopic scale properties of complex networks are highly local clustering coeffi-cients, rich club phenomenon. While in the macroscopic scale complex network possesssmall-world and scale-free properties. Between the above two extreme scales there isa mesoscopic scale that complex networks show some significant properties, includingstructure and dynamic features. Community structure is one of the key structural proper-ties of complex networks at the mesoscopic scale.Modularity is one typical objective function for community detection which mea-sures the link density within vertexes groups. In this the thesis we study communitydetection problem, with focus on the community detection framework based on objectivefunction optimizing in the following four different ways:First,we propose a community optimizing algorithm based on least-angle merging.By treating the network as directed graph, we reformulate the modularity matrix as cross-covariance of vertex vectors taking directed edges as their basis. One main issue of themodularity matrix is that it has eigenvalues. The existing method treated community de-tection as a vector partition problem by spectral decomposition of modularity matrix, us-ing only a few leading positive eigenvalues and the corresponding vectors. The diagonalof the cross-covariance matrix is modified to make it positive defined and then factoredinto inner product of vertex vectors. Thus the above modification improves the trans-formation of the community detection problem as a vector partition problem. From thevector partition point of view, the resolution limit of modularity is reinterpreted.A greedyalgorithm based on merging vectors with least angle is designed.The proposed method is less resolution limited than modularity optimizing methods.Second,we improve the local similarity based community detection method. Sincethe existing similarities tend to underestimate the existing links, we propose new simi-larities for community detecting based on star-neighborhood and α-neighborhood to overthesedefects. Wethendesigncorrespondinghierarchicalclusteringalgorithmscoordinatewith these new metrics, greatly promoting community detection results. The similaritymatrices are stored by max-heap data structure and the algorithms run quite fast.Third,weproposerestrictedrandomnesssearchingalgorithmformodularityoptimiz-ing. It is known that pure greedy modularity optimizing tend to get trapped in one singlelocal maximal and there are many of them due to the fact that modularity is extremely de-generate. The random search algorithm based on the head-heap data structure proposedhere is designed to jump out of the local maximal and efficiently extract diverse commu-nity structures in multiple runs.Last but not least, we consider community detection methods based on clique analy-siswhichtakingcliquesasbasicbuildingblocks. ofbothdisjointandoverlappingcommu-nities. By building vertex-clique bipartite graph and evaluating weights between vertexesand cliques,we extract feature vectors for vertexes from the adjacency matrix. Spectralclustering method is carried out on the matrix of correlation coefficients of the featurevectors for disjoint community structure detection. Clique graph is built from the max-imal cliques cover of the network with edge weighted by the degree of clique overlap.This graph is further splitted by weighted modularity, forming overlapping communitystructure of the original network.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络