节点文献

复杂网络中的社团检测问题研究

Research on Community Detection in Complex Networks

【作者】 杨树忠

【导师】 罗四维;

【作者基本信息】 北京交通大学 , 计算机应用技术, 2009, 博士

【摘要】 近年来,由于信息技术的飞速发展,科学家和学者们能够越来越容易地在现实世界中收集到高通量的网络数据,这使得复杂网络的研究变得炙手可热起来。除了Watts和Strogatz在1998年发现的小世界特性以及Barabasi和Albert在1999年发现的无标度(尺度)特性外,网络的社团(社区、模块)结构特性被认为是复杂网络中最重要的统计特性之一。目前,关于网络中的社团仍然不存在统一的定义,通常指的是满足下面条件的节点子集:子集内部节点之间具有稠密的连接,而与子集外部的节点具有稀疏的连接。研究表明,网络的这种社团结构与某些功能属性有着紧密的联系,比如网络的鲁棒性和信息快速传递特性等。因此在网络中描述和检测这些社团结构具有重要的实际意义并已成为近几年来的研究热点。本论文中,我们主要关注复杂网络中社团结构检测相关问题的研究,取得的主要研究成果如下:1.提出了一种新的社团检测方法-JEOMD。该方法以加入惩罚项的模块密度函数D作为指标函数,并使用跳跃极值最优化方法来优化该指标函数,能够得到网络的层次分割结果。在一组真实网络上的实验表明,与基于模块化函数Q的优化方法相比,JEOMD方法能够更有效地检测出网络中存在的不同层次和不同规模的社团结构。在该组真实网络及其对应的随机网络上的对比实验表明,加入惩罚项的模块密度函数D作为社团检测的指标函数在一定程度上比模块化函数Q更有效。2.提出了一个新的衡量社团划分质量的指标函数,称之为标准化模块密度(normalized modularity density-NMD)。在示例网络中证明了NMD能够改善模块化函数Q中存在的分辨率极限问题;证明了NMD与图分割中广泛应用的normalized-cut目标函数存在着紧密的联系,即当社团数目m已知时,最大化标准化模块密度相当于最小化目标函数normalized-cut。使用改进的模拟退火算法优化指标NMD。在一组模拟网络上的实验表明,与基于模块化函数Q的优化方法相比,基于NMD的方法能够取得更高的检测正确率;在一组真实网络上的实验表明,基于NMD的方法能够检测出网络中更精细的模块结构,从而为NMD能够改善Q的分辨率问题提供了更进一步的证据;把指标函数NMD和D推广到其加权形式NMDω和Dw,并用于在加权网络中进行社团检测。分析了加权模块化函数Qw同样存在分辨率极限问题;指出了加权模块密度函数Dw存在的负社团问题;在两个实例网络上证明了NMDω不仅能够克服Qw中存在的分辨率极限问题,而且能够避免Dw中出现的负社团问题。为了比较指标NMDω与Qw和Dw在加权网络社团检测中的性能,使用模拟退火算法实现三个指标的优化。在一组模拟网络上的实验表明,优化NMDω指标能够取得比优化Qw和Dw更高的检测正确率;在一组真实网络上的实验表明,优化NMDω指标,不仅能够从加权网络中检测出精细尺度下的社团,尤其是优化Qw所不能检测出来的小而稠密的社团,而且能够避免优化Dw时出现的负社团问题。3.提出了一种基于自适应核仿射传播的网络社团检测方法-AKAP。在该方法中,标准化的马尔可夫扩散核(Markov diffusion kernel)被用来隐式地衡量节点之间的非相似度,引入自适应仿射传播方法优化得到的非相似度矩阵。在模拟网络上的实验表明,与Newman快速算法相比,AKAP方法能够取得更高的检测正确率;在一组真实网络上的实验表明,AKAP方法能够有效地检测出网络中存在的有意义的社团结构和包含的社团数目。4.提出了一种有效的网络节点的可视化方法。在该方法中,定义了一种新的节点间距离,将得到的距离矩阵作为全局保形映射Isomap的测地线距离,应用Isomap方法得到网络的二维可视化结果。在模拟网络上的实验表明,与基于能量模型的可视化方法相比,我们的方法能够更好地保持原始网络中节点的局部和全局相似性,且当网络具有清晰的社团结构时,得到的二维可视化点集也具有紧凑的聚类特性,因此能够从可视化结果中判断原始网络是否具有社团结构特性,也可以对可视化的结果继续采用传统的聚类方法进行聚类以得到网络每个社团的具体成员。

【Abstract】 In recent years, along with the high-speed development of information technique, complex network has become a growing research field partly as a result of the increas-ing availability of a huge number of networks in the real world. Besides small world effect found by Watts and Strogatz in 1998 and scale-free effect found by Barabasi and Albert in 1999, community(module) structure has been considered to be one of the most important characteristics in complex networks. Until now, there isn’t a pre-cise definition of community. Generally speaking, community refers to such subgraph within which vertex-vertex connections are dense, but outside which connections are relatively sparse. The investigations show that this kind of community structure has close relationship with some functionality such as robustness and fast diffusion, etc. So characterizing and detecting such community structure has very important practical significance and has become a research hot spot.The related problems on community detection in complex networks are studied in this paper, and the main results are as follows:1. A novel community detection algorithm is presented which is called JEOMD. It uses improved modularity density D with penalty term as the objective function and optimizes this objective through jumping extremal optimization technique. As a result, hierarchical partition of the given network can be obtained. Exper-imental results on several real-world networks show that, compared with those methods based on optimizing modularity Q, JEOMD can detect out hidden com-munity structure with different hierarchies and different scales more efficiently; comparative experiments on those real-world networks and their random counter-parts show that D with penalty term is more efficient than modularity function Q in some aspects.2. A novel local quantitative measure called normalized modularity density N M D is proposed. The advantage that NMD can improve the resolution limit of Q is proved. The close connection between NMD and normalized-cut is derived; that is to say if the number of communities m is predetermined, maximizing the measure NMD equals to minimizing normalized-cut. Simulated annealing algorithm is designed to optimize the indexes NMD. Experiments on a suit of computer-generated networks show that, compared with Q-based optimization methods, optimizing NMD can obtained the higher accuracy; comparative ex-periment on several real-world networks show that optimizing NMD can detect out finer community structure, which provides further evidence that NMD can improve the resolution limit of Q;NMD and D are generalized to their weighted versions. The similar resolu-tion limit problem of Qw and the new emerging negative community problem of Dw is analyzed, and then the related certifications on the advantages of N M Dw to Qw and Dw in these two aspects are derived. In order to compare the perfor-mances of the three indexes, simulated annealing algorithm is used to optimize them. Experiments on a suit of weighted computer-generated networks show that, compared with Qw-based methods, optimizing the index NMDw can ob-tained the higher accuracy; experiments on several real-world networks show that optimizing the index NMDw not only can detect out the community structure with different scales, especially some small and dense communities that optimiz-ing Qw can not detect, but also can avoid the negative community problem in Dw.3. A method called adaptive kernel affinity propagation (AKAP) is proposed to de-tect communities in networks, in which the normalized Markov diffusion kernel is used to implicitly measure the dissimilarities between different nodes and then adaptive affinity propagation is applied to optimize the obtained dissimilarity matrix. Comparative experiments on computer-generated networks show that AKAP can obtained the higher accuracy than Newman fast algorithm; experi- ments on several real-world networks demonstrate that adaptive kernel affinity propagation can detect the optimal number of communities and the meaningful communities.4. An efficient visualization method to nodes in complex networks is proposed. A new distance between nodes is defined. The obtained distance matrix is used as the geodesic distance of Isomap and as a result all nodes are projected into a two dimensional plane. The experiments on computer-generated networks show that, compared with energy-based models, the proposed method can keep local and global similarity better in original networks. When the given networks have clear community structure, the projected nodes also have compact clustering property. So whether the original network has community structure or not can be judged by the visualization results, and the detailed community memberships can be obtained by clustering the projected coordinates.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络