节点文献
基于连接度量的社区发现研究
Research on Community Detection Based on Link Measure
【作者】 孔兵;
【导师】 刘惟一;
【作者基本信息】 云南大学 , 通信与信息系统, 2012, 博士
【摘要】 近年来,在不同领域、广泛多样的系统中,图变成了一种极其有用的描述工具,生物,社会,技术和信息网络等许多领域的系统都能建模为图(网络)来研究。为了理解网络系统的特征,网络分析研究就变得非常重要。对网络的研究从最初的规则网络,到随机网络,到近几年的复杂网络,越来越多关于网络的研究工作持续开展。无标度网络的发现,使人类对于复杂网络系统有了全新的认识。在这样的网络中,顶点度的分布显示出很大的不均匀性,同时,表现出高水平的秩序和组织,特点是网络中顶点度的分布非常广泛,大量低度数的顶点围绕少量高度数的顶点共存,网络呈现出社区结构(也称为簇或模块)。社区发现在社会学、生物学和计算机科学等学科中都有重要应用,如提高网络服务质量、增强网络销售、发现有共同兴趣的科学家小组、揭示政府组织运行方式、发现动物的社会组织结构、聚类功能基因等等。近年,关于社区发现的很多新概念和新方法被提出来。在无权图研究方面:模块度的提出极大促进该方向的研究,基于模块度优化的社区发现方法成为了主流方法,但现有模块度定义在一些应用背景下存在不足,有的模块度优化方法也存在缺陷。在带权图研究方面:针对带权图的工作相对来说比较少,权值本身的意义没有获得重视。而且,很多社会网络是基于交互行为建立的,交互的数量对社区结构有重要的影响。鉴于上述问题,为扩展社区发现算法的适应范围、提高发现社区的准确性、合理发现带权图中的社区结构,本文关注连接的度量问题,在模块度定义、改进社区发现算法、带权图社区发现等方面开展了研究工作。本文的主要研究内容和创新之处总结如下:(1)提出了基于耦合度的模块度。针对现有模块度在社区规模差异大和多社区情况下的不足,根据社区内部顶点间的连接相对紧密,社区之间的连接相对稀疏的特点。定义了社区间的连接密度和社区内的连接密度两个数值,通过二者之比定义了社区间祸合度,社区间连接越稀疏,社区内部越紧密,则社区耦合度越小。基于社区耦合度的定义,提出一种新的基于社区耦合程度的模块度,用来度量整个网络中社区划分质量。通过实验证实,该模块度既适用于社区大小相似的情形,也适用于社区大小差异较大的情形,并且在多社区的情况下,有比较合理的度量效果。(2)提出了一个非递归分裂的模块度优化社区发现算法。因递归二分过程的影响,分裂式模块度优化算法在多社区情况下存在划分结果可能不理想的问题。本文依据类似于k-均值聚类的思想,提出了一种基于模块度优化的社区发现算法。该算法根据指定的社区数目对网络进行初始划分,通过迭代移动社区内的顶点,完成网络的自组织,使得算法在当前社区数目已定的条件下,搜索到具有最大模块度的网络划分。随后,增大社区数目,重复上述过程,直至模块度减小或社区数目达到用户指定的最大数目k。算法中,每次对社区数目为n的初始划分都是在原始网络上进行,有效避免了递归二分引起的问题。实验表明,社区发现的效果良好。(3)基于成员交互行为,提出交互度的概念,并应用于层次凝聚社区发现算法。在描述交互关系的社会网络中,直观地认识到,顶点之间交互的强弱主要是包括两个方面:一是交互的绝对数量,二是交互的对等性。据此,提出了顶点交互度的概念,分析了网络拓扑结构对交互度的影响,引入概率中的可靠性概念,把顶点之间的交互度推广到社区之间的交互度。随后,以交互度作为层次凝聚算法的相似度指标,提出了基于交互度的层次凝聚社区发现算法,该算法能够比较自然地模拟社区的形成。实验表明,在带权图中能较好的发现合理的社区结构。(4)开发了一个社区发现的可视化工具软件Snviewer,提出了分层力导引算法。为了直观、高效地观察图和社区发现结果,基于JAVA平台,利用OpenGL图形编程接口,以力导引方法为布点算法,开发了一个可以动态演示图结构和社区发现结果的可视化工具。为了更好地显示社区发现结果,提出了分层力导引算法。另外,在动态布局过程中提供了人机交互机制,可以直观地人工干预布点过程,使布点结果更符合人们的观察习惯。
【Abstract】 In the20th century, graphs have already become extremely useful as the representation of a wide variety of systems in different areas. Biological, social, technological, and information networks can be studied as graphs, and graph analysis has become crucial to understand the features of these systems. More and more study work is being developed further from regular networks to random networks to complex networks, among which, the discovery of scale-free networks gives a fresh view to understand the complex networks. The degree distribute of vertices unevenly in it, meanwhile, it shows a high level of order and organization. The degree distribution is broad, with a tail that often follows a power law:many vertices with low degree coexist with some vertices with large degree. The feature is called community structure.Community detection plays an important part in the field of sociology, biology and computer science, such as improving quality of service of the World Wide Web, enhancing network sale, finding a group of scientists’interest in common, revealing the secret of government organization running, discovering social structure in animal world, clustering features gene and so forth.In recent years, a large number of new concept and methods has been given to study community detection. In terms of unweighted graphs, modularity makes the study go further and community detecting algorithms based on modularity optimization has been a mainstream. However, the existing modularity definition is inadequate to all situations and there are still some defects about some algorithms based on modularity optimization. In the study of weighted graphs, there are a few relative studies and the weight’s significance be neglected. Lots of social networks are created on interaction activities. Meanwhile, the quantity of such activities is another influence to community structure. Based on these problems, this thesis concerns about the problem of link measure and gives a new research in aspects of modularity definition, improving algorithms based on modularity optimization, community detecting in weighed graphs so as to extend suitable range of the modularity, improve accuracy of community detection, and reveal reasonable community structure of weighted graphs.The main contributions and novelties of this thesis are summarized as follows:(1) Proposes a new modularity based on the concept of coupling coefficient. The existing modularity has shortages to compare the quality of the community structure of networks which are very different in size or have multiple communities. According to the characteristic of communities, which vertices are densely connected within the community and have much sparser connection between the communities, it defines values of the link density between communities and the link density within a community. By the ratio of two values, it defines coupling coefficient between communities. The lower the link density between communities is and the higher the link density within community is, the smaller coupling coefficient of community will be. A new modularity based on coupling coefficient is suggested to access the goodness of a network partition.. The experiment figure shows that the new modularity is suitable for both similar and very different communities in size, it also works in multiple community.(2) Propose a non-recursive and divisive algorithm of modularity optimization for community detecting. For the impact of recursive bisectioning, the divisive algorithm based on modularity optimization is inadequate for multiple communities. Based on k-means clustering idea, the writer suggests a modularity-based algorithm for community detecting. The main idea of our algorithm is setting a number as the number of communities in a network, dividing nodes according to their degree in the network to get initial communities, and optimizing communities by shifting nodes between communities so that modularity has the maximal increase; then changing the number of communities and repeat the process above until the modularity cannot be improved any more by the procedure or the number of communities is changed to k, which is the user-specified biggest number of community. For the initial partition to each number of community works on the original networks, our algorithm avoids the problem caused by recursive bisectioning. The experimental results show that our algorithm can correctly detect the communities.(3) Propose the concept of interaction degree on the interactive behavior between members of community, and applied to the hierarchical agglomerative algorithm for community detecting. In the social networks representing interactive relationships, it is intuitively realized that the measure of interactive relationships should consider two aspects, one is absolute quantity of interactive activities, and the other is reciprocity of interaction. Accordingly, we propose the concept of interaction degree between vertices and analyze the impact of topology structure of networks on the interaction degree. By introducing the reliability in the probability theory, we extend the interaction degree from between vertices to between communities. Then the interaction degree being as similarity measures, the writer proposes the hierarchical agglomerative algorithm for community detecting. The algorithm can naturally simulate the formation of community. Experiments show that it can detect the reasonable community structure in weighted graphs.(4) Develop a software tool for visualization of community detection, and propose a layered force-directed algorithm. Basing on JAVA, OpenGL graph programming interface, and force-directed method, we develop a visualization tool for showing dynamically graph structure and community detection consequence so as to make the graph be intuitive. In order to better display the results of community detecting, we propose a layered force-directed algorithm. In addition, it offers human-computer interaction mechanism in dynamic layout process, which makes artificial intervention possible and satisfies operators’observation convention as well.
【Key words】 Link; Community detection; Modularity; Interaction degree; Visualizaiton;