节点文献

Web社区发现算法的研究

A Syudy of Web Community Detection Algorithm

【作者】 黄伟平

【导师】 刘瑞芳;

【作者基本信息】 北京邮电大学 , 电子与通信工程(专业学位), 2013, 硕士

【摘要】 社区发现技术是网络研究中一项非常重要的技术,因为社区结构在定程度上反映了真实系统的拓扑关系,所以,识别出网络图中的潜在社区具有非常重要的研究意义。特别需要指出的,随着互联网的飞速发展,其已经成为了人类社会生活中不可缺少的一部分,发现并分析存在于互连网中的社区则具有史深刻的意义。我们知道,现实生活中的各种社交圈子可以在某些层面上,反映出人与人之间的关系。而同一社交圈子的人通常在网络上也会存在一些联系,因此对诸如FaceBook、 Twitter、新浪微博、天涯、猫扑、豆瓣网、人人网等在线社交网络进行社区结构分析,进而发现人们之问存在的各种潜在关系,这样不仅可以快速的了解和预测人们的活动,而且还可以更加有效的监测社会舆论的发展情况,同时也可以在对比现实社区和虚拟网络社区中人们活动的异同时给我们提供依据。另外,现如今基本每一个在线商城都有商品推推荐功能,而这个功能的实现其实也是基于社区发现技术的。商品推荐系统的关键其实就是要把那些具有相似购买兴趣的顾客从庞大的顾客和商品购买关系网络中识别出来,这个过程其实就是一个社区发现的过程。再者,互联网中充斥着各种各样的信息,想要快速的从这些海量数据中,提取出用户需要的信息是很不容易的。如果我们对互联网中存在的这些信息进行社区发现的话,不仅可以把这个问题解决,还可以实现针对个人的信息推荐,以及网络中的智能搜索功能,从而带领用户,让他们更加准确和快速的找到自己感兴趣的信息。本文首先对一些和社区发现技术有关的理论知识进行了介绍,之后又对一些早期的比较经典的社区发现算法,如传统图分割算法Kernighan-Lin算法、层次聚类方法中的GN算法、基于模块度优化的Newman快速算法(其也属于凝聚算法)、谱分析算法普分法、以及Palla等人提出的用于重叠社区发现的CPM算法等进行了介绍,同时还对它们的优势及不足,算法的复杂度度及适用范围进行了分析。此外,还对社区质量客观评价标准进行了介绍。本文在深入分析并理解现有社区发现算法的基础上,结合微博这种双向性社交网络的特性,提出了两个针对微博的社区发现新算法,分别是:1)基于社区吸引力的社区发现算法和2)基于社区吸引力的重叠社区发现新算法。其中,算法1)的提出,是为了解决现有社区发现算法在面对微博这种大规模网络时复杂度过高而难以应用的问题。算法1)不仅在时间复杂度方面优于现有的社区发现算法,同时在社区发现精准度方面也有不俗的表现。而算法2)的提出,是因为我们考虑到在现实世界中一个用户可能会同时属于多个社区,而现有的大多数算法以及我们所提出来的算法1)都是简单把每个用户划分到一个单独的社区中,而这与事实有点不太相符;同时,现有的一些重叠社区发现算法在面对大规模网络时性能不佳,算法2)解决这些问题。最后,我们还给出了这两个算法的实验结果,这些实验结果对我们所提出的两个算法在社区发现的有效性和高效性上均给出了有力的证明。

【Abstract】 Community Detection is a very important technology in network research. Because, in some extent, Community structure could reflect the topology relationship of real system, and there is great significance to find the potential communities in network. Particularly, with the rapid development of the Internet, it has become an indispensable part of human’s social life, so discovery and analyze the potential communities in the Internet has a further significance. We know that, in reality, the variety of social circles, in some levels, could reflect the relationship among people. The peoples in the same social circle usually also have some contacts in the Internet, Doing community structure analysis of the online social networks such as FaceBook, Twitter, Sina, Tianya, Mop could find the potential relationship exists among people, then we not only can quickly understand and predict people’s activities, but also can carry out a more effective monitoring of the development of public opinion. At the same time, it can provide some basis when we compare the activities of people between reality communities and virtual network community. On the other hand, nowadays, commodity recommended function is a function which most online malls provided, the realization of this function is actually based on community detection technology. The key of commodity recommended system is identifying the customers with similar buying interest from the relationship network of customers and commodities, this process is actually the process to do community detection among customers. Moreover, there are variety of information filled with the Internet, so getting the information we wanted is not an easy thing. If we do community detection among these information of the Internet, then, not only can this problem be solved and also can achieve the personal information recommended function, as well as the intelligent search function, which lead the user to find the information they interested more accurately and quickly.In this paper, we first introduced the theoretical knowledge of community detection, and then give a brief review of some traditional algorithms, such as Kernighan-Lin algorithm(a segmentation algorithm), GN algorithm(a hierarchical clustering algorithm), Newman-fast algorithm(a algorithm based on module optimization), CPM algorithm(a algorithm to find overlapping communities). To each algorithm, we also analyzed its advantages and disadvantages, as well as its complexity. In addition, we also introduced the objective quality evaluation criteria of community.After making deep analysis and understanding of the existing community detection algorithms, we proposed two new community detection algorithms. The two new algorithms are both designed for micro-blog, which is a bidirectional social network. The first algorithm is community detection algorithm based on the community attractive; and the second one is overlapping community detection algorithm based on community attractive. The reason why algorithm one is proposed, is to solve the problem that the existing community detection algorithms are of high complexity when used to analyze the large-scale network like micro-blog. Algorithm one not only performs better than the existing community detection algorithms in time complexity, also performs well in accuracy. In real world, a people may belong to multiple communities at the same time, but most existing algorithms and algorithm one just simply divide each individual into a single community, so the results of these algorithms are not quite consistent with the fact, that is why algorithm two is proposed; the existing overlapping community detection algorithms are of poor performance for large-scale network is also the reason. Finally, the simulation results give a strong proof to the effectiveness and efficiency of the two algorithms.

  • 【分类号】TP301.6
  • 【下载频次】334
节点文献中: 

本文链接的文献网络图示:

本文的引文网络