Research on User Behavior Analysis and Network Evolution in Microblogging Networks

【Author】 Yuan Weiguo (苑卫国)

【Advisor】 Liu Yun (刘云)

【Author Information】 Beijing Jiaotong University, Information Network and Security, 2014, PhD

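【Abstract (translated from Chinese)】 With the rapid development of the Internet, and of the mobile Internet in particular, microblogging has become a very important form of online social network. In microblog networks, user access is more convenient and varied, interaction is more flexible and faster, and information spreads more quickly and widely; user behavior and network structure are the two key factors that affect the information spreading process. In view of this, this dissertation adopts interdisciplinary ideas and methods to study the characteristics and models of microblog user behavior, the formation mechanism and growth patterns of the distributions of user characteristic quantities, network centrality and the measurement of information spreading, and the topological features and evolution models of the network. It attempts to discover the patterns of microblog user behavior and the laws of network structure evolution, to build mathematical models that capture these laws, and to find strategies for predicting user behavior. The work helps in understanding the behavioral characteristics of microblog users, deepens the understanding of the relationship between microblog network structure and information spreading, and provides some exploratory results for the theoretical study of complex networks and social networks. The research was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Graduate Innovation Project of the Fundamental Research Funds for the Central Universities (No. 2011YJS215). The main work and contributions are as follows:

1. We study the distributions of microblog user characteristic quantities and the regularities of user posting behavior, and build a behavioral model of how users post microblogs. Empirical analysis shows that the characteristic quantities of Sina Weibo users exhibit different power-law distribution features and are correlated with one another to different degrees. The intervals between posts follow a power-law distribution for both individual users and groups, and the power-law exponent is proportional to user activity level; users' posting interest is influenced by other users' interactions and shows clear periodicity; and posting behavior is self-similar. The dissertation analyzes a posting model driven jointly by social and interest effects, proposes a posting model in which the decay of user interest follows a Logistic function, and uses this model in simulation to verify the distribution of inter-post intervals. This study helps in understanding microblog user behavior more deeply and provides a theoretical basis and formal reference for further research on microblog network structure and information spreading patterns.

2. We study the formation mechanism and growth patterns of the distributions of user characteristic quantities. Fitting these distributions with the double Pareto lognormal (DPLN) distribution gives better results than the lognormal and power-law distributions. In addition, user active time follows an exponential distribution, the characteristic quantities of users with different active times all approximately follow lognormal distributions, and the growth rates of the characteristic quantities follow a lognormal distribution and are independent of the size of the quantities themselves. The double Pareto lognormal model therefore explains how the two-segment power law of the characteristic quantities forms. Using a K-means clustering algorithm based on vector cosine distance similarity, we propose a method for analyzing the growth patterns of user characteristic quantities and apply cluster analysis to time series of real user characteristic quantities with different rankings and initial sizes. We analyze the causes of explosive growth in follower counts and find allometric growth between the user characteristic quantities and the growth of the number of users.

3. We analyze the node centrality features of microblog networks and propose a method for measuring user influence. From real Sina Weibo user data we construct two user relationship networks based on bidirectional "following". Analysis of the topological statistics shows that both networks have small-world and scale-free features. We then analyze four centrality indicators (node degree, closeness, betweenness, and k-Core) and their correlations in each network. On this basis, using the SIR information spreading model based on epidemic dynamics, we analyze how initial spreaders with different centrality values affect the speed and range of information spreading in each network. The results show that closeness and k-Core describe a node's core position in the network during information spreading more accurately than the other indicators. Further analysis shows that these two indicators help identify key nodes in the information spreading topology. The method can provide theoretical support for applications such as microblog marketing, user recommendation, and online public opinion analysis.

4. We propose a network evolution model based on communities and mixed connection features. Further analysis of the topology of the two bidirectional following networks shows that both are disassortative, have hierarchical properties and community structure, and have community sizes that follow an exponential distribution. Then, given that the number of bidirectional followings of microblog users approximately follows a lognormal distribution, and given the structural characteristics and generation mechanisms of the real bidirectional following networks, we propose a network generation model based on community structure and mixed connection features. Its mixed connection mechanism works as follows: new nodes connect within a community using both preferential attachment with lognormally distributed fitness and random attachment; existing nodes, after preferential selection within a community, use both neighbor interconnection and global interconnection. Simulation results show that the degree distribution, clustering coefficient, degree correlation, shortest path length, community structure, and other properties and parameters of the generated networks agree well with the real networks, and that networks with different degree distributions and clustering coefficients can be generated by adjusting the parameters.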
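The cosine-similarity K-means clustering of growth curves named in contribution 2 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the input representation (per-period increments of a follower count), the toy data, the farthest-first initialization, and the function names `normalize`, `cosine`, and `cosine_kmeans` are all assumptions made for illustration. Because cosine similarity compares direction only, curves with the same shape but very different magnitudes land in the same cluster.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_kmeans(series, k, iterations=50):
    """Cluster time series by shape with K-means under cosine similarity.

    Returns one cluster label per input series. Initialization is a
    deterministic farthest-first choice (an assumption of this sketch).
    """
    points = [normalize(s) for s in series]
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point least similar to all centers chosen so far.
        centers.append(min(points, key=lambda p: max(cosine(p, c) for c in centers)))
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each series joins its most similar center.
        new_labels = [max(range(k), key=lambda c: cosine(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the normalized mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels


# Hypothetical per-period follower increments: three "explosive" curves
# (one sharp spike) and three "sustained" curves (steady growth).
explosive = [[1, 1, 1, 40, 2, 1], [2, 3, 2, 90, 4, 2], [10, 12, 9, 500, 15, 11]]
sustained = [[5, 5, 6, 5, 5, 6], [100, 110, 105, 100, 108, 102], [20, 19, 21, 20, 22, 20]]
print(cosine_kmeans(explosive + sustained, k=2))  # -> [0, 0, 0, 1, 1, 1]
```

Counting the labels per cluster then gives the number of users in each growth pattern, which is the kind of tally the abstract describes.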

【Abstract】 Driven by the rapid development of Web 2.0 and mobile network technology, microblogs have become the most popular form of online social networking. As a form of self-media, microblog networks let users interact with other individuals anytime and anywhere through a variety of access methods. The new kind of complex network constituted by these user interactions is more flexible and faster. At the same time, user behavior and network structure have a direct influence on the process of information spreading. In view of this, we use interdisciplinary ideas and methods to study user behavior and network evolution in microblog networks, aiming to identify their statistical features, to reveal the underlying mechanisms that govern their evolution, to establish mathematical models that characterize these laws, and to put forward strategies for predicting user behavior. Our work may help in understanding user behavior characteristics and the evolution of network structure in microblog networks, and it also provides some exploratory theoretical results for the study of complex systems. The dissertation was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4102047, 4112045), and the Fundamental Research Funds for the Central Universities (No. 2011YJS215). The main contributions of the dissertation are as follows:

1. We study user characteristics and posting behavior, and present a model of user posting behavior in microblogs. First, empirical analysis reveals the statistical features of user characteristics and the relations among them. Second, we find that the distribution of intervals between a user's posts follows a power law at both the individual and the group level, and that the power-law exponent is positively correlated with user activity. The posting time series is self-similar, and posting behavior is also periodic. Further study shows that the exponent of the inter-post interval distribution is positively correlated with user interaction, and that a user's interest is also influenced by retweet and comment behaviors. Considering these effects, we discuss an improved model based on social-driven and interest-driven effects and on the analysis results. We also propose another model in which user interest changes according to a Logistic function. These models reproduce the basic characteristics of the intervals between status releases in microblog networks.

2. We study the distributions of user characteristics and of their growth rates. Based on actual data from Sina Weibo, we study the distributions of three user characteristics: the numbers of followers, friends, and statuses. These follow a double power-law distribution, and different types of users show different features. We find that the double Pareto lognormal (DPLN) distribution fits the overall distribution of the three user characteristics better than the lognormal and power-law distributions do. The user activity span is found to be exponentially distributed, and within each activity span the three characteristics approximately follow a lognormal distribution. Furthermore, the growth rates of these characteristics follow a lognormal distribution and are independent of the values of the characteristics themselves. These observations are consistent with the double Pareto lognormal distribution model, which can therefore explain the formation mechanism of the user characteristics distribution. Moreover, the number of users in each growth pattern can be counted using a K-means clustering algorithm based on vector cosine similarity. The growth patterns of user characteristics are observed through cluster analysis of the actual time series, grouped by different sorting methods and initial scales. Users with higher growth rates are mainly in an explosive growth pattern, while users with higher initial numbers tend to be in a sustainable growth pattern. Based on an analysis of the explosive growth process of the number of followers, the growth of the numbers of retweets and comments is compared, and reasons for the explosive growth of these users are proposed. Finally, another significant finding is that the distributions of the cumulative sums of followers, friends, and statuses follow a strict power-law form, which indicates an allometric growth phenomenon.

3. We study node centrality characteristics and identify the most influential nodes for spreading dynamics. First, two bidirectional user relationship networks are built from actual Sina Weibo data. By analyzing the statistical characteristics of the network topology, we find that both have small-world and scale-free characteristics. We then describe four network centrality indicators: node degree, closeness, betweenness, and K-Core. Empirical analysis of the distributions of the four centrality metrics shows that node degrees follow a segmented power-law distribution; that differences in betweenness are the most pronounced; that both networks are markedly hierarchical, although not all nodes with higher degree have larger K-Core values; and that the centrality indicators are strongly correlated across all nodes, although this correlation weakens for nodes with higher degree. Finally, the two networks are used to simulate information spreading with the SIR information dissemination model, which is based on infectious disease dynamics. The simulation results show that the choice of initial spreaders affects the scope and speed of information dissemination. We find that closeness and K-Core represent a node's position in the network core more accurately than the other indicators, which helps to identify influential nodes in the information dissemination network.

4. We present an evolution model for microblog networks based on community structure and a mixed connection mechanism. Based on user profile data collected from microblogs, we find that the number of a user's bidirectional friends approximately follows a lognormal distribution. Furthermore, we build two microblog user networks from real bidirectional relationships; both are small-world and scale-free and also have some special properties, such as a double power-law degree distribution, disassortativity, and hierarchical and rich-club structure. Moreover, by detecting the community structures of the two real networks, we find that their community sizes follow an exponential distribution. Based on this empirical analysis, we propose a novel network evolution model with mixed connection rules, including lognormal-fitness preferential attachment and random attachment, close-neighbor interconnection growth within the same community, and global random associations across different communities. The simulation results show that networks generated by our model agree well with the real networks. By adjusting the model parameters, we can generate simulated networks with different degree distributions and clustering coefficients.
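The comparison in contribution 3 between centrality indicators and spreading outcomes can be sketched on a toy graph. This is only an illustration under stated assumptions: the 12-node edge list is hypothetical, the SIR variant is a simplified discrete-time one in which every infected node recovers after a single step, and the names `build_graph`, `k_core`, `sir_outbreak_size`, and `mean_outbreak` are invented here, not taken from the dissertation. The graph is chosen so that node 6 has the highest degree yet a K-Core value of only 1, which mirrors the observation that high degree does not imply a high K-Core value.

```python
import random


def build_graph(edges):
    """Build an undirected adjacency list from (a, b) edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def k_core(adj):
    """Return the core number of every node by iteratively pruning low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to prune at this level, move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return core


def sir_outbreak_size(adj, seed_node, beta, rng):
    """Run one simplified SIR cascade; return how many nodes were ever infected.

    Each infected node tries to infect each susceptible neighbor with
    probability beta, then recovers after one step (a simplifying assumption).
    """
    infected, recovered = {seed_node}, set()
    while infected:
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                if u not in infected and u not in recovered and rng.random() < beta:
                    newly_infected.add(u)
        recovered |= infected
        infected = newly_infected
    return len(recovered)


def mean_outbreak(adj, seed_node, beta, runs=2000, seed=0):
    """Average outbreak size over repeated cascades from the same seed node."""
    rng = random.Random(seed)
    return sum(sir_outbreak_size(adj, seed_node, beta, rng) for _ in range(runs)) / runs


# Hypothetical toy graph: a 4-clique (0-3), a chain 0-4-5-6, and a hub 6 with five leaves.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (4, 5), (5, 6),
         (6, 7), (6, 8), (6, 9), (6, 10), (6, 11)]
adj = build_graph(edges)
core = k_core(adj)
for node in (1, 6):
    print(node, "degree:", len(adj[node]), "k-core:", core[node],
          "mean outbreak:", mean_outbreak(adj, node, beta=0.3))
```

Comparing the printed mean outbreak sizes for different seed nodes and values of `beta` is a small-scale analogue of the experiment described above; results on a toy graph this small should not be read as reproducing the dissertation's findings.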
