节点文献

面向大规模社交网络的信息传播模型及其应用研究

Study of Information Propagation Model for Large-Scale Social Networks and Its Applications

【作者】 向彪

【导师】 陈恩红;

【作者基本信息】 中国科学技术大学 , 计算机应用技术, 2014, 博士

【摘要】 近年来,社交网络得到了极大的发展,成为人们消磨休闲时光、获取最新信息、了解朋友动态的重要工具,构成了人们网络在线生活的重要一部分。社交网络的出现带来了各种可能性,如美国总统奥巴马利用社交网络成功赢得两次竞选,韩国音乐人PSY利用社交网络传播其舞曲视频《江南style》突破了10亿次点击的吉尼斯记录。社交网络如何进行信息传播,以及如何利用社交网络推动信息的快速有效传播,成为目前学术界研究的热点。本文系统研究了社交网络上的信息传播模型,其研究成果对于改进社交网络上的信息服务具有重要的应用价值。首先,本文提出一种信息传播的快速计算模型:线性影响力传播模型。信息传播模型是描述社交网络上信息传播过程和进行信息传播计算的基础。尽管近年来对信息传播模型的研究较多,如独立级联模型和线性阈值模型,但这些模型都存在许多缺点,如不可解析,没有闭形式解,效率太低等。这些缺点导致它们只能应用在小规模网络中,然而实际环境中的社交网络(如Facebook、 Twitter等)常常拥有数以亿计的用户规模!为此,本文提出了一种线性影响力模型。该模型能够在线性时间内完成信息传播计算,同时该模型具有可调节性,通过调节模型参数可以使得该模型的信息传播结果非常逼近传统模型。在一个规模达到千万的实际社交网络上的测试表明,线性影响力模型能够严格逼近传统模型的传播效果,而且仅需要付出比传统模型低得多的计算代价。其次,本文基于线性影响力模型提出PageRank的改进算法。PageRank算法是用于计算网络中节点权威值的经典算法。自Larry Page和Seige Brin提出以来,已成为网络研究中最重要的算法之一,也是互联网巨擎Google的立身之本。本文从影响力传播的角度重新分析了这一算法,并得出“网络中节点权威值就是该节点对网络中所有节点的影响力之和”这一本质结论。同时还论证了PageRank算法只是对网络中节点权威值的一种快速近似计算,并给出了多种针对节点权威值计算的精确算法,如规范PageRank算法和先验Pagerank算法等。本文在科学家合作网络上对各种PageRank算法进行了对比,结果表明规范PageRank算法和先验PageRank算法均能取得比传统PageRank更好的结果。然后,本文基于线性影响力传播模型和PageRank算法的本质联系,提出一种用于快速计算任意群体影响力(权威值)的群体PageRank算法。群体PageRank算法是对PageRank算法一种维度上的推广。众所周知,Pagerank算法是计算节点权威值的经典算法,然而如何计算一个群体的权威值呢?如美国民主党和共和党在竞选之时,如何计算两党在民众中的权威值从而预测谁将赢得最终大选?因此,群体权威值计算具有重要的应用价值。本文从影响力传播的角度分析了群体的权威值的计算方法,同时基于影响力传播模型和PageRank算法的关系提出了群体PageRank算法,用于快速计算大规模社交网络上群体权威值。在真实数据上进行的大量测试表明,群体PageRank算法能在常数时间内计算任意群体的群体权威值,同时计算结果能够一致逼近真实结果。最后,本文将线性影响力传播模型和群体PageRank算法用于病毒营销设计,为社交网络上的信息传播提供最佳策略。病毒营销,亦称“口碑效应”,致力于寻找一组社交网络用户为某种产品代言,希望这组用户能在网络上造成多米诺骨牌效应,使得该产品的好口碑能口口相传到到最大规模的网络群体中,从而为产品销售带来最佳效果。本文基于线性影响力传播模型和群体PageRank算法提出了两种病毒营销算法,Linear算法和Bound算法。同时,在多个真实数据上进行的实验测试结果表明,Linear算法和Bound算法均能在时间和效果上胜过当前最新算法。其中,Bound算法甚至只需要常数时间代价就能为病毒营销设计出最佳策略,尤其适合应用在超大规模的社交网络上。

【Abstract】 Recent years has witnessed the great development of social network. Social network has become a necessity for people’s life. They use it to spend spare time, get information and contact with their friends. This creator has brought endless possibility for people’s future life. For example, US president Obama use social network to get the votes for him and win the final campaign; Korea singer PSY use social network to maximally spread his work Gangnam style. How does the infor-mation propagate in social network and how to take advantage of social network to boost the information propagation, have become the hot problems in academia. In this paper, we will systematically research the information propagation model in social network.First, this paper has proposed an efficient information propagation model: linear influence model. Information propagation model is the basis for describing and computing the information propagation process. Although there have been many models, such as independent cascade model and linear threshold model, these models have many shortages. They are untractable, low-efficient, and has no closed-form expression, which make them only applicable to the small scale social networks. However, the real social networks (such as Facebbook, twitter) usually have billions of nodes. To overcome these obstacles, we proposed the linear influence model which is tunable, flexible and efficient. We could use it to approach the traditional models with little effort and complete its computation in linear time. We verified its performance in several large scale social networks, even with millions of nodes.Second, this paper studied the essential of PageRank, and proposed several enhanced PageRank algorithm. PageRank is the classical algorithm for authority computation. This paper re-analyzed this algorithm in the perspective of social influence and found the conclusion "the node authority is actually the sum of its influence on each node of social network". This paper also proved that PageRank is a fast approximation for node’s influence and could be improved by many ways. This paper proposed several enhanced PageRank, such as norm-PageRank, prior-PageRank. Moreover, This paper also proposed a general PageRank:influence-PageRank, which is applicable to any social network with arbitrary information transition probability matrix, and proposed a more-general PageRank:group-PageRank, which is used to compute the authority of a group of nodes. Finally, this paper verified the correctness of the conclusion and the advantage of those enhanced PageRank algorithms through experiments.At last, this paper worked on applying the linear influence model and group-PageRank to design viral marketing campaign. Viral marketing, also called as "word-of-mouth effect", targets at finding a small group of influential nodes-giving them free samples of a product-for triggering a cascade of product propagation that people recommend the product to their friends and friends’friends, with the hope that the product will be adopted by a maximum fraction of the social network. This paper proposed two algorithms:Linear and Bound, based on the linear influence model and group-PageRank respectively. We tested these two algorithms in many real data sets. The experimental results showed that both of Linear and Bound perform better than the state-of-the-art algorithms. Especially, Bound could make the optimal strategy in constant time, which is applicable to the social networks with billions of nodes.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络