

Research and Analysis on the Influence of Micro Blogging Users Based on Micro Blogging Data

【Author】 沈崇玮

【Advisor】 王柏

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's

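【Abstract (translated from the Chinese)】 In recent years, with the rapid development of the Internet, the network has become the main channel through which people obtain information in daily life. Microblogging, a new online medium that has grown rapidly in recent years, has accumulated more than one hundred million users. Microblogging platforms hold a large volume of information that is updated quickly, so users are often drowned in a sea of information; helping them find the posts published by highly influential users is therefore of real value. The search function offered by microblogging platforms is a good way to help users find posts. Traditional information retrieval rests on three key factors: relevance, authority, and timeliness. Because microblog content is updated quickly and written in non-standard language, timeliness and authority tend to matter even more. The influence analysis in this thesis is also a study of authority. This thesis uses microblogging data to analyze user influence, and its main results are as follows. 1. Acquisition of microblogging data. At the start of the research, a large amount of user data was crawled from the microblogging platform, including detailed user information, follow relationships, and reply and repost relationships. This data is the foundation of the thesis and can also serve as base data for other microblogging research. 2. The goal of the study of user influence is to identify a user's differing influence in different domains. The thesis assigns users to domains based on the content they publish and on the follow relationships among users, and derives each user's weight in each domain. Validation against semi-automatically labeled samples shows that this assignment method is fairly accurate. 3. While performing text analysis on the content users publish, the thesis uses a parallel new-word recognition algorithm to identify new words in microblog content, and uses the related-search feature of a search engine to semantically expand important text features. This addresses a series of problems: microblog texts are short, features are sparse, meaningless features are too numerous, and discriminative features are few. 4. Using users' classification weights in different domains, and based on the reply and repost relationships among users, the thesis builds a domain-related influence propagation model. Comparative validation shows that the method performs well.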

【Abstract】 In recent years, with the rapid development of the Internet, new online media have gradually caught up with the influence of traditional media such as television, newspapers, and radio. The Internet has become the main channel through which people send and receive information in their daily lives. As a fast-developing new medium on the Internet, microblogging has accumulated hundreds of millions of users. Microblogging platforms contain a large amount of information that is updated quickly, so users often cannot find the information they need. It is therefore important to help users find the content published by highly influential people. The content search system provided by the microblogging platform is a good way to help users find posts among this large volume of information. A traditional information retrieval system has three key factors: relevance, authority, and timeliness. Because content on microblogging platforms is published and updated very quickly and is written in non-standard language, timeliness and authority tend to be more important. The analysis of user influence in this thesis is also a study of user authority. In this thesis, I study the influence of microblogging users using microblogging data. The major results are as follows. (1) Crawling of microblogging data. At the beginning of the research, I crawled a large amount of data from the microblogging platform, including detailed user information, follow relationships, and reply and repost relationships. This data is the basis of the research in this thesis and can also serve as base data for other microblogging-related research. (2) The purpose of the research on user influence is to identify the influence of different users in different areas. I assign users to areas using two kinds of features, the content of their posts and the follow relationships among users, and I also calculate each user's weight in each area. Validation on semi-automatically annotated samples shows that this classification of users is highly accurate. (3) While performing text mining on microblog content, I used a parallel new-word recognition algorithm to recognize new words in the content, and used the related-search feature of a search engine to semantically extend some important text features. This solved a series of problems specific to microblog content: the text is short, the feature vectors are sparse, and there are too many meaningless features. (4) Using users' classification weights in different areas, I build a topic-related influence propagation model based on the reply and repost relationships among users. Comparative experiments show that the propagation model performs well.
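
The abstract does not give the equations of the topic-related influence propagation model, so the following is only a minimal sketch of one way such a model could look, assuming a PageRank-style iteration. The function name `topic_influence`, the edge format, the damping factor, and the choice to scale each reply or repost edge by the source user's topic weight are all illustrative assumptions, not the thesis's actual method.

```python
from collections import defaultdict


def topic_influence(edges, topic_weight, damping=0.85, iterations=50):
    """Hypothetical topic-weighted, PageRank-style influence scores.

    edges: list of (source, target, count); `source` replied to or
           reposted `target` `count` times (assumed input format).
    topic_weight: dict mapping user -> weight of that user in one topic.
    Returns a dict mapping user -> influence score; scores sum to 1.
    """
    users = set(topic_weight)
    for source, target, _ in edges:
        users.update((source, target))
    users = sorted(users)

    # Assumption: an interaction carries more topical influence when the
    # interacting (source) user has a higher weight in the topic.
    out = defaultdict(dict)
    for source, target, count in edges:
        strength = count * topic_weight.get(source, 0.0)
        if strength > 0:
            out[source][target] = out[source].get(target, 0.0) + strength

    # Teleport distribution is biased towards users active in the topic.
    total_weight = sum(topic_weight.get(u, 0.0) for u in users)
    if total_weight > 0:
        teleport = {u: topic_weight.get(u, 0.0) / total_weight for u in users}
    else:
        teleport = {u: 1.0 / len(users) for u in users}

    score = dict(teleport)
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            targets = out.get(u)
            if targets:
                total = sum(targets.values())
                for target, strength in targets.items():
                    nxt[target] += damping * score[u] * strength / total
            else:
                # Users with no outgoing interactions spread their score
                # according to the teleport distribution.
                for v in users:
                    nxt[v] += damping * score[u] * teleport[v]
        score = nxt
    return score


if __name__ == "__main__":
    demo_edges = [("a", "c", 3), ("b", "c", 1), ("c", "a", 1)]
    demo_weights = {"a": 0.9, "b": 0.5, "c": 0.8}
    for user, value in sorted(topic_influence(demo_edges, demo_weights).items()):
        print(user, round(value, 4))
```

Running the sketch once per area, each time with that area's user weights, would give every user a separate influence score per area, which matches the stated goal of identifying different influence in different areas.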
