
Research and Implementation of a Friend Recommendation System Based on MapReduce

【Author】 Yang Ting (杨婷)

【Advisor】 Shang Yanlei (商彦磊)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's degree

【Abstract】 With the rise of Web 2.0 technology, video sites, social networking sites, and microblogs have come into wide use, and users generate large amounts of data while online. Faced with data sets of this size, information overload has become a problem for many systems. Finding the genuinely useful information in massive data not only saves users time but also gives them a better online experience. Web data mining is already widely applied. In e-commerce, purchase and browsing data are mined for users' buying preferences and buying trends. On social networking sites, analyzing user profiles, posted content, and comments yields valuable information that supports better service. The relationships between social network users can also be abstracted into a social graph, and analyzing that graph can reveal latent patterns.

Against this background, this thesis proposes a friend recommendation system that applies large-scale data processing algorithms in a cloud computing environment, and designs and implements the system on the Hadoop platform. The system consists of three parts: data acquisition, data processing, and strategy recommendation. The data acquisition module crawls the user data the system requires, such as the ids of social network users, the ids of their friends, and the ids of the users they follow, and stores it in HDFS. The data processing module uses parallel algorithms to process massive data in the cloud computing environment: Dijkstra's algorithm computes the distance from the target user (the user receiving recommendations) to every other user, and the PageRank algorithm computes the influence of every user in the social network. The strategy recommendation module makes recommendations from the output of the data processing module: it sorts the friends of the target user's friends using user influence as the ranking factor, and recommends them in that order.

With this system, a social networking site can recommend potential friends to its users, which increases user activity and user stickiness to the site; users can meet new friends, expand their contacts, and increase their own influence. The system uses Twitter data as its example, but any data that meets the format requirements can be processed at scale with it. Because the system is designed on the Hadoop platform and implements the recommendation algorithms with the MapReduce computing framework, it scales well and can handle massive data sets.
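The recommendation strategy summarized in the abstract (distance by Dijkstra's algorithm, influence by PageRank, friends of friends ranked by influence) can be illustrated with a small single-machine sketch. This is not the thesis's Hadoop implementation: it runs in memory rather than as MapReduce jobs, and several details are assumptions made only for illustration, since the abstract does not state them. Edges are treated as directed "follows" links with unit weight, the damping factor is 0.85, the iteration count is fixed, and the names `shortest_distances`, `pagerank`, `recommend`, and `SAMPLE_FOLLOWS` are hypothetical.

```python
import heapq

# Hypothetical toy graph: user -> set of users they follow.
SAMPLE_FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "erin"},
    "carol": {"dave"},
    "dave": set(),
    "erin": {"dave"},
}


def shortest_distances(follows, source):
    """Dijkstra's algorithm; every edge has weight 1 (an assumption)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v in follows.get(u, ()):
            nd = d + 1
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank; damping and iterations are assumed values."""
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = follows.get(u, ())
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def recommend(follows, user, top_k=3):
    """Rank friends of friends (distance exactly 2) by descending influence."""
    dist = shortest_distances(follows, user)
    rank = pagerank(follows)
    candidates = [u for u, d in dist.items() if d == 2]
    # Ties are broken by user id so the ordering is deterministic.
    return sorted(candidates, key=lambda u: (-rank[u], u))[:top_k]


if __name__ == "__main__":
    print(recommend(SAMPLE_FOLLOWS, "alice"))  # ['dave', 'erin']
```

In the toy graph, `dave` and `erin` are both two hops from `alice`; `dave` is followed by three users and `erin` by one, so `dave` ranks first. In a MapReduce setting, each PageRank iteration would typically become one job in which mappers emit rank contributions along outgoing edges and reducers sum them per user; that mapping is a common pattern and is offered here as an assumption about how such a system might be structured.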
