
Implementation of Hadoop-Based Slope One and Its Improved Algorithm

【Author】 Wang Yi

【Supervisor】 Lou Xinyuan

【Author Information】 Southwest Jiaotong University, Computer Application Technology, 2011, Master's thesis

【Abstract】 A recommender system is an intelligent system that recommends items a user is likely to enjoy by analyzing that user's personal preferences, for example information about goods the user has browsed or purchased. To some extent it helps people find content they like amid massive amounts of information. The core of a recommender system is personalized recommendation technology, and the most mature recommendation techniques today are based on collaborative filtering. However, because user interests are unstable and vague, these methods still cannot understand user preferences well, which degrades recommendation quality. Compared with traditional collaborative filtering algorithms based on users' ratings of items, the Slope One algorithm is simple and efficient. It depends, however, on many users having rated the item to be predicted: when few or no users have rated that item, it runs into the "cold start" problem. In addition, Slope One considers only the similarity of ratings across different users and ignores each user's personal rating habits, and both limitations may affect the predicted ratings. To address this, the content similarity of items is introduced, covering two factors: the semantic similarity of the keywords that describe items and the similarity of item types. These similarities measure how close items are to one another and, combined with the user's ratings of other items, yield a Slope One algorithm based on item content similarity. Finally, an implementation of Slope One and its improved algorithm is designed on the Hadoop platform using the MapReduce distributed programming model, and experiments are run on the standard MovieLens dataset. The results show that the prediction performance of Slope One improves as the number of user rating records in the dataset grows, and that the new algorithm, which incorporates item content similarity, can to some extent mitigate the loss of prediction accuracy the original algorithm may suffer.
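A minimal sketch of the weighted Slope One predictor that the abstract builds on may help make the cold-start limitation concrete. It assumes ratings fit in memory as a nested dict (user to item to rating); the function names `slope_one_deviations` and `predict_weighted` and the toy data are hypothetical, not taken from the thesis. The predictor averages dev(j, i) + r(u, i) over the items i the user has rated, weighted by how many users co-rated j and i, and returns None when no co-rated item exists.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """Return dev[j][i] (average of r_uj - r_ui) and cnt[j][i] (number of co-raters)."""
    diff_sum = defaultdict(lambda: defaultdict(float))
    cnt = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff_sum[j][i] += r_j - r_i
                    cnt[j][i] += 1
    dev = {j: {i: diff_sum[j][i] / cnt[j][i] for i in diff_sum[j]} for j in diff_sum}
    return dev, cnt

def predict_weighted(ratings, dev, cnt, user, target):
    """Weighted Slope One: mean of dev[target][i] + r_ui, weighted by cnt[target][i]."""
    target_dev = dev.get(target, {})
    num, den = 0.0, 0
    for i, r_ui in ratings[user].items():
        if i != target and i in target_dev:
            num += (target_dev[i] + r_ui) * cnt[target][i]
            den += cnt[target][i]
    # No item co-rated with `target`: Slope One cannot predict (cold start).
    return num / den if den else None

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
dev, cnt = slope_one_deviations(ratings)
print(predict_weighted(ratings, dev, cnt, "carol", "A"))  # 4.333333333333333
print(predict_weighted(ratings, dev, cnt, "carol", "D"))  # None (nobody rated "D")
```

With these numbers, dev(A, B) = ((5 - 3) + (3 - 4)) / 2 = 0.5 from two co-raters and dev(A, C) = 3 from one, so carol's prediction for A is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3 ≈ 4.33. Item D, which no one has rated, gets no prediction at all.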
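The abstract says the deviation computation is distributed with MapReduce on Hadoop but does not describe the job layout. The following single-process Python sketch simulates one plausible decomposition, as an assumption rather than the thesis's design: each map call receives one user's ratings and emits ((j, i), (r_j - r_i, 1)) for every ordered item pair, a shuffle step groups values by pair, and each reduce call turns the sums into an average deviation and a co-rater count. The names `mapper`, `shuffle`, `reducer`, and `run_deviation_job` are hypothetical and are not Hadoop APIs.

```python
from collections import defaultdict
from itertools import permutations

def mapper(user, item_ratings):
    """Map: for each ordered pair (j, i) of items one user rated, emit ((j, i), (r_j - r_i, 1)).

    `user` mirrors the map input key; it is not needed to compute the pairs.
    """
    for (j, r_j), (i, r_i) in permutations(item_ratings.items(), 2):
        yield (j, i), (r_j - r_i, 1)

def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle/sort phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce: sum differences and counts for one item pair, emit (average deviation, count)."""
    total = sum(d for d, _ in values)
    count = sum(c for _, c in values)
    return key, (total / count, count)

def run_deviation_job(ratings):
    mapped = (kv for user, items in ratings.items() for kv in mapper(user, items))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
deviations = run_deviation_job(ratings)
print(deviations[("A", "B")])  # (0.5, 2)
```

Emitting (difference, 1) rather than a ready-made average keeps the value associative: partial (sum, count) pairs for the same key can simply be added, so a Hadoop combiner could pre-aggregate them on each map node before the shuffle, which a per-record average would not allow.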
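The abstract names two content factors, keyword semantic similarity and item-type similarity, but gives no formula for combining them. This Python sketch is therefore only an illustration under explicit assumptions: plain Jaccard overlap of keyword sets stands in for semantic similarity, a hypothetical weight `alpha` mixes the keyword and genre scores linearly, and the prediction is a similarity-weighted average of the user's ratings on other items. Because it needs only item descriptions, it can still produce a score for an item nobody has rated, which is the cold-start gap plain Slope One leaves open.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def item_similarity(item_x, item_y, alpha=0.5):
    """Hypothetical mix: alpha * keyword overlap + (1 - alpha) * genre overlap."""
    kw = jaccard(item_x["keywords"], item_y["keywords"])
    genre = jaccard(item_x["genres"], item_y["genres"])
    return alpha * kw + (1 - alpha) * genre

def predict_by_content(user_ratings, items, target, alpha=0.5):
    """Similarity-weighted average of the user's ratings on items other than `target`."""
    num = den = 0.0
    for item_id, rating in user_ratings.items():
        if item_id == target:
            continue
        s = item_similarity(items[target], items[item_id], alpha)
        num += s * rating
        den += s
    return num / den if den > 0 else None  # None: no content overlap at all

# Hypothetical item descriptions; "new" has no ratings from anyone.
items = {
    "m1": {"keywords": {"space", "alien"}, "genres": {"Sci-Fi", "Action"}},
    "m2": {"keywords": {"space", "robot"}, "genres": {"Sci-Fi"}},
    "m3": {"keywords": {"love", "paris"}, "genres": {"Romance"}},
    "new": {"keywords": {"space", "alien", "robot"}, "genres": {"Sci-Fi"}},
}
user = {"m1": 4, "m2": 5, "m3": 1}
print(predict_by_content(user, items, "new"))  # ≈ 4.59
```

Here "new" scores 7/12 against m1 and 5/6 against m2 and shares nothing with m3, so the user's low rating of the romance title does not drag the estimate down.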
