推荐系统中若干关键问题研究

The Research on Key Technologies of Recommender System

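【Author】 Li Tao (李涛)

【Advisor】 Wang Jiandong (王建东)

【Author information】 Nanjing University of Aeronautics and Astronautics, Computer Application Technology, 2009, PhD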

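【Abstract (translated from Chinese)】 With the development of the Internet, websites offer users ever more information while their structure grows more complex, and it has become increasingly difficult to find needed information promptly among the vast amount of information online. A recommender system, on the one hand, filters information for users by predicting how much they will like an item, and applies knowledge discovery techniques to generate personalized recommendations that help users find what they need; on the other hand, it helps enterprises achieve personalized marketing, which raises sales and creates more profit. In addition, as personalized services develop and spread, recommender systems are deployed on a growing number of Web sites, especially e-commerce platforms. Because of their promising prospects for development and application, recommender systems have become an important research direction in Web intelligence and have attracted wide attention from researchers. In recent years recommender systems have developed rapidly in both theory and practice, but as the systems they serve continue to grow in scale, they face a series of challenges. This dissertation explores key recommender system technologies, including recommendation algorithms and privacy protection. Its research mainly applies data mining and machine learning techniques to recommender systems, addressing real-time performance, recommendation quality, and privacy protection. The main contributions are as follows:
(1) To address the effect of high-dimensional, sparse data in recommender systems, a collaborative filtering technique based on non-negative matrix factorization is proposed. Both analysis and experiments show that the algorithm speeds up recommendation generation and meets the real-time requirements of recommender systems. Experiments also show that it improves recommendation quality.
(2) The number of items in a recommender system is very large, and a user can rate only a fraction of them. When two users have no ratings on the same items, the system does not treat them as neighbors even if they have rated similar items. This is the "similar but not identical" problem, and it degrades recommendation quality. To address it, the concept of layered similarity is introduced and a multi-layer similarity measure is built for recommender systems. Experiments show that this measure improves the recommendation accuracy of collaborative filtering algorithms.
(3) The real-time requirement on recommendation algorithms has long been a focus of research. This dissertation proposes a collaborative filtering algorithm based on user clustering: basal users are clustered offline, and at online time the existing clusters are used to search for the target user's nearest neighbors and generate recommendations. Analysis of the algorithm shows that it makes the nearest-neighbor search more efficient and speeds up recommendation generation. Experiments show that, combined with the multi-layer similarity measure, it improves both the efficiency of recommendation generation and recommendation quality.
(4) Information security and privacy protection are among the active topics in data mining. A recommender system must collect data on users' interests and preferences, which touches on personal privacy to some degree, so privacy protection in recommender systems has begun to draw researchers' attention. This dissertation proposes a privacy-preserving recommendation algorithm based on randomized perturbation. The algorithm applies randomized perturbation while user data is collected and processes the data with non-negative matrix factorization, which together provide privacy protection, and recommendations are generated on that basis. Analysis and experiments show that, while protecting users' personal privacy, the algorithm produces recommendations accurate enough to meet the needs of a recommender system.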
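
Contribution (1) can be illustrated with a small sketch. The abstract does not say how the factorization is computed or how unrated items are treated, so the following is only one plausible reading: it assumes the standard multiplicative-update rules for NMF, restricted to observed ratings, and it uses a made-up 5×4 rating matrix. The names `nmf_masked` and `masked_rmse` are hypothetical and are not taken from the dissertation.

```python
import random

# Hypothetical ratings (users x items) on a 1-5 scale; None marks "not rated".
R = [
    [5, 4, None, 1],
    [4, None, 1, 1],
    [1, 1, 5, None],
    [None, 1, 4, 5],
    [5, 5, None, 2],
]


def masked_rmse(R, W, H):
    """Root-mean-square error over the observed (non-None) ratings only."""
    k = len(H)
    errs = [
        (r - sum(W[u][f] * H[f][i] for f in range(k))) ** 2
        for u, row in enumerate(R)
        for i, r in enumerate(row)
        if r is not None
    ]
    return (sum(errs) / len(errs)) ** 0.5


def nmf_masked(R, k, iters=300, seed=0, eps=1e-9):
    """Factor R ~= W * H with non-negative W (users x k) and H (k x items).

    Assumption: multiplicative updates that only look at observed entries.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    seen = [[i for i in range(m) if R[u][i] is not None] for u in range(n)]
    raters = [[u for u in range(n) if R[u][i] is not None] for i in range(m)]

    def approx():
        # Current low-rank reconstruction W * H.
        return [[sum(W[u][f] * H[f][i] for f in range(k)) for i in range(m)]
                for u in range(n)]

    for _ in range(iters):
        P = approx()
        for u in range(n):          # update user factors, H held fixed
            for f in range(k):
                num = sum(R[u][i] * H[f][i] for i in seen[u])
                den = sum(P[u][i] * H[f][i] for i in seen[u])
                W[u][f] *= num / (den + eps)
        P = approx()
        for f in range(k):          # update item factors, W held fixed
            for i in range(m):
                num = sum(W[u][f] * R[u][i] for u in raters[i])
                den = sum(W[u][f] * P[u][i] for u in raters[i])
                H[f][i] *= num / (den + eps)
    return W, H


W, H = nmf_masked(R, k=2)
# Predicting an unrated item is a length-k dot product of the two factors.
print("predicted rating, user 0 / item 2:", sum(W[0][f] * H[f][2] for f in range(2)))
print("RMSE on observed ratings:", masked_rmse(R, W, H))
```

Once the factors are computed, each prediction costs only k multiplications, which is consistent with the speed-up the abstract reports, although the dissertation's actual procedure may differ.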

【Abstract】 With the development of the Internet, the web provides more and more information for users, while its structure has become increasingly complex. This has made it substantially more difficult for users to find the information they need in the vast amount of material available on the Internet. A recommender system filters information for a user by predicting that user’s preferences, and applies knowledge discovery techniques to make personalized recommendations that help the user quickly find the desired information. At the same time, a recommender system helps enterprises achieve personalized marketing, which can improve sales and generate more profit. In addition, with the popularization of personalized services, recommender systems are used on a growing number of web sites, especially e-commerce platforms. Because of their great potential for development and application, recommender systems have become an important research area in web intelligence and have attracted significant attention from researchers.
Although recommender systems have developed rapidly in both research and applications, a number of challenging research problems remain as the systems they serve grow in scale. To address these challenges, this dissertation studies several key technologies of recommender systems, including novel algorithms with better recommendation quality and enhanced privacy protection. In particular, data mining and machine learning techniques are incorporated into the recommender system to improve real-time performance, recommendation quality, and privacy protection.
The main research results of this dissertation are as follows:
First, the performance of collaborative filtering systems degrades as the number of customers and objects increases. To reduce the dimensionality of the rating data and improve performance, a collaborative filtering approach based on non-negative matrix factorization (NMF) is proposed. Theoretical analysis shows that NMF-based collaborative filtering accelerates recommendation generation and so meets the demands of a real-time recommender system. Experimental results show that the NMF-based algorithm improves both the recommendation quality and the efficiency of recommendation generation.
Second, the number of objects in a recommender system is generally very large, and individual users are able to evaluate only a small fraction of them. As a result, when two users have rated no objects in common, the system does not treat them as neighbors even if they have rated similar objects. This is the "similar but not identical" problem, which can seriously affect the quality of recommendation results. Therefore, the concept of multi-layer similarity is presented and a multi-layer similarity measure is established for the recommender system. The experimental results show that the multi-layer similarity measure improves the accuracy, and consequently the quality, of the recommendations.
Third, to overcome the speed bottleneck of collaborative filtering, a collaborative filtering algorithm based on clustering basal users is described. The algorithm separates the recommendation process into offline and online phases. In the offline phase, the data of the basal users are preprocessed and the basal users are clustered; in the online phase, the nearest neighbors of an active user are identified from the basal user clusters, and a recommendation for the active user is generated. The multi-layer similarity measure is used throughout recommendation generation. Experimental results show that the algorithm improves both recommendation quality and efficiency.
Finally, a recommender system operates by collecting users’ ratings of objects and matching users who share the same interests or tastes. This is potentially a serious threat to individual privacy, because the preferences that most online systems collect include private information. Users and researchers are therefore increasingly concerned with privacy protection in recommender systems. In this dissertation, a new privacy protection algorithm is presented. Randomized perturbation techniques are applied while user data are collected, and the collected data are further processed by NMF, which protects sensitive information. The algorithm then produces recommendations from the privacy-protected user data. Both theoretical analysis and experimental results demonstrate that the algorithm not only protects users’ privacy but also generates recommendations accurate enough to meet the needs of the recommender system.
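
The data-collection step of the privacy protection algorithm can be sketched as follows. The abstract does not specify the noise distribution or its range, so this sketch assumes zero-mean uniform noise added on the client side, with made-up ratings; `disguise` and `column_means` are hypothetical helper names. It only shows why aggregate statistics survive the perturbation while individual ratings are hidden; the subsequent NMF processing is not shown.

```python
import random


def disguise(ratings, spread, rng):
    """Client side: add zero-mean uniform noise in [-spread, spread] to each rating."""
    return [r + rng.uniform(-spread, spread) for r in ratings]


def column_means(rows):
    """Server side: per-item mean over all submitted rating vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


rng = random.Random(42)
n_users, n_items = 20000, 3

# Hypothetical true ratings on a 1-5 scale; the server never sees these.
true_rows = [[rng.randint(1, 5) for _ in range(n_items)] for _ in range(n_users)]

# Each user perturbs locally and submits only the disguised vector.
sent_rows = [disguise(row, spread=2.0, rng=rng) for row in true_rows]

true_means = column_means(true_rows)
est_means = column_means(sent_rows)
for t, e in zip(true_means, est_means):
    print(f"true item mean {t:.3f}   estimate from disguised data {e:.3f}")
```

Because the noise has zero mean, it averages out over many users, so the server can still estimate aggregates without learning any single user's exact rating. A larger `spread` hides individual values better at the cost of noisier estimates, which matches the trade-off between privacy and accuracy that the abstract describes.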

  • 【Classification number】TP393.09
  • 【Times cited】12
  • 【Downloads】1602