基于协同过滤技术的推荐方法研究
Research on Recommendation Methods Based on Collaborative Filtering Techniques
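【Author】 Yu Xue (郁雪);
【Supervisor】 Li Minqiang (李敏强);
【Author information】 Tianjin University, Information Management and Information Systems, 2009, Ph.D.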
【Abstract】 With the rapid development of network and information technology, the Internet offers users ever more information and services, and Web users face an increasingly severe information overload problem. Given the huge volume of web resources, a recommender system can track changes in users' needs and automatically adjust the form and content of its information services, which makes it a promising personalization technique for relieving information overload. Collaborative filtering (CF) is one of the most widely used and most successful recommendation techniques, and it has developed rapidly in both theory and practice. However, as the number of users and the scale of the system keep growing, collaborative filtering faces serious challenges: data sparsity, very high dimensionality, cold start, and real-time recommendation. To address these problems, this dissertation applies theory and methods from statistics and data mining to define a new similarity measure, proposes more effective collaborative filtering algorithms, presents an architecture design for an e-government information recommendation system, and discusses a variety of recommendation strategies in detail. The main contributions are as follows:
1. The data sparsity problem in collaborative filtering is analyzed in detail. Because traditional similarity measures lead to prediction errors, a new hybrid collaborative filtering framework is proposed. The algorithm first introduces a new similarity measure that uses the concept hierarchy (taxonomy) of the website to take the topical similarity between pages into account. Based on the new similarity, item-based collaborative filtering then predicts the missing entries of the original interest matrix, which alleviates its sparsity. Finally, recommendations are generated for users from the smoothed matrix. Experiments on real web log data show that the algorithm is effective in improving recommendation quality.
2. The very high dimensionality of the original rating matrix is studied in depth, and a collaborative filtering method based on local principal component analysis (local PCA) is proposed. The algorithm first classifies web pages by topic according to the domain knowledge of the website, so that pages on the same topic are strongly correlated in content. A principal component transformation is then applied to each class of pages separately, which reduces the dimensionality of the data as a preprocessing step. In addition, a user interest threshold is set for each topic class to select its users, so the algorithm reduces dimensionality from both the user and the item perspective. Experiments on real datasets show that local PCA can significantly improve prediction accuracy compared with traditional collaborative filtering, but it requires relatively dense training data.
3. Real-time performance and recommendation quality are hard to achieve together, because prediction quality tends to degrade as a system is made more scalable. To address this, a hybrid recommendation algorithm that combines dimensionality reduction and clustering is proposed. The algorithm first applies global dimensionality reduction (global PCA) to the high-dimensional original rating matrix, then clusters users in the low-dimensional space so that the nearest neighbors of a target user are searched within one cluster instead of the whole user space, which greatly reduces the online computation. Experiments on both a standard dataset and real data show that the algorithm improves real-time recommendation efficiency while keeping relatively high rating-prediction accuracy, and that the characteristics of the dataset have a clear effect on the recommendation results.
4. In response to the current demand for personalized information services in e-government systems, a public-oriented information recommendation service model is proposed. The architecture of the personalized recommender system is designed, and its main functional modules and core implementation techniques are discussed. Several recommendation strategies are adopted according to the data characteristics and service audience of current e-government portal websites, so that the system can meet recommendation needs in different situations.
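As a rough illustration of contribution 1 (a taxonomy-aware similarity followed by item-based smoothing of the sparse matrix), the sketch below may help. The abstract does not give the actual similarity formula, so the linear blend, the weight `alpha`, the precomputed `topic_sim` matrix, and all function names are assumptions made for this example only. Missing ratings are stored as 0.

```python
import numpy as np

def cosine_item_similarity(R):
    """Cosine similarity between item columns of R.

    Missing ratings are stored as 0, so they add nothing to the dot products.
    """
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    return (R.T @ R) / np.outer(norms, norms)

def hybrid_similarity(R, topic_sim, alpha=0.5):
    """Blend rating-based and taxonomy-based item similarity.

    ASSUMPTION: a simple linear blend with weight alpha; the thesis may
    combine the two signals differently.
    """
    return alpha * cosine_item_similarity(R) + (1.0 - alpha) * topic_sim

def smooth_matrix(R, S):
    """Fill each missing entry with a similarity-weighted average of the
    ratings the same user gave to other items (item-based CF)."""
    filled = R.astype(float).copy()
    known = R > 0
    for u in range(R.shape[0]):
        rated = np.flatnonzero(known[u])
        if rated.size == 0:
            continue  # nothing to base a prediction on for this user
        for i in np.flatnonzero(~known[u]):
            w = S[i, rated]
            denom = np.abs(w).sum()
            if denom > 0:
                filled[u, i] = (w @ R[u, rated]) / denom
    return filled

# Toy data: 3 users x 3 pages; pages 0 and 1 share a topic, page 2 does not.
R = np.array([[5, 0, 1],
              [4, 0, 1],
              [1, 2, 5]], dtype=float)
topic_sim = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 1]], dtype=float)

smoothed = smooth_matrix(R, hybrid_similarity(R, topic_sim, alpha=0.5))
print(np.round(smoothed, 2))
```

In this toy matrix, page 1 has almost no ratings in common with page 0, so plain cosine similarity pulls user 0's predicted rating for page 1 toward the low rating given to page 2. Adding the topic signal moves the prediction toward the rating of the topically related page 0, which is the kind of correction the abstract attributes to the new similarity measure.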
【Key words】 Recommendation System; Collaborative Filtering; Data Sparsity; Dimensionality Reduction; Principal Component Analysis; Cluster Technique;
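The combination of dimensionality reduction and clustering described in contribution 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the thesis: PCA is computed with an SVD of the mean-centered matrix, clustering is a plain k-means with farthest-point initialisation, missing ratings are stored as 0 and treated as numeric zeros during PCA, and the names `pca_project`, `kmeans`, and `predict` are hypothetical.

```python
import numpy as np

def pca_project(R, n_components):
    """Project users onto the top principal components of the
    mean-centered rating matrix (global PCA via SVD)."""
    X = R - R.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

def kmeans(Z, k, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per row of Z.

    ASSUMPTION: deterministic farthest-point initialisation, chosen here
    only to keep the example reproducible.
    """
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

def predict(R, labels, Z, user, item, n_neighbors=2):
    """Predict R[user, item] from the nearest users in the same cluster
    who have rated the item; only that cluster is searched online."""
    mates = np.flatnonzero((labels == labels[user]) & (R[:, item] > 0))
    mates = mates[mates != user]
    if mates.size == 0:
        return float(R[R > 0].mean())  # fall back to the global mean rating
    order = np.argsort(np.linalg.norm(Z[mates] - Z[user], axis=1))
    nearest = mates[order[:n_neighbors]]
    return float(R[nearest, item].mean())

# Toy data: users 0-2 prefer items 0-1, users 3-5 prefer items 2-3.
R = np.array([[5, 5, 1, 1],
              [5, 4, 1, 0],
              [4, 5, 0, 1],
              [1, 1, 5, 5],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

Z = pca_project(R, n_components=2)  # offline: reduce dimensionality
labels = kmeans(Z, k=2)             # offline: cluster users
print(labels, predict(R, labels, Z, user=1, item=3))
```

The projection and clustering would run offline, so the online step only compares the target user with members of one cluster. Treating missing ratings as zeros in the PCA step is a simplification; it distorts the projection when the matrix is very sparse, which is consistent with the abstract's remark that the PCA-based method needs relatively dense training data.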