
电子商务推荐系统中协同过滤瓶颈问题研究

Research on the Bottleneck Problems of Collaborative Filtering in E-Commerce Recommender Systems

【Author】 Li Cong (李聪)

【Advisor】 Liang Changyong (梁昌勇)

【Author information】 Hefei University of Technology, Management Science and Engineering, 2009, PhD

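【Abstract, translated from the Chinese original】 With the rapid development of the Internet and e-commerce, humanity has entered the information age. China's e-commerce market has enormous growth potential and has sustained rapid growth. By visiting e-commerce websites, people can enjoy the pleasure and convenience of shopping without leaving home. However, the large number of products offered by e-commerce websites causes “information overload” for users, which confronts these websites with a serious problem: how can suitable products be put in front of a user while he or she browses the site, so as to overcome the adverse effects of information overload and facilitate more transactions that increase sales? E-commerce recommender systems are one solution to the information overload problem and a technology for implementing the “one-to-one marketing” strategy of e-commerce websites. They can serve as a useful component of a website's customer relationship management and have already been deployed on many large websites. Collaborative filtering is currently the most widely used and most successful recommendation algorithm in e-commerce recommender systems, but bottleneck problems such as sparsity, cold-start, and scalability still constrain its further development. These bottleneck problems therefore require further study. The main research contents of this thesis are as follows:

(1) The state of research on collaborative filtering in China and abroad is comprehensively reviewed and summarized, and on that basis the bottleneck problems of collaborative filtering are identified.

(2) Collaborative filtering based on item rating prediction has two shortcomings in alleviating sparsity: the search for the target user's nearest neighbors is not accurate enough, and the algorithm incurs unnecessary computation. To address them, a theory for differentiating types of non-target users is first proposed, which divides the non-target users in the union of user rating items into two types: those without recommending ability and those with recommending ability. For users without recommending ability, similarity to the target user is no longer computed, which improves algorithm efficiency and the real-time performance of recommendation. For users with recommending ability who share rating item classes with the target user, a domain nearest neighbor theory is proposed to predict ratings for the unrated items in the union of user rating items, making the nearest-neighbor search more accurate. To prevent extremely sparse rating data from making the user similarity of domain nearest neighbors too low, a method based on rough set theory is further proposed for filling in the unrated values in the union of user rating items. This method effectively completes the union of user rating items, so it can be applied to estimating the unrated values in the rating matrix to alleviate sparsity, and it is an effective supplement to the domain nearest neighbor theory.

(3) For the new-user problem in cold-start, a cold-start elimination method is proposed. First, a user-access-item sequence theory is proposed: user access item sequences are obtained from Web logs, and an n-sequence access analytic logic is defined to decompose a user access item sequence into a set of user access sub-sequences. A similarity measure for user access item sequences is designed to find a new user's nearest-neighbor set. An improved most-frequent-item extraction algorithm, IMIEA, is then proposed to process the access item sequences of the nearest-neighbor set and obtain a top-N recommendation for the new user. Based on the set of access item sequences of the nearest neighbors and the new user, a Markov chain model of user access item sequences is built to provide product navigation recommendations for the new user.

(4) For the scalability problem, an incremental updating mechanism for collaborative filtering that adapts to changes in user interest is proposed. After a user submits a new rating, it updates in real time, at a small computational cost, the similarity data between the rated item and the other items. This eliminates the cost of scanning the entire item space that conventional methods cannot avoid each time a recommendation is computed, and effectively improves scalability. At the same time, because the incremental updating mechanism ensures that the latest user ratings are used in recommendation computation, the recommendation service can adapt to dynamic changes in user interests and preferences, which remedies the difficulty that conventional offline computation of item similarity has in reflecting user interest drift.

(5) On the basis of the theories and methods proposed above, a prototype e-commerce collaborative filtering system, ECRec (E-Commerce Recommender system), is designed and implemented. The system has good portability and maintainability and an open architecture.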
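Item (3) of the abstract describes decomposing user access item sequences into sub-sequences and building a Markov chain over them for navigation recommendation. The abstract does not give the definitions, so the following is only a minimal sketch under stated assumptions: sub-sequences are taken to be contiguous windows of length n, the chain is first-order, and the function names (`subsequences`, `build_transitions`, `recommend_next`) and the sample data are hypothetical, not taken from the thesis.

```python
from collections import Counter, defaultdict


def subsequences(seq, n):
    """Return all contiguous length-n windows of an access sequence.

    Assumption: this stands in for the thesis's n-sequence decomposition,
    whose exact definition is not given in the abstract.
    """
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_transitions(sequences):
    """Estimate first-order transition probabilities from consecutive item pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current_item, next_item in subsequences(seq, 2):
            counts[current_item][next_item] += 1
    return {
        item: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for item, followers in counts.items()
    }


def recommend_next(transitions, current_item, k=2):
    """Return up to k most probable next items (ties broken by item name)."""
    followers = transitions.get(current_item, {})
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical access sequences of a new user's nearest neighbors.
neighbor_sequences = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "A"],
]
transitions = build_transitions(neighbor_sequences)
print(recommend_next(transitions, "B"))
```

In this sketch, the item the new user is currently viewing selects a row of the transition table, and the most probable successors become the navigation suggestions. An item that never appears in the neighbors' sequences yields an empty list.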

【Abstract】 With the rapid development of the Internet and e-commerce, human society has entered the information era. China's e-commerce market has enormous development potential and continues to grow at high speed. People can enjoy the pleasure and convenience of purchasing products from home via e-commerce websites. However, the enormous range of products offered by e-commerce websites brings “information overload” to users. E-commerce websites therefore face a serious problem: how can they recommend appropriate products to browsing users so as to overcome the detrimental effects of information overload and promote more transactions, boosting sales? E-commerce recommender systems are one way to address information overload and one technique for realizing the “one-to-one” marketing strategy of e-commerce websites. They have been applied on many large-scale websites as a useful part of customer relationship management. Collaborative filtering is currently the most successful and most widely used recommendation algorithm in e-commerce recommender systems. However, it suffers from bottleneck problems such as sparsity, cold-start, and scalability. These problems limit the development of collaborative filtering and therefore call for in-depth study. The main research work of this thesis is as follows:

(1) On the basis of a comprehensive overview of research on collaborative filtering in China and abroad, the bottleneck problems of collaborative filtering are summarized.

(2) The item-rating-prediction collaborative filtering algorithm has two drawbacks in alleviating sparsity: the nearest-neighbor search is not accurate enough, and the algorithm incurs unnecessary computing cost. To address them, a theory for differentiating non-target users is first proposed, under which the non-target users in the union of user rating items are classified into two types: those without recommending ability and those with recommending ability. For the former, similarity to the target user is not computed, which improves real-time recommendation. For the latter, the domain nearest neighbor theory is proposed and used to predict missing values in the union of user rating items when these users share rating item classes with the target user. Because extremely sparse user ratings could make the user similarity of domain nearest neighbors too low, a rating prediction method based on rough set theory is proposed to estimate missing values in the union of user rating items. This method effectively completes the union of user rating items, so it can be used to estimate the missing values in the rating matrix and alleviate sparsity. It is an effective complement to the domain nearest neighbor theory.

(3) To solve the “new user problem” within the cold-start problem, a cold-start elimination method for new users is proposed. First, the user-access-item sequence theory is proposed; the items accessed by a user can be obtained from web logs. Second, an “n-sequence access analytic logic” is proposed to decompose a user's access item sequence into a set of user access sub-sequences. Third, a similarity measure for user access item sequences is proposed to find a new user's nearest neighbors. Fourth, an improved most-frequent-item extraction algorithm is proposed to process the access item sequences of the nearest neighbors and obtain the top-N recommendation for the new user. On the basis of the set of access item sequences of the new user and her/his nearest neighbors, a Markov chain model is proposed to provide product navigation recommendations for the new user.

(4) To solve the scalability problem, an incremental updating mechanism for item similarity, suited to online applications, is proposed. After the active user submits a new rating, the recommender system updates in real time the similarity between the target item and the other items. Scalability is thus improved by eliminating the cost, unavoidable in conventional methods, of scanning the entire item space. At the same time, because the incremental updating mechanism ensures that the newest ratings are used in recommendation computation, changes in user interest can be reflected in the recommendation service. This remedies the difficulty that traditional offline computation of item similarity has in reflecting changes in user interest.

(5) On the basis of the theories and methods proposed above, a prototype e-commerce recommender system, called ECRec, is designed and implemented with good portability, maintainability, and an open architecture.
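The incremental updating idea in item (4) can be illustrated with a small sketch. The abstract does not state which similarity measure or update formulas the thesis uses, so this example assumes plain cosine similarity over rating vectors with unrated entries treated as zero. The class name `IncrementalItemSimilarity` and its methods are hypothetical. The point it illustrates is that a new rating only touches the items that the same user has already rated, rather than the whole item space.

```python
import math
from collections import defaultdict


class IncrementalItemSimilarity:
    """Cosine item similarity maintained incrementally (illustrative assumption)."""

    def __init__(self):
        self.ratings = defaultdict(dict)   # user -> {item: rating}
        self.dot = defaultdict(float)      # (item_a, item_b), a < b -> sum of rating products
        self.sq_norm = defaultdict(float)  # item -> sum of squared ratings

    @staticmethod
    def _key(a, b):
        return (a, b) if a < b else (b, a)

    def add_rating(self, user, item, rating):
        """Apply a new or changed rating, updating only the affected item pairs."""
        old = self.ratings[user].get(item, 0.0)
        self.sq_norm[item] += rating ** 2 - old ** 2
        for other, other_rating in self.ratings[user].items():
            if other != item:
                self.dot[self._key(item, other)] += (rating - old) * other_rating
        self.ratings[user][item] = rating

    def similarity(self, a, b):
        """Cosine similarity from the stored sums; 0.0 if either item is unrated."""
        denom = math.sqrt(self.sq_norm.get(a, 0.0) * self.sq_norm.get(b, 0.0))
        if denom == 0:
            return 0.0
        return self.dot.get(self._key(a, b), 0.0) / denom


# Hypothetical ratings: two users rate items "x" and "y" proportionally.
sim = IncrementalItemSimilarity()
for user, item, rating in [("u1", "x", 4), ("u1", "y", 2), ("u2", "x", 2), ("u2", "y", 1)]:
    sim.add_rating(user, item, rating)
print(sim.similarity("x", "y"))
```

Each call to `add_rating` costs time proportional to the number of items the submitting user has rated, which is the property the abstract attributes to the incremental mechanism. A production system would likely also need mean-centering or another adjusted measure, which this sketch omits.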

  • 【Classification number】TP311.52
  • 【Citation count】53
  • 【Download count】2539