个性化技术及其在数字图书馆中应用的研究

Research on Personalization Techniques with Their Applications to Digital Library

【Author】 Zhang Yin (张寅)

【Advisor】 Zhuang Yueting (庄越挺)

【Author information】 Zhejiang University, Computer Science and Technology, 2009, PhD

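【Abstract (translated from the Chinese)】 In recent years, several large-scale book digitization projects in China and abroad have progressed smoothly, and personalization techniques for large-scale digital libraries have become an important research direction. The author took part in building CADAL (the international cooperation program for digitizing Chinese and English books at institutions of higher education), with primary responsibility for developing the CADAL million-book service platform, and implemented its book recommendation and search applications. This thesis centers on personalization techniques and their application to one million books. For recommender systems, it studies collaborative filtering for both single-criterion and multi-criteria ratings. While the million-book service platform was in operation, however, readers were found to rate books only rarely, so rating-based recommendation could not work properly. A real-time book recommender based on mining book click logs was therefore developed, and the personal space offers a personalized recommender based on user-defined multimedia rules. For book search, the emphasis is on designing user-friendly human-computer interfaces. The results are as follows:

(1) An absorbing random walk model is proposed for single-criterion rating recommender systems. The single-criterion rating data are converted into a bipartite graph, and a dummy node is attached to each user or item node. A Gaussian random field is applied to the augmented bipartite graph, the top-N recommendation problem is modeled as a graph-based semi-supervised classification problem, and an effective absorbing random walk model is derived that takes the degree of each node into account. Experiments on two real-world data sets demonstrate the effectiveness of the model.

(2) Two probabilistic latent semantic analysis models are proposed for multi-criteria rating recommender systems. They extend the well-known single-criterion probabilistic latent semantic analysis model (pLSA): the latent variable introduced by pLSA is retained, and two different multivariate probability distributions are used to model each user's multi-criteria ratings. Experiments on real rating data from Yahoo! Movies show that both multi-criteria models perform significantly better than single-criterion pLSA and the other compared methods on prediction and recommendation tasks.

(3) A real-time book recommender is developed on top of a scalable compact navigation pattern tree. The thesis proposes a compact navigation pattern tree indexed by a red-black header tree; the structure uses a prefix-sharing tree to process new logs incrementally, and the red-black header tree markedly improves scalability. It also proposes a construction algorithm for the scalable compact navigation pattern tree and a divide-and-conquer real-time recommendation algorithm based on it. Experiments on book click logs from the CADAL service platform show that the approach is effective and highly scalable.

(4) Book search services and a personal space are developed for the million-book service platform. These comprise a unified parallel book retrieval system over multiple repositories with a user-friendly interface, a book chapter retrieval system supporting query expansion and exploratory browsing, and, for the personal space, a personalized recommender based on user-defined multimedia rules. Readers can set three kinds of multimedia rules (book, image, and calligraphy character), and the system proactively pushes suitable digital content according to content similarity and the collective reading tendencies mined from logs or user feedback.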

【Abstract】 Recently, several mass book digitization projects around the world have made great progress, and personalization techniques for large-scale digital libraries have become an important direction to explore. The author fully participated in the first-phase construction of the China-America Digital Academic Library (CADAL) as a senior technician in charge of developing the CADAL portal, especially the recommendation and search applications for the million books in CADAL. This thesis focuses on personalization techniques and their applications to the million e-books in CADAL. For single-criterion and multi-criteria recommender systems, we propose new collaborative filtering methods from the perspective of probabilistic graphical models. However, most users of the CADAL portal were reluctant to provide explicit ratings for books of interest, so rating-based recommender systems did not work well there. Hence, the click-through logs of books are mined with sequential pattern mining to recommend relevant books in real time. Moreover, visitors can customize multimedia rules in their personal space in the CADAL portal, and a rule-based recommender system delivers timely recommendations according to those rules. For book search applications, user-friendly human-computer interfaces are the major concern. The main contributions of this thesis are:

(1) We propose an effective absorbing random walk model for single-criterion recommender systems. The single-criterion ratings data set is first transformed into a bipartite graph in which each node is connected to a dummy node. Under the constraint of this augmented bipartite graph, the top-N recommendation task is modeled as a graph-based semi-supervised learning problem by employing a Gaussian random field, from which we derive an effective absorbing random walk model that takes the degree of each node into account. Experimental results on two real-world ratings data sets show the effectiveness of the proposed model.

(2) We propose two multi-criteria probabilistic latent semantic analysis models for multi-criteria recommender systems. The well-known probabilistic latent semantic analysis (pLSA) model is extended to deal with multi-criteria ratings. The latent variable introduced in pLSA, which corresponds to the user group, is kept in the multi-criteria pLSA, but two different multivariate probability distributions are used to model the multi-criteria ratings of each user. Experimental results on the Yahoo! Movies multi-criteria ratings data set show that the two multi-criteria pLSA models significantly outperform the corresponding single-criterion pLSA and the other examined methods.

(3) Chapter 5 introduces the real-time book recommendation service based on a compact navigation pattern tree indexed by a red-black header tree, in which the prefix tree structure is used to handle the growth of access logs incrementally. The red-black header tree greatly improves the scalability of the compact navigation pattern tree. We propose the corresponding construction algorithm and a divide-and-conquer real-time recommendation algorithm based on this scalable navigation pattern tree. Experimental results on the click-through logs of the CADAL portal show the effectiveness and high scalability of the proposed approach.

(4) Chapter 6 introduces the search services for the million books and the personal space in the CADAL portal. We developed a novel user-friendly interface for the metadata-based book search service, as well as a book chapter search application supporting query expansion and exploratory search. In the personal space, we developed a multimedia rule-based recommender system in which users can customize three kinds of multimedia rules: book, image, and calligraphy character. The rule-based recommender system actively pushes appropriate content to users according to content similarity as well as collective wisdom mined from logs or user feedback.
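
To make the idea in contribution (3) more concrete, the following is a minimal Python sketch of a prefix-sharing tree over click sessions that absorbs new sessions incrementally and suggests the books most often clicked next. It is an illustration under stated assumptions, not the thesis's implementation: the names `PatternNode`, `NavigationPatternTree`, `insert_session`, and `recommend` are hypothetical, a plain dictionary stands in for the red-black header tree, and a simple "most frequent next click" ranking replaces the divide-and-conquer recommendation algorithm, whose details the abstract does not give.

```python
from collections import defaultdict


class PatternNode:
    """One node of a prefix-sharing tree over book-click sessions."""

    def __init__(self, book_id=None):
        self.book_id = book_id
        self.count = 0        # number of sessions passing through this node
        self.children = {}    # next book_id -> PatternNode


class NavigationPatternTree:
    """Hypothetical sketch: sessions sharing a prefix share tree nodes."""

    def __init__(self):
        self.root = PatternNode()
        # Header index: book_id -> all tree nodes carrying that book.
        # ASSUMPTION: a dict replaces the red-black header tree of the thesis.
        self.header = defaultdict(list)

    def insert_session(self, session):
        """Incrementally add one click session (an ordered list of book ids)."""
        node = self.root
        for book_id in session:
            child = node.children.get(book_id)
            if child is None:
                child = PatternNode(book_id)
                node.children[book_id] = child
                self.header[book_id].append(child)
            child.count += 1
            node = child

    def recommend(self, book_id, top_n=3):
        """Rank the books most often clicked right after book_id."""
        scores = defaultdict(int)
        for node in self.header.get(book_id, []):
            for next_id, child in node.children.items():
                scores[next_id] += child.count
        # Sort by descending count, then by id for a deterministic order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [next_id for next_id, _ in ranked[:top_n]]


tree = NavigationPatternTree()
for session in [["a", "b", "c"], ["a", "b", "d"], ["b", "c"], ["a", "c"]]:
    tree.insert_session(session)

print(tree.recommend("b"))
```

Because each new session only walks or extends one root-to-leaf path, the tree can be updated as logs arrive without being rebuilt, which is the property the abstract attributes to the prefix tree structure.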

  • 【Online publication submitted by】 Zhejiang University
  • 【Online publication year and issue】 2012, Issue 01