Research and Implementation of a Digital Book Search System Based on User Click-through Data
Original title: 基于用户点击行为的数字图书搜索系统研究与实现
【Author】 Yuan Chuan (袁川);
【Author information】 Zhejiang University, Computer Application Technology, 2008, Master's thesis
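【Abstract (translated from the Chinese)】 Digital libraries have attracted close attention in many countries and have developed rapidly, becoming an important way for people to obtain information and knowledge. Digital book search is a supporting service that every digital library must provide. This thesis studies digital book search and the ranking of search results in depth, so that readers can quickly find the books they need among massive digital book collections. Traditional digital book search is built on relational databases and judges relevance by simple keyword matching; it cannot reflect a book's quality or the attention it receives, and it lacks an effective mechanism for combining multiple ranking criteria. The main work of this thesis is as follows. 1. Using the rich usage logs of the digital library portal, it proposes two random walk algorithms over the click stream: BookRank, a book scoring algorithm based on the access correlation graph, which provides relevance ranking of books; and QueryCluster, a query clustering algorithm based on query-reading behavior, which uses readers' implicit feedback on search results to cluster query terms. 2. It crawls book rating data from the Internet and integrates the data into the book search ranking system as an important ranking criterion. 3. On top of query clustering, it implements a method for integrating multiple ranking criteria: for each class of query terms, it combines three information sources (the relevance ranking derived from the access correlation graph, book ratings from the Internet, and text similarity) to form the final ranking of search results. 4. The corresponding digital book search system was developed and deployed on the website of the higher-education Chinese-English digital book cooperation project (CADAL). According to feedback from users in actual use, the new system ranks search results more reasonably than traditional digital book search.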
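The abstract names BookRank as a random walk over an access correlation graph but does not give its formulation. The following is a minimal sketch of that general idea, not the thesis's algorithm: it assumes the graph links two books whenever they are read in the same session, and it scores books with a PageRank-style walk. The function names, the session format, and the `damping` value are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(sessions):
    """Build an undirected weighted graph from reading sessions.

    Assumption: the edge weight is the number of sessions in which
    two books co-occur. The thesis may define the graph differently.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for books in sessions:
        for a, b in combinations(sorted(set(books)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Score books with a PageRank-style random walk with uniform restart."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Restart mass is spread evenly over all books.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            # Each book passes its score to neighbours in proportion
            # to the co-access weight.
            for v, w in graph[u].items():
                nxt[v] += damping * scores[u] * w / total
        scores = nxt
    return scores


# Hypothetical sessions: each list holds the book IDs one user opened.
sessions = [["a", "b"], ["b", "c"], ["b", "d"]]
scores = random_walk_scores(build_graph(sessions))
best = max(scores, key=scores.get)
```

In this toy input, book `b` co-occurs with every other book, so the walk visits it most often and it receives the highest score.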
【Abstract】 Since the mid-1990s, many developed and developing countries have invested heavily in the development of digital libraries. The digital library has become an important means for people to access knowledge and information. Digital book search is a supporting service that a digital library should provide. This thesis focuses on digital book search and explores in depth the problem of ranking search results, so that visitors to a digital library can quickly find books that satisfy their needs among massive book resources. Traditional digital book search relies on the matching facilities of a relational database. It can only find book entries that contain the keywords the reader entered. Moreover, it lacks an effective mechanism for ranking the relevant books and ignores their popularity and quality. The main work of this thesis is summarized as follows: 1. Extract records of users clicking on books from the access logs, construct a Correlation Graph of books read by users, and use a random walk algorithm to rank the books by relevance. 2. Extract query words and book reading records from the access logs, and use the clustering effect of random walks to cluster query words. 3. Crawl book score data from well-known online bookstores on the Internet, which serve as another important measure for book ranking. 4. Propose an approach to integrating multiple sources of book ranking information for each class of query. The final ranking of search results is obtained by fusing text similarity, book score data from online bookstores, and the book ranking from the Correlation Graph. 5. Develop a digital book search system based on the above algorithms and techniques and deploy it in the CADAL portal. Users have reported that the new book search system provides a more reasonable ranking of search results than the original book search module.
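The abstract says the final ranking fuses text similarity, online bookstore scores, and the Correlation Graph ranking per class of query, without stating the fusion rule. The sketch below assumes one simple possibility, a weighted sum of min-max normalized scores with a separate weight vector per query class. The class names, weights, and function names are hypothetical and are not taken from the thesis.

```python
def min_max(scores):
    """Rescale a {book: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {book: 0.0 for book in scores}
    return {book: (v - lo) / (hi - lo) for book, v in scores.items()}


def fuse(text_sim, web_rating, graph_rank, weights):
    """Rank text-matched books by a weighted sum of three normalized sources.

    Assumption: a linear combination; the thesis may use another rule.
    Books missing from a source contribute 0 for that source.
    """
    sources = [min_max(text_sim), min_max(web_rating), min_max(graph_rank)]
    fused = {
        book: sum(w * s.get(book, 0.0) for w, s in zip(weights, sources))
        for book in text_sim
    }
    # Highest fused score first; ties broken by book ID for determinism.
    return sorted(fused, key=lambda book: (-fused[book], book))


# Hypothetical per-class weights: (text similarity, web rating, graph rank).
WEIGHTS_BY_CLASS = {
    "title_like": (0.6, 0.2, 0.2),
    "topic_like": (0.2, 0.3, 0.5),
}

text_sim = {"x": 0.9, "y": 0.5, "z": 0.1}
web_rating = {"x": 3.0, "y": 4.5, "z": 4.0}
graph_rank = {"x": 0.1, "y": 0.3, "z": 0.6}

title_order = fuse(text_sim, web_rating, graph_rank, WEIGHTS_BY_CLASS["title_like"])
topic_order = fuse(text_sim, web_rating, graph_rank, WEIGHTS_BY_CLASS["topic_like"])
```

With these made-up numbers, the title-like weights keep the best text match `x` on top, while the topic-like weights promote `z`, which scores highest on the graph ranking.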
【Key words】 Digital Library; Correlation Graph; BookRank; Query Clustering; Ensemble of Multiple Information Sources;
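- 【Online publication contributor】 Zhejiang University 【Online publication year and issue】 2008, Issue 07
- 【Classification number】 TP311.52
- 【Citation count】 4
- 【Download count】 230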