Research on Personalized Web Search Based on Reference Document Model

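【Author】 Li Daren (李大任)

【Supervisors】 Li Sheng (李生); Yang Muyun (杨沐昀)

【Author Information】 Harbin Institute of Technology, Computer Science and Technology, 2011, Master's thesis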

【Abstract】 With the rapid spread of computers and the Internet, we have entered the information age, and information resources have grown explosively. Helping users find exactly the information they want within this mass of data has therefore become a central task of information retrieval. However, most traditional retrieval techniques rely on string matching and can hardly satisfy users' increasingly individualized information needs. To address this problem, this thesis first confirms the motivation for personalization through query log analysis and then explores several techniques for building a personalized search engine. The work covers three areas:

1. Potential for personalization in web search. We first show quantitatively that, in the query logs of a web search engine, clicks that differ from those of other users far outnumber repeated clicks. We then use the Kappa statistic to measure the overall agreement among different users' clicks on the same query. The distribution of Kappa values, considered together with query submissions, shows that this level of agreement is hard for a one-size-fits-all web search engine to satisfy. Finally, we compute a "potential for personalization" measure that gives an overview of which queries stand to benefit most from individual user information.

2. Personalized web search based on the reference document model (RDM). We introduce RDM to model each user's preferences from the web pages the user has clicked in the past, and then use this model as feedback to personalize the results that different users receive for the same query. We evaluate RDM in both the vector space and the probabilistic space. The experiments show that in either space RDM models user preferences well from historically clicked documents and incorporates them effectively into the retrieval process.

3. Query recommendation based on multiple information fusion. We study how to use the search history of user groups recorded in query logs to implement personalized query recommendation. Specifically, an analysis of the AOL query log first verifies that it is feasible to recommend to a user the queries issued by other users whose search histories resemble that user's own. We then investigate various measures of similarity between users' query history sequences and use RankingSVM to fuse them into a single predicted user similarity; the queries of the users with the most similar histories are then recommended. Experiments on the Sogou query log confirm that this method ranks the queries of similar users near the top and can effectively predict users' subsequent queries. In addition, we make a preliminary exploration of click recommendation based on user groups.
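The abstract does not say which Kappa variant is used or how clicks become ratings. The following is a minimal sketch that assumes Fleiss' kappa. Each result URL shown for a query is treated as a subject, and each user is treated as a rater who labels it clicked or not clicked. The function names `fleiss_kappa` and `click_counts` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for counts[i][j] = number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall share of ratings in each category.
    shares = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_cats)]
    p_e = sum(s * s for s in shares)
    if p_e == 1.0:
        return 1.0  # degenerate case: every rating falls into a single category
    return (p_bar - p_e) / (1.0 - p_e)


def click_counts(clicks_by_user: Dict[str, Set[str]], urls: List[str]) -> List[List[int]]:
    """Hypothetical framing: each URL is a subject, each user a rater labelling it
    clicked / not clicked. Returns one [clicked, not_clicked] row per URL."""
    n_users = len(clicks_by_user)
    rows = []
    for url in urls:
        clicked = sum(1 for clicked_urls in clicks_by_user.values() if url in clicked_urls)
        rows.append([clicked, n_users - clicked])
    return rows


# Usage: three users' clicks on the results of one query.
clicks = {"u1": {"a.com"}, "u2": {"a.com"}, "u3": {"b.com"}}
kappa = fleiss_kappa(click_counts(clicks, ["a.com", "b.com", "c.com"]))
```

A value near 1 means users click the same results. A value near 0 means their agreement is no better than chance. In that situation, a single ranking for everyone serves users poorly.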
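The abstract gives no formulas for the vector-space variant of the reference document model. One plausible reading, assumed here, is Rocchio-style feedback. Under this reading, the reference document is the centroid of the term vectors of a user's previously clicked pages. Each result's engine score is then interpolated with its cosine similarity to that centroid. The tokenization, the weight `lam`, and all function names are illustrative assumptions, not the thesis's exact method.

```python
import math
from collections import Counter
from typing import List, Tuple


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector (lower-cased, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reference_vector(clicked_docs: List[str]) -> Counter:
    """Centroid of the length-normalized vectors of a user's clicked pages."""
    ref: Counter = Counter()
    for doc in clicked_docs:
        vec = term_vector(doc)
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        for term, w in vec.items():
            ref[term] += w / norm / len(clicked_docs)
    return ref


def rerank(results: List[Tuple[str, str, float]], ref: Counter, lam: float = 0.5) -> List[str]:
    """results holds (doc_id, text, engine_score) triples; documents are reordered by
    (1 - lam) * engine_score + lam * cosine(doc, ref)."""
    scored = [
        ((1 - lam) * score + lam * cosine(term_vector(text), ref), doc_id)
        for doc_id, text, score in results
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

With `lam = 0` the engine order is kept. Raising `lam` lets the user's click history pull topically similar pages upward. The sketch assumes that engine scores lie on a scale comparable to cosine similarity (0 to 1).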
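For the probabilistic-space variant, a comparable sketch treats the user's clicked pages as a unigram reference document model. It mixes that model into the query model with weight `alpha`, in the style of model-based feedback. It then ranks results by negative cross-entropy against Dirichlet-smoothed document models, which is rank-equivalent to negative KL divergence. The mixing scheme, the smoothing parameter `mu`, the probability `floor`, and the function names are assumptions, not the formulation used in the thesis.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def unigram_mle(texts: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model over lower-cased whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def expanded_query_model(query: str, clicked_docs: List[str], alpha: float) -> Dict[str, float]:
    """(1 - alpha) * query model + alpha * reference document model."""
    q_model = unigram_mle([query])
    ref_model = unigram_mle(clicked_docs)
    terms = set(q_model) | set(ref_model)
    return {t: (1 - alpha) * q_model.get(t, 0.0) + alpha * ref_model.get(t, 0.0) for t in terms}


def cross_entropy_score(query_model: Dict[str, float], doc_text: str,
                        coll_model: Dict[str, float], mu: float = 10.0,
                        floor: float = 1e-6) -> float:
    """sum_w p(w|q') * log p(w|d), with p(w|d) Dirichlet-smoothed by the collection."""
    tf = Counter(doc_text.lower().split())
    doc_len = sum(tf.values())
    score = 0.0
    for term, p_q in query_model.items():
        p_c = coll_model.get(term, floor)  # crude guard for terms unseen in the collection
        score += p_q * math.log((tf[term] + mu * p_c) / (doc_len + mu))
    return score


def rank(query: str, clicked_docs: List[str], results: List[Tuple[str, str]],
         alpha: float = 0.5, mu: float = 10.0) -> List[str]:
    """results holds (doc_id, text) pairs; returns doc_ids, best first."""
    coll_model = unigram_mle([text for _, text in results] + clicked_docs)
    q_model = expanded_query_model(query, clicked_docs, alpha)
    scored = sorted(
        ((cross_entropy_score(q_model, text, coll_model, mu), doc_id) for doc_id, text in results),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored]
```

With `alpha = 0` only the submitted query matters, so every user gets the same ranking. As `alpha` grows, the reference document model shifts probability mass toward terms from the user's clicked pages.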
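Contribution 3 ranks other users by the similarity of their query history sequences and then recommends those users' queries. The sketch below assumes two illustrative similarity measures: Jaccard overlap of query sets and a normalized longest common subsequence. It fuses them linearly. A linear-kernel RankingSVM learns exactly this kind of weight vector, but the `weights` here are placeholders rather than trained values. The abstract also does not list the measures the thesis actually fuses, and every function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def lcs_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Longest common subsequence length, normalized by the longer sequence."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def fused_similarity(a: Sequence[str], b: Sequence[str],
                     weights: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Linear fusion of the measures; the weights stand in for learned RankingSVM weights."""
    features = (jaccard(a, b), lcs_ratio(a, b))
    return sum(w * f for w, f in zip(weights, features))


def recommend(target: str, histories: Dict[str, List[str]], k: int = 2,
              weights: Tuple[float, float] = (0.5, 0.5)) -> List[str]:
    """Recommend queries issued by the k most similar users (positive similarity only)."""
    mine = histories[target]
    candidates = [
        (fused_similarity(mine, hist, weights), user)
        for user, hist in histories.items()
        if user != target
    ]
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    already_issued = set(mine)
    scores: Dict[str, float] = defaultdict(float)
    for sim, user in neighbors:
        for query in histories[user]:
            if query not in already_issued:
                scores[query] += sim
    return sorted(scores, key=lambda q: (-scores[q], q))


histories = {
    "u": ["flights beijing", "hotels beijing", "beijing weather"],
    "v": ["flights beijing", "hotels beijing", "great wall tickets"],
    "w": ["nba scores", "nba draft"],
}
suggestions = recommend("u", histories, k=2)
```

Each candidate query is scored by the summed similarity of the neighbors who issued it. Queries that the target user has already issued are skipped.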
