

The Research of Personalized Search Engine Based on Analysis of Click Data

【Author】 蔺继国

【Advisor】 徐锡山

【Author Information】 National University of Defense Technology (国防科学技术大学), Computer Science and Technology, 2010, Master's thesis

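【Abstract (translated from the Chinese)】 With the rapid worldwide development of Internet technology, the Internet has gradually become the main medium through which people publish, obtain, and transmit information, and the volume of online information is growing explosively. While people enjoy the convenience and rich information resources that the Internet brings, they also inevitably find it hard to obtain useful information quickly. As a convenient gateway to online information, search engines are increasingly used and relied upon. However, traditional search engines offer a single uniform entry point to all users and return the same result list for the same query regardless of who issues it. That list still contains many pages, and the information the user cares about is often buried under redundant results. Personalized search technology emerged to understand users' search intent more deeply and to provide different users with different, personalized services. Research on personalized search is nevertheless still of very uneven quality, and no commercial personalized search engine yet offers a personalized service that is genuinely impressive. Addressing the current state and problems of personalized search, this thesis studies personalized search in depth on the basis of user click data analysis. The main work is as follows: (1) Existing research on personalized search is analyzed and compared, and the shortcomings of existing personalized search engines are identified. (2) A strategy for extracting implicit relevance feedback based on click data analysis is proposed, which has more practical value than explicit feedback methods. (3) A personalized PageRank algorithm based on an added correction parameter is designed. By feeding the extracted implicit information back into PageRank, it ranks search results in a personalized way, so that the results are closer to the user's search needs. (4) Collaborative filtering is applied to the personalized PageRank algorithm, using the relevance feedback of other users in an interest group to improve the ranking quality of search results for members of the same group. (5) A user grouping method based on interest clustering is proposed, in order to group users reasonably and further reduce the complexity users face when using the system.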

【Abstract】 With the rapid expansion of information technology throughout the world, the Internet has become the main platform for publishing, exchanging, and acquiring information. While enjoying the convenience and abundant information that the Internet brings, people inevitably find that they cannot obtain useful information quickly. As a handy entry point to online information, search engines are widely used and relied upon. However, traditional search engines offer a single uniform entrance to all users and always return the same result list for the same query, even when it is issued by different people. The result list still contains a large number of pages, and the information the user is interested in may be buried under redundant results. To understand users' query intent more deeply and to provide personalized service to different people, personalized search technologies have been proposed and studied. However, research on personalized search is still of very uneven quality, and no commercial personalized search engine offers a personalized service that is genuinely novel. In view of the current state and problems of personalized search, this thesis proposes a personalized search scheme based on the analysis of click data. The main contributions are as follows. (1) It analyzes technologies related to personalized search and identifies the weaknesses and problems of current personalized search engines. (2) It proposes an integrated strategy that extracts implicit relevance feedback by analyzing users' click data, which has much more practical value than explicit feedback. (3) It puts forward a personalized PageRank algorithm based on an added correction vector and feeds the implicit relevance feedback extracted from click data into the algorithm, yielding a personalized ranking method for search results. (4) It applies collaborative filtering to the personalized PageRank algorithm and improves the ranking quality of search results by using the relevance feedback of other users in a group with similar interests. (5) It proposes a method of grouping users by clustering their basic interests, so as to group users reasonably and reduce the complexity of the system.
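
To make contributions (2) and (3) more concrete, the following is a minimal Python sketch of how click-derived feedback could bias a PageRank computation. It is not the thesis's algorithm: the abstract does not define the correction vector, so this sketch substitutes the standard personalized PageRank formulation, in which the uniform teleport distribution is replaced by a per-user preference vector. The function names `click_preference` and `personalized_pagerank`, the `smoothing` parameter, and the toy link graph are all assumptions made for illustration.

```python
from collections import Counter


def click_preference(clicks, pages, smoothing=1.0):
    """Turn a user's click log into a probability vector over pages.

    Assumption: raw click counts stand in for implicit relevance feedback.
    Additive smoothing keeps unclicked pages at a non-zero weight.
    """
    counts = Counter(clicks)
    weights = {p: counts.get(p, 0) + smoothing for p in pages}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def personalized_pagerank(links, preference, damping=0.85,
                          iterations=100, tol=1e-10):
    """Power iteration for PageRank with a non-uniform teleport vector.

    links: dict mapping each page to the list of pages it links to.
           Every link target must also appear as a key.
    preference: dict mapping each page to a teleport probability (sums to 1).
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is redistributed
        # according to the preference vector.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {
            p: (1 - damping) * preference[p] + damping * dangling * preference[p]
            for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for p in pages:
            if links[p]:
                share = damping * rank[p] / len(links[p])
                for q in links[p]:
                    new_rank[q] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: pages "b" and "c" are structurally identical.
links = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
clicks = ["c", "c", "c"]  # this user has only ever clicked on "c"

preference = click_preference(clicks, links)
rank = personalized_pagerank(links, preference)
ranking = sorted(rank, key=rank.get, reverse=True)
```

In this toy graph the link structure alone cannot separate "b" from "c", so any difference in their scores comes entirely from the click-derived preference vector. Under the same assumptions, contribution (4) could be approximated by averaging the preference vectors of users in one interest group before ranking.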
