Intelligent and Personalized Research for Web Search Engine (面向搜索引擎的智能个性化研究)

【Author】 Xu Jingqiu (徐静秋)

【Advisor】 Zhu Zhengyu (朱征宇)

【Author Information】 Chongqing University, Computer Software and Theory, 2008, Master's thesis

【Abstract】 As the number of documents on the Web grows rapidly, Web search research faces many new challenges. Most queries submitted to search engines are short and under-specified, and users who enter the same query may have completely different search intentions. Yet most current search engines do not take individual needs into account: they return exactly the same results to every user who submits the same query. To improve search quality, personalized Web search has become one of the research focuses in information retrieval. This thesis studies intelligent personalization for search engines. The approach retains the advantages of popular search engines, such as fast response to requests and coverage of a large volume of information resources, while also providing results relevant to each user's interests and background. The main contributions are as follows.

① Existing methods for computing term associations in the vector space model are analyzed in detail. To mine the associations between feature terms within each of a user's interest subcategories, the thesis builds on a new user interest model and designs a new term association measure that combines cosine similarity with term co-occurrence analysis. The result is a user-specific, quantitative description of term associations that can be used for query expansion.

② By combining browsing-behavior analysis with mining of browsed content, the interest category of a user's query is located accurately. Using the term associations computed within that interest subcategory, an intelligent semantic query expansion algorithm automatically adds to the user's initial query a few expansion terms that accurately express the search intention, and the expanded query is submitted to a large search engine such as Yahoo or Google for the actual retrieval. This form of query expansion lets an ordinary search engine provide personalized service: users who submit the same query terms receive different results.

③ Pages with fully or nearly duplicated content are very common on the Web, and search engine results often contain many of them. They increase the user's browsing burden and lower the quality of the search service. The thesis proposes a content-analysis-based method for detecting similar documents, in particular duplicates and near-duplicates. To further improve retrieval quality, the method is applied mainly to remove duplicates from the top N documents returned by a search engine.

Chapter 5 reports experiments that demonstrate the effectiveness and feasibility of this work. The research has academic reference value and practical application value for personalized Web search.
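The abstract does not give formulas, so the following Python sketch only illustrates the general shape of the three steps it describes: scoring term associations from cosine similarity plus co-occurrence, expanding a query with the strongest associated terms, and removing near-duplicates from a result list. Every concrete choice here is an assumption made for illustration and not the thesis's method: the whitespace tokenizer, the blend weight `alpha`, the Jaccard form of co-occurrence, the word-shingle comparison with `threshold`, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; the thesis does not specify one)."""
    return text.lower().split()


def term_associations(docs, alpha=0.5):
    """Score term pairs within one interest category.

    Assumed blend: alpha * cosine similarity of the terms' per-document
    frequency vectors + (1 - alpha) * Jaccard co-occurrence over documents.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # A term's vector holds its frequency in each document of the category.
    vectors = {t: [c[t] for c in counts] for t in vocab}
    scores = {}
    for a, b in combinations(vocab, 2):
        va, vb = vectors[a], vectors[b]
        dot = sum(x * y for x, y in zip(va, vb))
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
        cosine = dot / norm if norm else 0.0
        both = sum(1 for x, y in zip(va, vb) if x and y)
        either = sum(1 for x, y in zip(va, vb) if x or y)
        cooc = both / either if either else 0.0
        scores[(a, b)] = scores[(b, a)] = alpha * cosine + (1 - alpha) * cooc
    return scores


def expand_query(query, scores, k=2):
    """Append the k terms most strongly associated with the query terms."""
    q_terms = tokenize(query)
    totals = Counter()
    for (a, b), s in scores.items():
        if s > 0 and a in q_terms and b not in q_terms:
            totals[b] += s
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [t for t, _ in ranked[:k]]


def shingles(text, n=3):
    """Set of overlapping n-word windows of the text."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}


def remove_near_duplicates(results, threshold=0.8, n=3):
    """Keep a result only if its shingle Jaccard with every kept result is below threshold."""
    kept = []
    for text in results:
        s = shingles(text, n)
        if all(len(s & shingles(k, n)) / len(s | shingles(k, n)) < threshold
               for k in kept):
            kept.append(text)
    return kept
```

For a user whose interest category contains the documents `"jaguar car engine"`, `"jaguar car dealer"`, and `"engine repair manual"`, calling `expand_query("jaguar", term_associations(docs))` yields `["jaguar", "car", "dealer"]`, so the query sent to the search engine leans toward automobiles. A different user profile would produce different expansion terms for the same query. `remove_near_duplicates` would then be applied to the text of the top N returned results.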

  • 【Online Publication Contributor】 Chongqing University
  • 【Online Publication Issue】 2009, Issue 06