Research on Search Engine Evaluation Based on User Behavior Analysis

【Author】 Cen Rongwei (岑荣伟)

【Supervisor】 Ma Shaoping (马少平)

【Author information】 Tsinghua University, Computer Science and Technology, 2010, Ph.D.

【Abstract】 Evaluation is an essential part of a Web search engine and an important safeguard for algorithm improvement, system optimization, and day-to-day operation and maintenance. Traditional evaluation methods consume large amounts of human effort and time, so they struggle to meet the need for fast and comprehensive search engine evaluation. Evaluating search engines accurately, quickly, and comprehensively is therefore a pressing problem. Starting from the information needs of Web users, this thesis combines user behavior analysis with search engine evaluation, with the aim of mining user behavior information effectively and evaluating search engines quickly and comprehensively. The contributions are:
(1) A macro-level statistical analysis of user behavior, covering both querying and clicking, that uncovers the relationship between user behavior and information needs. Queries are further classified by user intent to examine how behavior differs across types of information need.
(2) A click-granularity model of search user behavior that estimates click reliability. It addresses the bias and noise in user behavior and the inability of traditional methods to handle long-tail queries. Experiments and analysis show that behavior features derived from the user's decision-making process can separate reliable clicks from unreliable ones, and that the model effectively estimates click quality for both hot queries and long-tail queries.
(3) A framework for evaluating search engine retrieval performance that combines user behavior analysis with the traditional Cranfield evaluation paradigm, together with an implemented evaluation system. To overcome the limitations and inherent bias of behavior data from a single search engine, the MCTR model is proposed, which uses user behavior from multiple search engines to annotate queries automatically. Experimental results show that the automatic annotation is reasonably accurate and produces evaluation results similar to those obtained by traditional human annotation, so it can be used to evaluate search engine results automatically.
(4) The concept of the User Accessed Web (UA Web), introduced to cope with the enormous scale of Web data. Using user browsing behavior, a Monte Carlo random sampling process generates near-uniform samples of pages, which are used to study the characteristics of the UA Web and the index size and index structure of four commercial search engines.
(5) A design for a comprehensive search engine evaluation system that integrates the above results on user behavior analysis and search engine evaluation. It targets the specific goals and content of search engine evaluation and is intended to meet the requirements of fast, comprehensive, and automatic evaluation.
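The abstract says near-uniform page samples are drawn with a Monte Carlo random sampling process over user browsing data, but it does not give the procedure. One standard way to get near-uniform samples from a graph is a Metropolis-Hastings random walk, which rejects some moves so that heavily linked pages are not oversampled. The sketch below is an illustration of that textbook technique under stated assumptions, not the thesis's method: it assumes an undirected, connected link graph stored as an adjacency dict, and the names `mh_step` and `sample_pages` and the `burn_in`/`thin` parameters are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch: a textbook Metropolis-Hastings random walk for
# near-uniform node sampling. It is NOT the thesis's own procedure.
# Assumption: the graph is undirected and connected, i.e.
# v in graph[u] iff u in graph[v], and every node has >= 1 neighbour.


def mh_step(graph, node, rng):
    """One lazy Metropolis-Hastings step targeting the uniform distribution."""
    if rng.random() < 0.5:
        return node  # lazy step: keeps the chain aperiodic
    candidate = rng.choice(graph[node])
    # Accept with probability min(1, deg(node) / deg(candidate)).
    # This cancels the plain random walk's bias toward high-degree pages.
    if rng.random() < len(graph[node]) / len(graph[candidate]):
        return candidate
    return node  # move rejected: stay on the current page


def sample_pages(graph, start, n_samples, burn_in=1000, thin=10, seed=None):
    """Collect n_samples nodes from one long walk.

    burn_in: steps discarded before sampling (hypothetical default).
    thin: steps taken between consecutive samples to reduce correlation.
    """
    rng = random.Random(seed)
    node = start
    for _ in range(burn_in):
        node = mh_step(graph, node, rng)
    samples = []
    for _ in range(n_samples):
        for _ in range(thin):
            node = mh_step(graph, node, rng)
        samples.append(node)
    return samples


if __name__ == "__main__":
    # Toy "link graph": hub page 0 linked with leaf pages 1..9.
    star = {0: list(range(1, 10))}
    for leaf in range(1, 10):
        star[leaf] = [0]
    samples = sample_pages(star, start=0, n_samples=20000, seed=1)
    counts = Counter(samples)
    for page in sorted(counts):
        print(page, round(counts[page] / len(samples), 3))
```

On this star graph a plain random walk would spend about half its time on the hub (its share of the total degree is 9/18), whereas the acceptance step should bring every page, hub included, close to a 1/10 share. Real link graphs are directed and contain pages without links, so symmetrizing links and handling dead ends are further assumptions a practical sampler would need to address.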

  • 【Online publication submitter】 Tsinghua University
  • 【Online publication issue】 2011, Issue 08