基于Nutch的移动WEB搜索系统的研究与实现
Research and Implementation of Mobile Web Search System Based on Nutch
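【Author】 Gao Mengjiao (高梦娇);
【Advisor】 Lü Yuqin (吕玉琴);
【Author information】 Beijing University of Posts and Telecommunications, Electronic Science and Technology, 2013, Master's thesis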
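【Abstract (from the Chinese)】 With the arrival of the 3G era and the spread of mobile devices such as mobile phones and portable computers, more and more users can conveniently access the network from mobile terminals. As a result, users' demand for personalized and intelligent search engines has become more pronounced. Most existing search engines for mobile terminals are local search engines ported directly to the mobile terminal. These mobile search engines rely on pure text relevance alone and even treat location information entered by the user as ordinary text keywords. They do not integrate well with mobile spatial information such as the user's geographic location, even though most needs people have when searching on mobile devices are closely tied to spatial location. When mobile users issue a query, they generally want the search engine to return web pages that are not only closely related to the query content but also spatially close to where they are. Existing mobile search engines therefore have difficulty giving users ideal results. Starting from these problems, this thesis studies solutions that combine text-relevance search with geographic-proximity search, proposes an implementation scheme for a Nutch-based mobile WEB search system, builds a mobile WEB search system that searches on both location and keywords, and implements location-related spatial search. Web pages are geotagged according to the geographic location of the content they describe, so the scheme can retrieve web pages related to the user's location and can help mobile users find relevant nearby results. By using a hybrid index of Lucene and R-tree, the system effectively optimizes the ranking of search results, verifying that the hybrid index structure can more quickly return results that combine text relevance and spatial proximity. The thesis describes the overall framework design of the system and the implementation details of each main module, and presents the key techniques of the web page preprocessing module, the index building module, and the search module, including geotagging of web pages, a hybrid-index insertion algorithm based on text clustering, and a node priority queue search algorithm. Finally, the system is tested for both functionality and performance. The test results show that the mobile WEB search system provides dual search over geographic location and text information and performs well.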
【Abstract】 With the spread of 3G technology, mobile phones, portable computers, and other mobile devices are becoming more common, and more and more users are able to access the internet conveniently via mobile terminals. Users therefore have a clearer demand for intelligent and personalized search engines. Most existing mobile search engines are local search engines transferred directly to the mobile terminal. They can only return text-relevant results, because they treat location information entered by the user as an ordinary text keyword and do not combine it with the user's location or other mobile context. Mobile users, however, usually need location-related results: when they issue a query, they expect web pages that are both relevant to the query text and close to their own location. Existing mobile search engines can therefore hardly provide ideal results for mobile users. This paper addresses this problem and studies how to retrieve web pages that are both text-relevant and spatially close. It proposes a spatial search method that geotags all web pages in advance according to the locations they describe, and it implements a mobile WEB search system on top of the existing open-source search engine Nutch. It also proposes a hybrid index structure based on Lucene and R-tree, together with a corresponding "Node Priority Traversal Algorithm". The mobile WEB search system uses the hybrid index to index both the location and the text content of web pages, and then uses the "Node Priority Traversal Algorithm" to return results that are both nearby and text-relevant. The paper first describes the overall framework and structural design of the mobile WEB search system. It then presents the implementation details of each module, including geotagging in the web page preprocessing module, the cluster-enhanced hybrid index in the indexing module, and the "Node Priority Traversal Algorithm" in the searching module. Finally, the paper evaluates the functionality and performance of the system. The results show that the system can return web pages that are both text-relevant and spatially close to mobile users, and that it performs well.
【Key words】 mobile WEB; search engine; Nutch; Lucene; R-tree; hybrid index;
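
The abstract names a priority-driven traversal over a spatial index but gives no details of it. As a rough illustration of the general idea only, the sketch below runs a generic best-first search over a toy tree of bounding rectangles, filtering pages by a keyword. It is not the thesis's "Node Priority Traversal Algorithm" or its hybrid Lucene/R-tree index. The names `Page`, `Node`, `min_dist`, and `nearest_matching`, the planar coordinates, and the sample data are all assumptions made for this example.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Page:
    url: str          # hypothetical page identifier
    x: float          # geotag, planar coordinates for simplicity
    y: float
    terms: frozenset  # terms that appear in the page text


@dataclass
class Node:
    # Bounding rectangle (min_x, min_y, max_x, max_y) enclosing everything below.
    box: tuple
    children: list = field(default_factory=list)  # inner node: child Nodes
    pages: list = field(default_factory=list)     # leaf node: Pages


def min_dist(box, qx, qy):
    """Smallest Euclidean distance from the query point to a rectangle."""
    min_x, min_y, max_x, max_y = box
    dx = max(min_x - qx, 0.0, qx - max_x)
    dy = max(min_y - qy, 0.0, qy - max_y)
    return math.hypot(dx, dy)


def nearest_matching(root, qx, qy, keyword, k):
    """Best-first traversal: always expand the closest pending entry."""
    tie = count()  # tie-breaker so heapq never compares Node/Page objects
    heap = [(min_dist(root.box, qx, qy), next(tie), root)]
    results = []
    while heap and len(results) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Page):
            # A rectangle's min_dist never exceeds the distance to any page
            # inside it, so pages are popped in increasing distance order.
            results.append((item.url, dist))
            continue
        for child in item.children:
            heapq.heappush(heap, (min_dist(child.box, qx, qy), next(tie), child))
        for page in item.pages:
            if keyword in page.terms:  # plain filter; no relevance scoring here
                d = math.hypot(page.x - qx, page.y - qy)
                heapq.heappush(heap, (d, next(tie), page))
    return results


# Toy data: two leaves under one root, query issued from the origin.
leaf_a = Node(box=(0, 0, 2, 2), pages=[
    Page("a/cafe", 1, 1, frozenset({"cafe", "wifi"})),
    Page("a/bank", 2, 2, frozenset({"bank"})),
])
leaf_b = Node(box=(8, 8, 10, 10), pages=[
    Page("b/cafe", 9, 9, frozenset({"cafe"})),
])
root = Node(box=(0, 0, 10, 10), children=[leaf_a, leaf_b])
hits = nearest_matching(root, 0, 0, "cafe", k=2)
```

Here `hits` lists the matching pages nearest first, each paired with its distance from the query point. A system of the kind the abstract describes would presumably replace the keyword filter with a text relevance score and weigh it against distance, which this sketch does not attempt.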