Study on P2P-Based Distributed Search Engine (基于P2P的分布式搜索引擎的研究)

【Author】 Wang Wenming (王文明)

【Supervisor】 He Pilian (何丕廉)

【Author information】 Tianjin University, Computer Application Technology, 2007, Master's thesis

【Abstract】 The rapid growth of the Internet has led to an explosion of online information, making it increasingly important to find valuable information on the Internet quickly and accurately. Search engines have made retrieving information on the Internet far more convenient, and their speed and accuracy have made them one of the most important and popular Internet applications. However, current search engines have two drawbacks. First, their search depth is limited: they gather resources with web crawlers (spiders), so they cannot retrieve resources shared on users' personal computers. Second, they rank results by keyword and hyperlink analysis and do not take user feedback into account. This thesis brings P2P technology into search engines and proposes a model for a P2P-based distributed search engine together with a new ranking algorithm. First, it designs a P2P-based distributed search engine model with no central (directory) server. Every computer acts as a peer, and each peer publishes the index of its local resources to the P2P network so that other peers can search it. As a result, resources shared on users' personal computers become searchable, which improves search depth. Second, building on this model, it proposes a new ranking algorithm that uses relevance as the basic ranking factor and refines the ranking with a popularity factor and a friendliness factor. Relevance measures how closely a document matches a query; the popularity factor reflects how popular a resource is across the network; and the friendliness factor reflects the user's interests. By exploiting user feedback in this way, the algorithm can return more accurate results to specific users.
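As a rough illustration of the two ideas in the abstract (peers publishing their local index into a distributed hash table, and a relevance score refined by popularity and friendliness), here is a minimal in-memory Python sketch. Everything in it is hypothetical and is not the thesis's actual method: the abstract gives no formulas, and the real system is built on JXTA and Lucene, neither of which is used here. The names `ToyDHT`, `rank`, `alpha` and `beta`, the Chord-style keyword partitioning, the TF-IDF relevance, the click-count popularity, and the "share of this user's past downloads from the same peer" friendliness measure are all assumptions chosen for illustration.

```python
import hashlib
import math
from bisect import bisect_left
from collections import defaultdict


def node_id(key: str) -> int:
    """160-bit SHA-1 identifier, as in Chord-style DHTs (assumed scheme)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Hypothetical in-memory stand-in for a keyword-partitioned DHT (no networking)."""

    def __init__(self, peers):
        self.ring = sorted((node_id(p), p) for p in peers)
        # Per-peer inverted index: term -> {(owner, doc): term frequency}
        self.index = {p: defaultdict(dict) for p in peers}
        self.num_docs = 0  # kept centrally here only to keep the sketch short

    def successor(self, term: str) -> str:
        """Peer responsible for a term: first node id >= hash(term), wrapping around."""
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, node_id(term)) % len(self.ring)
        return self.ring[i][1]

    def publish(self, owner: str, doc: str, text: str) -> None:
        """The owner peer publishes the postings of one local document."""
        self.num_docs += 1
        terms = text.lower().split()
        for term in set(terms):
            self.index[self.successor(term)][term][(owner, doc)] = terms.count(term)

    def lookup(self, term: str) -> dict:
        """Fetch the postings for a term from the peer responsible for it."""
        return self.index[self.successor(term)].get(term, {})


def rank(dht, query, clicks, user_history, alpha=0.5, beta=0.5):
    """Hypothetical score = relevance * (1 + alpha*popularity) * (1 + beta*friendliness).

    clicks: {(owner, doc): network-wide download count}, used for popularity
    user_history: {owner: downloads this user made from that peer}, used for friendliness
    """
    relevance = defaultdict(float)
    for term in query.lower().split():
        postings = dht.lookup(term)
        if not postings:
            continue
        idf = math.log(1 + dht.num_docs / len(postings))
        for key, tf in postings.items():
            relevance[key] += tf * idf

    max_clicks = max(clicks.values(), default=0)
    total_hist = sum(user_history.values())
    scores = {}
    for key, rel in relevance.items():
        # Popularity in [0, 1]: log-scaled download count relative to the most popular doc
        pop = math.log1p(clicks.get(key, 0)) / math.log1p(max_clicks) if max_clicks else 0.0
        # Friendliness in [0, 1]: share of this user's downloads that came from the owner peer
        friend = user_history.get(key[0], 0) / total_hist if total_hist else 0.0
        scores[key] = rel * (1 + alpha * pop) * (1 + beta * friend)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    dht = ToyDHT(["peerA", "peerB", "peerC"])
    dht.publish("peerA", "a1", "p2p search engine design")
    dht.publish("peerB", "b1", "p2p search engine survey")
    dht.publish("peerC", "c1", "lucene indexing notes")
    results = rank(dht, "p2p search", clicks={("peerB", "b1"): 10}, user_history={"peerB": 3})
    print(results[0][0])  # ('peerB', 'b1')
```

In the usage example, `a1` and `b1` are equally relevant to the query, so `b1` ranks first only because of its higher (assumed) popularity and friendliness; `c1` matches no query term and is not returned. The multiplicative form keeps relevance as the base factor, as the abstract describes, so the two feedback factors can only reorder documents that actually match the query.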

【Key words】 search engine; peer-to-peer; JXTA; distributed hash table; Lucene
  • 【Online publication contributor】 Tianjin University
  • 【Online publication year/issue】 2009, Issue 04
  • 【Classification number】 TP391.3
  • 【Times cited】 4
  • 【Downloads】 226