Research and Implementation of Web Information Retrieval Technology

【Author】 Zhang Chi (张驰)

【Advisor】 Wu Jian (吴健)

【Author information】 Northwestern Polytechnical University, Computer Software and Theory, 2001, Master's degree

【Abstract】 The WWW has grown into a vast distributed information space with nearly 100 million users, about 4 million sites, and 300 million pages, and the volume of information is still growing exponentially. Because this information is open, dynamic, and heterogeneous, however, it is difficult to obtain the information one needs quickly. Search engines make it possible to find, within a large collection of information resources, an appropriately sized subset that is relevant to a given query. The main work of this thesis is to study search engine technology and to design and implement a Chinese search engine. The thesis first introduces the current state of search engine development and analyzes the working principles and key techniques of search engines; it then describes the design and implementation of the Chinese search engine and the related technologies used. The core of the thesis is the design and implementation of the search engine using the Java language and multithreading; JDBC is used to implement the web page database, and Chinese information processing is also studied to some extent. Finally, the thesis summarizes the completed work and identifies areas for future improvement.

【Key words】 search engine; URL; robot; JDBC; thread
  • 【Classification code】TP393.03
  • 【Citations】4
  • 【Downloads】262