The Research on Intelligent Web Information Retrieval (Web信息智能检索研究)

【Author】 Han Wei (韩巍)

【Advisor】 Wu Guofeng (吴国凤)

【Author information】 Hefei University of Technology, Computer Software and Theory, 2004, Master's thesis

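【Summary】 As the Web keeps growing, people place higher demands on Web information retrieval systems, and Web information retrieval has gradually become a hot topic in Internet research. In recent years, some researchers have proposed topic-specific Web information retrieval methods to meet the information needs of professional users while also overcoming some shortcomings of general-purpose search engines. This thesis discusses in depth the key technologies involved in topic-specific Web information retrieval, and studies in depth the page topic identification method (Web page classification method) used in such systems. At present, Web pages are classified mainly by content-based methods, which do not make full use of the Web's link information, so the classification results are not very good. This thesis presents a Web page classification method that incorporates the link structure of pages. Building on this study of page classification techniques, it also constructs a topic-specific Web information search system based on page link structure. Finally, an experimental system platform was implemented in the VC++ 6.0 development environment, and the related experiments were carried out on it.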

【Abstract】 As the WWW grows, Web information retrieval systems with higher performance are required, and research on Web information retrieval has become a focus of attention. Recently, focused crawling systems have been proposed to serve people who need professional knowledge from the WWW. This dissertation introduces the key aspects of a focused crawling system and then discusses in depth the classification problem within it. At present, most Web page classification methods use only the contents of a page and ignore the links between pages completely. In fact, links between Web pages sometimes reflect the topics of the linked pages. This dissertation therefore designs a new classification method that uses both the links and the contents of a Web page to decide the page's class. Experimental results show an improvement over methods that consider page contents only. The dissertation then designs a focused crawling system that uses this content-and-link classifier to decide each page's class, and experimental results show an improvement over the common method. To check these methods, a focused crawling system was developed using VC++ 6.0.
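
The abstract does not say how link and content evidence are combined, so the following is only a minimal illustrative sketch, not the dissertation's method. It assumes a toy keyword-based content score and blends it with the average content score of the pages a page links to. The names `content_score`, `combined_scores`, the weight `alpha`, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def content_score(text: str, topic_terms: Set[str]) -> float:
    """Toy content classifier: fraction of words that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)


def combined_scores(
    pages: Dict[str, str],
    links: Dict[str, List[str]],
    topic_terms: Set[str],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend each page's content score with the mean score of its linked pages.

    alpha is an assumed weight on the page's own content; pages with no
    known linked pages fall back to the content score alone.
    """
    base = {url: content_score(text, topic_terms) for url, text in pages.items()}
    result: Dict[str, float] = {}
    for url in pages:
        neighbors = [n for n in links.get(url, []) if n in base]
        if neighbors:
            link_part = sum(base[n] for n in neighbors) / len(neighbors)
            result[url] = alpha * base[url] + (1 - alpha) * link_part
        else:
            result[url] = base[url]
    return result


# Hypothetical sample data: page "b" has off-topic text but links to on-topic pages.
pages = {
    "a": "web crawler index",
    "b": "cooking recipes",
    "c": "web search crawler",
}
links = {"b": ["a", "c"]}
topic_terms = {"web", "crawler", "search", "index"}
scores = combined_scores(pages, links, topic_terms, alpha=0.5)
```

In this sketch, a page whose own text looks off-topic still receives a nonzero score when it links to on-topic pages, which is the kind of signal a content-only classifier would miss.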

  • 【Classification code】TP393.09
  • 【Times cited】5
  • 【Downloads】222