A Knowledge-Base-Driven Retrieval Tool for the Chinese Web: Research on an Intelligent Search Engine for Economic Information

Research on Intelligent Search Engine Based on Knowledge Database

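【Author】 Xue Pengjun (薛鹏军)

【Advisor】 Hou Hanqing (侯汉清)

【Author information】 Nanjing Agricultural University, Agricultural Economics and Management, 2001, Master's thesis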

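【Chinese abstract】 Drawing on computer technology and on the theory and practice of library and information science, this thesis applies automatic indexing and organization techniques for document information to the processing of Web pages. After analyzing the current state and shortcomings of Chinese and foreign search engines, progress in integrating classification schemes with subject headings in search engines, and the main characteristics of Web pages, it proposes an indexing and organization scheme tailored to Chinese Web pages and, using related network technologies, builds an experimental intelligent search engine for economic information. Automatic indexing of Chinese Web pages is built around the concept of a knowledge base. The knowledge base is in effect an expert knowledge system based on the Chinese Library Classification (《中图法》), comprising several databases: a Chinese Library Classification database, a Chinese Thesaurus (汉表) database, a class number to descriptor mapping database, a synonym database, a keyword database, a stop-word database, and an exception-word database. Once the basic indexing sources of a Web page have been determined, subject indexing uses a word-frequency-based statistical weighting method; class number assignment, that is, classification indexing, is then completed by computing the literal similarity between the page's terms and the descriptor strings in the class number to descriptor mapping database. The thesis then uses tools such as Borland Delphi and Visual FoxPro to design and develop an experimental automatic indexing system for Chinese Web pages, with functions for text extraction, automatic word extraction, automatic subject and classification indexing, processing of indexing results, and knowledge base maintenance; it also briefly describes the system's design, workflow, usage, and operating requirements. Following the trend toward integrating classification and subject headings, the thesis further designs a search-based retrieval system, a directory-based retrieval system, and an integrated classification-subject retrieval system, and proposes a scheme, based on an integrated vocabulary, for making the category systems of different engines compatible and interchangeable. Finally, the automatic indexing system is evaluated comprehensively in terms of indexing efficiency, indexing accuracy, and other aspects, and its problems and shortcomings are analyzed objectively. Measured against manual indexing, automatic indexing achieves an accuracy of more than 80%.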

【Abstract】 Drawing on traditional information science theory and practice, this thesis applies automatic document indexing technology to the processing of Web pages. First, the current state and shortcomings of search engines are described; second, the characteristics of Web page data are analyzed; finally, the author presents an indexing scheme for Chinese Web pages and develops an experimental search engine for economic information using network technology. The automatic indexing of Chinese Web pages is based on a knowledge database. The knowledge database is in effect an expert system that includes a library classification, a thesaurus, a concordance of class numbers with descriptors, a synonym dictionary, keyword lists, stop-word lists, and so on. After the indexing data of a Web page have been determined, a weighted word-frequency method combined with statistical algorithms is used for subject indexing. The thesis then uses a literal similarity measure, backed by a large body of empirical classification data, to classify Chinese Web pages. Next, the author uses Borland Delphi and Visual FoxPro to develop an automatic indexing system for processing Chinese Web pages. The experimental system comprises Web page text analysis, automatic word extraction, automatic subject indexing, classification, confirmation of indexing results, and knowledge database maintenance. The design procedure, workflow, usage, and running conditions of the system are also described. In line with the trend toward integrating classifications and thesauri, the thesis also designs a keyword retrieval system, a directory search system, and an integrated system. Finally, the author tests and evaluates the automatic indexing system in several respects and gives an objective account of its deficiencies.
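
The two indexing steps named in the abstract, word-frequency weighting for subject indexing and literal similarity for class number assignment, can be illustrated with a minimal Python sketch. It is not the thesis's implementation (the system itself was built with Borland Delphi and Visual FoxPro), and everything specific in it is an assumption made for illustration: the field weights, the tiny stop-word set, the use of a character-level Dice coefficient as the similarity measure, the two-entry class table, and all function names. The input is assumed to be already segmented into words.

```python
from collections import Counter

# Hypothetical field weights: title terms count more than body terms.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}
# Tiny stand-in for the stop-word database.
STOP_WORDS = {"的", "和", "是"}


def weighted_term_scores(fields):
    """Sum field-weighted term frequencies; `fields` maps field name -> token list."""
    scores = Counter()
    for field, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in tokens:
            if token not in STOP_WORDS:
                scores[token] += weight
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def literal_similarity(a, b):
    """Character-level Dice coefficient, an assumed stand-in for literal similarity."""
    overlap = sum((Counter(a) & Counter(b)).values())
    total = len(a) + len(b)
    return 2.0 * overlap / total if total else 0.0


def assign_class(terms, class_table):
    """Pick the class number whose descriptor string best matches the indexed terms."""
    query = "".join(terms)
    return max(class_table, key=lambda num: literal_similarity(query, class_table[num]))


# Illustrative page and class table (entries are examples, not the thesis's data).
page = {
    "title": ["农业", "经济", "的", "发展"],
    "body": ["农业", "政策", "和", "经济", "农业"],
}
class_table = {"F3": "农业经济", "F83": "金融银行"}

terms = top_terms(weighted_term_scores(page))
print(terms, assign_class(terms, class_table))
```

Keeping the weighting and the similarity measure in separate functions means either can be swapped, for example for a different weighting of page fields, without touching the other.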

  • 【Classification number】 G354
  • 【Times cited】 18
  • 【Downloads】 558