Design and Implementation of a Metadata-Based Search Engine (一种基于元数据的搜索引擎的设计与实现)

【Author】 廖程锋

【Advisor】 罗三定

【Author information】 Central South University (中南大学), Computer Application Technology, 2004, Master's thesis

【Abstract】 The growth of the Internet has turned it into a vast repository of information, yet the quality of information retrieval has not kept pace. Most traditional search engines rely on mechanical keyword matching, so they cannot understand document content, and their precision is generally low. This thesis therefore proposes a new search engine model based on metadata and RDF. Metadata is data that describes data, and RDF is a well-suited vehicle for carrying it. Because a computer can interpret the meaning of the metadata that RDF describes and carries, precise content-based retrieval becomes possible. The model comprises four modules: vocabulary design, an RDF description generation tool, a server-side program that collects and parses RDF descriptions, and a vocabulary-based query module built as a browser/server application. The vocabulary defines the aspects from which a resource is described. The RDF generation tool helps users describe web resources in one of two ways: by embedding the RDF description in a web page as an XML data island, or by sending the RDF description document directly to the RDF document buffer on the search engine server. The RDF collection and parsing module looks for web pages that carry RDF descriptions and stores them as text documents in the server's RDF document buffer; the RDF parser then parses the documents in that buffer, and the resulting triples are stored in the index database. The query module provides an interface that accepts user searches and displays the results as metadata.

The thesis also studies the construction of a controlled vocabulary and the automatic generation of metadata, both of which refine and supplement the mechanism. A controlled vocabulary supports better description and querying, while automatic metadata generation raises the system's degree of automation.
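The parse, index, and query path described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the use of the Dublin Core vocabulary, the sample page URL, the SQLite table layout, and the function names `parse_triples`, `build_index`, and `search` are all assumptions made for illustration. The parser handles only flat `rdf:Description` blocks with literal property values, not the full RDF/XML syntax.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Hypothetical RDF description of one web page. Dublin Core is an assumed
# vocabulary; the thesis designs its own vocabulary set.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/page.html">
    <dc:title>Metadata-based search</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def parse_triples(rdf_xml):
    """Yield (subject, predicate, object) from flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join into one URI.
            namespace, local = prop.tag[1:].split("}", 1)
            yield subject, namespace + local, (prop.text or "").strip()


def build_index(triples):
    """Store triples in an in-memory SQLite table (assumed index layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def search(conn, predicate, keyword):
    """Return subjects whose value for the given predicate contains keyword."""
    rows = conn.execute(
        "SELECT subject FROM triples WHERE predicate = ? AND object LIKE ?",
        (predicate, "%" + keyword + "%"),
    )
    return [row[0] for row in rows]


index = build_index(parse_triples(SAMPLE_RDF))
hits = search(index, DC_TITLE, "search")
print(hits)
```

Because the query names a specific predicate (here, the title property) rather than matching a keyword anywhere in the page, a search for "search" in titles does not return pages that only mention the word in another field. This is the kind of field-aware matching the abstract contrasts with plain keyword search.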

【Key words】 XML; RDF; search engine; metadata; web robot; RDF parser
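  • 【Online publication contributor】 Central South University
  • 【Online publication issue】 2004, No. 04
  • 【Classification code】 TP393.09
  • 【Citation count】 5
  • 【Download count】 432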