
Ontology-based Semantic Search Engine for Deep Web

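【Author】 Tan Chunliang (谭春亮)

【Supervisor】 Jiang Yuncheng (蒋运承)

【Author information】 Guangxi Normal University, Computer Software and Theory, 2008, Master's thesis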

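【Abstract (Chinese)】 With the rapid development and spread of the WWW, the Web has become a huge information repository, and searching it suffers from "information overload" and "information disorientation". Because the WWW is autonomous, open, heterogeneous, dynamic, and growing exponentially, both directory-based and full-text search engines have revealed fundamental weaknesses. They query by keyword, retrieve only static pages, and support only "navigational" retrieval, so index size grows exponentially while recall and precision keep falling. Users now expect a new generation of search engines to raise recall and precision, to support retrieval at "knowledge granularity", and to search at the semantic level. Solving these problems at the root requires a new knowledge representation for the WWW. To this end, Tim Berners-Lee, the inventor of the Web, proposed the architecture of the next-generation Web, the Semantic Web, in which information is well defined so that people and machines, and machines among themselves, can share information and cooperate more effectively. The Semantic Web can fundamentally solve the problems of traditional search engines. However, because the WWW is autonomous, adoption of the Semantic Web will take a long time, and since most Semantic Web research is still theoretical, a new-generation search engine is hard to build.

This thesis finds a point of contact between the new-generation search engine and today's WWW: it applies the Semantic Web architecture to searching the Deep Web and proposes an ontology-based semantic search engine for the Deep Web. This engine overcomes the limitations of traditional search engines, which can search only static pages, cannot search semantically, and cannot offer users retrieval at "knowledge granularity". The contributions of this thesis are as follows:

1. Semantic search over the Deep Web based on the Semantic Web architecture. Traditional search engines search only static pages and cannot reach the Deep Web; they search only by keyword, not semantically; and they index only the content of static pages, not metadata. Removing these limitations improves recall and precision and avoids the bottleneck of index size.

2. RDF description of Deep Web query interfaces. Metadata is extracted from each query interface, which is treated as the meta-schema of its backend database and described in the metadata description language RDF. The RDF metadata of the query interfaces is then retrieved with the help of a domain ontology, enabling semantic search over query interfaces and improving its accuracy. Because query interfaces are highly domain-specific, this also raises the precision of the search engine as a whole.

3. A framework for a Deep Web semantic search engine based on a domain ontology, composed of a Deep Web crawler, a Deep Web classifier, Deep Web form extraction, a natural-language query interface, semantic reasoning, a form retriever, a Web retriever, unified-interface querying, and result integration. The thesis focuses on Deep Web discovery, Deep Web classification, and semantic retrieval of query-interface RDF. The RDF retrieval system is developed on the Jena platform and validated with an automobile-domain ontology and query-interface RDF models.

4. Algorithms for the key components:
  • The HowNet-based lexical semantic relation algorithm uses HowNet as the ontology and determines the logical relation between words by structure-based pattern matching.
  • The Deep Web feature selection algorithm uses term frequency as the within-class and between-class separability criterion and selects features with a Tabu search strategy.
  • The Deep Web query-interface RDF extraction algorithm maps the HTML code of a query interface to a query-interface RDF model according to features of that HTML code.
  • The Deep Web query-interface RDF query algorithm takes the user's keyword sequence as the search condition, classifies the keywords, and applies a concept inference operator to obtain a sequence of concept-keyword pairs and a sequence of instance-keyword pairs. It queries the RDF with RDQL according to the concept-keyword pairs, then retrieves data from the Web over HTTP according to the query results and the instance-keyword pairs.
The thesis validates each of these algorithms on examples.

The thesis studies a Deep Web search engine built on the Semantic Web architecture from a theoretical standpoint: it proposes the overall framework of the engine and the algorithmic ideas behind each key component, and refines the engine's retrieval workflow, arguing that it is theoretically feasible. The workflow and the key algorithms are validated on a concrete domain, and the whole system can be implemented on the Jena platform.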
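The second contribution treats a query interface as the meta-schema of its backend database and describes it in RDF. The abstract does not give the extraction algorithm or the RDF vocabulary, so the following is only a minimal Python sketch under stated assumptions: it reads the first HTML `<form>` with the standard-library `html.parser`, uses a hypothetical `http://example.org/qi#` namespace with hypothetical predicates (`action`, `method`, `hasField`, `name`, `widget`, `label`, `allowedValue`), and returns plain `(subject, predicate, object)` tuples instead of building a Jena model.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary: the thesis does not publish its RDF schema.
QI = "http://example.org/qi#"
# Input types that carry no user query terms.
SKIP_TYPES = {"submit", "button", "reset", "hidden", "image"}


class FormToRDF(HTMLParser):
    """Map the first <form> of a page to (subject, predicate, object) triples."""

    def __init__(self, form_uri):
        super().__init__()
        self.form_uri = form_uri
        self.triples = []
        self._seen_form = False  # only the first form is described
        self._in_form = False
        self._select = None      # URI of the <select> field currently open
        self._label_for = None   # element id named by the open <label for=...>
        self._labels = {}        # element id -> label text
        self._ids = {}           # element id -> field URI

    def _add(self, s, p, o):
        self.triples.append((s, QI + p, o))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._seen_form = self._in_form = True
            self._add(self.form_uri, "action", a.get("action") or "")
            self._add(self.form_uri, "method", (a.get("method") or "get").lower())
            return
        if not self._in_form:
            return
        if tag == "label":
            self._label_for = a.get("for")
        elif tag == "option":
            # Options of a <select> become the field's allowed values.
            if self._select and a.get("value"):
                self._add(self._select, "allowedValue", a["value"])
        elif tag in ("input", "select", "textarea") and a.get("name"):
            if tag == "input" and (a.get("type") or "text").lower() in SKIP_TYPES:
                return
            field = f"{self.form_uri}/field/{a['name']}"
            self._add(self.form_uri, "hasField", field)
            self._add(field, "name", a["name"])
            self._add(field, "widget", tag)
            if a.get("id"):
                self._ids[a["id"]] = field
            if tag == "select":
                self._select = field

    def handle_data(self, data):
        # Text inside <label for=...> becomes the label of the referenced field.
        if self._label_for and data.strip():
            self._labels[self._label_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None
        elif tag == "select":
            self._select = None
        elif tag == "form":
            self._in_form = False

    def rdf(self):
        """All triples, with labels resolved once every id has been seen."""
        labels = [(self._ids[i], QI + "label", text)
                  for i, text in self._labels.items() if i in self._ids]
        return self.triples + labels


def form_to_rdf(html, form_uri):
    parser = FormToRDF(form_uri)
    parser.feed(html)
    parser.close()
    return parser.rdf()
```

For a car-search form with a `<select name="make">` offering `Toyota` and `Honda` and a text input `max_price` labelled "Max price", `form_to_rdf` yields triples such as `(form, qi:hasField, form/field/make)`, `(form/field/make, qi:allowedValue, "Toyota")` and `(form/field/max_price, qi:label, "Max price")`, while the submit button is skipped. Labels are resolved at the end because a `<label for=...>` may appear before or after the control it names. Loading the triples into Jena, as the thesis does, is outside this sketch.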

【Abstract】 With its rapid evolution and popularization, the WWW has become a tremendous information repository, and searching it has become increasingly difficult because of information overload and disorientation. Directory-based and keyword-based search engines have revealed their shortcomings because the Web is autonomous, heterogeneous, dynamic, open, and growing exponentially. Navigation-style search that relies only on keywords and covers only the surface Web makes index size grow exponentially and drives recall and precision ever lower. A new knowledge representation for the WWW has therefore become essential, both to improve recall and precision and to satisfy users' demand for search at "knowledge granularity" and for semantic search. Tim Berners-Lee, the creator of the Web, put forward the architecture of the Semantic Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The Semantic Web is capable of solving these problems. This thesis applies the Semantic Web architecture to searching the Deep Web and proposes an ontology-based semantic search engine for the Deep Web. Such an engine can overcome limitations that traditional search engines cannot: they search only the surface Web, cannot search semantically, and cannot search at "knowledge granularity". The four contributions of this thesis are as follows.

First: the ontology-based Deep Web semantic search engine makes up for the shortcomings of traditional search engines, which search only the surface Web, rely on keywords, and support neither semantic search nor metadata search. It therefore improves recall and precision and avoids the index-size bottleneck.

Second: this thesis represents query interfaces as RDF metadata. A query interface reflects the schema of its backend database, so it is described in the metadata description language RDF. Query interfaces are then retrieved by searching their RDF semantically with an ontology, which raises retrieval precision; because query interfaces are highly domain-specific, this also raises the precision of the search engine.

Third: the ontology-based Deep Web semantic search engine is composed of a Deep Web crawler, a Deep Web classifier, a Deep Web form extractor, an NLI (natural language interface), semantic reasoning, form retrieval, Web retrieval, query interface integration, and result integration. The thesis focuses on the discovery and classification of the Deep Web and on semantic search over query-interface RDF.

Fourth: the lexical relation algorithm applies structure-based pattern matching with HowNet as the ontology. The Deep Web feature selection algorithm selects features with a Tabu search strategy, using term frequency as the separability criterion. The Deep Web query interface RDF extraction algorithm maps the HTML code of a query interface to an RDF model. The Deep Web query interface RDF search algorithm takes the user's keyword sequence as the search condition, classifies it, and then extends it with concepts to obtain a concept sequence and an instance sequence. The RDF is searched with RDQL according to the concept sequence, and the Web is then searched over HTTP according to the RDF search results and the instance sequence. These algorithms are validated in this thesis.

The thesis investigates a Deep Web search engine based on the Semantic Web theoretically. The proposed framework and algorithmic ideas are feasible, the engine could be developed on Jena, and the approach is validated in a specific domain.
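The fourth algorithm turns a keyword sequence into concept-keyword and instance-keyword pairs, queries the interface RDF by concept, and then issues an HTTP request built from the instance values. The abstract specifies neither the ontology, the RDQL queries, nor the concept inference operator, so the Python sketch below is a hypothetical illustration only: `ONTOLOGY`, `TRIPLES`, the `qi:` predicates (including `domain` and `concept`) and the example URLs are made up, the "inference" is a single rule (an instance keyword implies its concept), and an in-memory filter over `(s, p, o)` tuples stands in for RDQL on Jena.

```python
from urllib.parse import urlencode

QI = "http://example.org/qi#"  # hypothetical vocabulary

# Hypothetical domain ontology: lower-cased keyword -> (kind, concept).
ONTOLOGY = {
    "car": ("concept", "Car"),
    "make": ("concept", "Make"),
    "brand": ("concept", "Make"),   # synonym mapped to the same concept
    "price": ("concept", "Price"),
    "toyota": ("instance", "Make"),
    "honda": ("instance", "Make"),
}

# Hypothetical query-interface RDF, written out as (s, p, o) tuples.
TRIPLES = [
    ("urn:form:1", QI + "action", "http://cars.example.com/search"),
    ("urn:form:1", QI + "domain", "Car"),
    ("urn:form:1", QI + "hasField", "urn:form:1/make"),
    ("urn:form:1/make", QI + "name", "mk"),
    ("urn:form:1/make", QI + "concept", "Make"),
    ("urn:form:1", QI + "hasField", "urn:form:1/price"),
    ("urn:form:1/price", QI + "name", "max_price"),
    ("urn:form:1/price", QI + "concept", "Price"),
    ("urn:form:2", QI + "action", "http://books.example.com/find"),
    ("urn:form:2", QI + "domain", "Book"),
]


def objects(s, p):
    """All objects o with (s, qi:p, o) in TRIPLES."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == QI + p]


def classify(keywords):
    """Split keywords into concept-keyword pairs and instance-keyword pairs."""
    concept_pairs, instance_pairs = [], []
    for kw in keywords:
        kind, concept = ONTOLOGY.get(kw.lower(), (None, None))
        if kind == "concept":
            concept_pairs.append((kw, concept))
        elif kind == "instance":
            instance_pairs.append((kw, concept))
    # Simplified inference step: an instance keyword implies its concept.
    for kw, concept in instance_pairs:
        if concept not in {c for _, c in concept_pairs}:
            concept_pairs.append((kw, concept))
    return concept_pairs, instance_pairs


def matching_forms(concept_pairs):
    """Forms whose domain or field concepts cover every requested concept."""
    wanted = {c for _, c in concept_pairs}
    hits = []
    for form in sorted({s for (s, p, _) in TRIPLES if p == QI + "action"}):
        covered = set(objects(form, "domain"))
        for field in objects(form, "hasField"):
            covered.update(objects(field, "concept"))
        if wanted <= covered:
            hits.append(form)
    return hits


def build_request(form, instance_pairs):
    """Fill the form's fields from instance keywords and return a GET URL."""
    params = {}
    for kw, concept in instance_pairs:
        for field in objects(form, "hasField"):
            if concept in objects(field, "concept"):
                params[objects(field, "name")[0]] = kw
    return objects(form, "action")[0] + "?" + urlencode(params)


if __name__ == "__main__":
    concepts, instances = classify(["car", "Toyota"])
    for form in matching_forms(concepts):
        print(build_request(form, instances))  # → http://cars.example.com/search?mk=Toyota
```

With the keyword sequence `["car", "Toyota"]`, only the car form covers both required concepts (`Car` from its domain, `Make` from its `mk` field), so the sketch produces a single GET URL. A full system would send that request and integrate the returned result pages, which this sketch does not attempt.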

【Key words】 Semantic Web; Semantic Search; Deep Web; Ontology; Classification
  • 【CLC classification】 TP391.3
  • 【Times cited】 7
  • 【Downloads】 718