基于领域本体的Deep Web不确定性模式匹配研究

The Research of Deep Web Uncertain Schema Matching Based on Domain Ontology

【Author】 Gao Hualing (高华玲)

【Advisor】 Wang Ju (王驹)

【Author information】 Guangxi Normal University, Computer Software and Theory, 2011, Master's thesis

【Abstract】 As Internet technology develops, the volume of online information resources keeps growing, and how to make use of them has drawn attention from both ordinary users and researchers. Based on how information resources are distributed and where they reside, the Web can be divided into the Surface Web and the Deep Web. Traditional search engines can retrieve only Surface Web information; they cannot effectively crawl Deep Web databases, which hold more information of higher quality, with a narrower topical focus and stronger structure. Deep Web information integration is an important means of exploiting these resources effectively. Query interface integration is the core of information integration research and links the stages that precede and follow it. Current research on query interface integration has several problems: Chinese semantic computation is not accurate enough, query interface schema matching methods are complicated, time complexity is high, and the uncertainty of schema matching receives little consideration. To address these shortcomings, this thesis proposes a query interface integration method based on domain ontology. It is a holistic matching method that removes the efficiency bottleneck of traditional pairwise matching and greatly simplifies the matching process. The thesis also proposes a selection criterion for uncertain matches, which opens a new line of thinking for research on uncertain matching.

The main work and contributions of the thesis are as follows:

(1) The thesis introduces the relevant background on ontologies and analyzes the structure of a domain ontology. Following a domain ontology construction method, and drawing on the attribute and instance features of Deep Web query interfaces in the tourism domain, it builds a query-interface-oriented tourism domain ontology encoded in OWL2, a more standardized and more expressive ontology language.

(2) Building on an in-depth study of traditional schema matching techniques, the thesis proposes a domain-ontology-based schema matching method for Deep Web query interfaces. The method matches a large number of query interfaces in a specific domain holistically, with much better efficiency than traditional pairwise matching. It makes full use of the semantic relations between ontology concepts, so that query interfaces are understood at the semantic level.

(3) For similarity computation, the most important problem in schema matching, the thesis proposes an improved attribute similarity calculation method. The method targets schema matching in Chinese query interface integration: taking into account the patterns and characteristics of attribute names in Chinese query interfaces, it improves the attribute similarity formula on top of HowNet-based Chinese semantic similarity computation. Experiments show that the formula greatly improves accuracy.

(4) For evaluating uncertain schema matching, the thesis proposes judging the credibility of an attribute match from attribute position and gives a formula that quantifies this credibility, which helps select more reasonable matching results.

(5) The thesis implements an ontology-based query interface integration system consisting of an ontology management module, a query interface preprocessing module, a similarity calculation module, a schema matching generation module, and a query interface integration module. The key techniques and algorithms of the thesis are evaluated and verified on this implementation, which also serves as a platform for collecting experimental data.

Finally, experiments are set up on the system platform and their results are analyzed and evaluated. They verify the performance characteristics of the ontology-based schema matching method and the accuracy of the improved attribute similarity calculation method.
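The holistic matching idea in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy `ONTOLOGY` dictionary, the function names, the 0.6 threshold, and the English attribute labels are all assumptions made for illustration, and plain string similarity from Python's `difflib` stands in for the HowNet-based semantic similarity the abstract mentions. Each attribute is compared once against the ontology concepts, and attributes from different interfaces match when they map to the same concept, so interfaces are never compared pairwise.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy ontology: concept -> known labels.
# This is an illustrative stand-in, not the thesis's OWL2 tourism ontology.
ONTOLOGY = {
    "departure_city": {"departure city", "from", "origin"},
    "destination": {"destination", "to", "arrival city"},
    "departure_date": {"departure date", "depart on", "leaving date"},
}


def label_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a semantic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_concept(attribute: str, threshold: float = 0.6):
    """Return (concept, score) for the closest concept, or None below threshold."""
    best = None
    for concept, labels in ONTOLOGY.items():
        score = max(label_similarity(attribute, label) for label in labels)
        if best is None or score > best[1]:
            best = (concept, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


def holistic_match(interfaces: dict, threshold: float = 0.6) -> dict:
    """Group the attributes of all interfaces by the ontology concept they map to."""
    groups = defaultdict(list)
    for name, attributes in interfaces.items():
        for attribute in attributes:
            hit = best_concept(attribute, threshold)
            if hit is not None:
                groups[hit[0]].append((name, attribute))
    return dict(groups)


# Two hypothetical query interfaces from the tourism domain.
interfaces = {
    "site_a": ["From", "To", "Departure date"],
    "site_b": ["Origin", "Destination", "Hotel stars"],
}
matches = holistic_match(interfaces)
```

With n attributes in total and k ontology concepts, this sketch performs on the order of n × k comparisons, whereas pairwise matching compares every interface with every other one. Attributes that fit no concept well enough are left unmatched rather than forced into a group.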
