Research on Heterogeneous Data Integration Based on Ontology and XML

【Author】 Zhang Mengmeng

【Advisor】 Zhang Yongsheng

【Author information】 Shandong Normal University, Computer Application Technology, 2008, Master's

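【Abstract (translated from the Chinese)】 With the rapid growth of the Web, the resources available on the Internet have become ever richer, turning it into a vast global information repository. Web resources include not only traditional databases with strict data models, such as relational and object-oriented databases, but also unstructured and semi-structured data, such as large numbers of HTML documents, XML documents, and text data. These distributed data resources were each designed mainly to meet their owners' business needs, and differences in hardware and software platforms and in data models have made them heterogeneous. Heterogeneous data are hard to integrate and share, which makes interoperation among data sources difficult and prevents information from being shared and used effectively, so the sources become "information islands". To make better use of the enormous volume of information on the network, there is an urgent need to integrate data that are geographically distributed, autonomously managed, and heterogeneous in schema, and the heterogeneous data integration problem has therefore attracted wide attention.

This thesis first gives a comprehensive analysis of existing data integration approaches and of the theories and technologies related to heterogeneous data integration. It then identifies semantic heterogeneity as the main problem in current heterogeneous data integration. On this basis, it proposes and designs a heterogeneous data integration system model based on ontology and XML, and discusses the key modules of the model. Ontology is introduced specifically to resolve semantic heterogeneity in heterogeneous data integration.

The main work of this thesis is as follows:
(1) The theories and technologies related to heterogeneous data integration are discussed. Existing data integration methods are analyzed, and semantic heterogeneity is identified as the problem that most urgently needs to be solved in current data integration.
(2) Based on a study of the architectures of existing data integration systems, and combining XML, ontology, and Web Services technologies, a heterogeneous data integration model based on ontology and XML is proposed. The functional modules of the model are described in detail, and the key modules are tested.
(3) XML is used as the intermediate language: the data of each local data source are converted into XML data schemas for integration, and local ontologies are built from XML Schema, which shields the syntactic heterogeneity of the underlying data sources.
(4) Taking advantage of the ability of ontologies to describe domain concepts, the ontology description language OWL is used to build the global ontology and the local ontologies. Mappings between the global ontology and the local ontologies, and mapping rules between the local ontologies and the data sources, are defined to resolve the semantic heterogeneity that arises in data integration.
(5) The wrapper of each heterogeneous data source is encapsulated as a Web Service, which makes the system loosely coupled, flexible, and easy to extend, and supports seamless integration of heterogeneous data sources.
(6) XQuery is used as the query language over the global schema, which makes it easy to query XML data. A global query posed against the global schema (the global ontology) is decomposed into sub-queries expressed in the terms of the local ontologies.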
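Point (3) above, using XML as an intermediate representation for local data sources, can be illustrated with a minimal sketch. The record gives no implementation details, so everything here is an assumption made for illustration: the function name `rows_to_xml`, the `record` element layout, and the sample `student` data are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table_name, columns, rows):
    """Serialize relational rows as an XML tree.

    Hypothetical layout: one <record> element per row, with one child
    element per column. The thesis's actual XML schema is not given.
    """
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root


# Hypothetical sample data standing in for a local relational source.
xml_root = rows_to_xml("student", ["id", "name"], [(1, "Li"), (2, "Wang")])
xml_text = ET.tostring(xml_root, encoding="unicode")
```

Once every source is exposed in a uniform XML form like this, differences in storage format no longer matter to the integration layer; differences in meaning (for example, two sources naming the same concept differently) remain, which is the part the ontology mappings are meant to handle.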

【Abstract】 With the rapid development of the Web, the Internet holds more and more resources and has become a large, global information warehouse. Web resources include not only relational databases but also HTML and XML documents. These scattered resources were each designed to meet particular business needs, and differences in software and hardware platforms make the data sources heterogeneous. Heterogeneous data sources are hard to share and use, so solving their integration problem has become increasingly important. Traditional data integration, however, does not meet this demand, and new integration approaches are needed.

This thesis analyzes different data integration methods and the relevant theories and technologies of heterogeneous data integration, and concludes that the semantic heterogeneity of data sources needs to be resolved. To resolve it, a heterogeneous data integration system framework based on ontology and XML is proposed. The key technologies, namely XML, ontology, and Web Services, are discussed. The thesis mainly introduces ontology and XML to realize the integration of heterogeneous data.

The main contributions of this thesis are as follows:
(1) The relevant theories and technologies of heterogeneous data integration are discussed. The advantages of these technologies for heterogeneous data integration are analyzed, and it is concluded that the semantic heterogeneity of data sources must be resolved.
(2) Based on XML, ontology, and Web Services technologies, a heterogeneous data integration system framework based on ontology and XML is proposed. The functions of its components are designed in detail, and the key modules of the prototype system are tested.
(3) XML Schema is used to describe the data sources, and the local ontologies are built from XML Schema. XML resolves the problem of syntactic heterogeneity.
(4) The thesis defines the global ontology, the local ontologies, the mapping between the global ontology and the local ontologies, and the mapping between the local ontologies and the data sources, all expressed in OWL. An integration scheme based on the mapping between the global and local ontologies is put forward to solve the problem of semantic heterogeneity.
(5) Wrapping the data sources as Web Services gives the prototype loose coupling, greater flexibility, and greater extensibility. With this method, seamless data integration can be achieved.
(6) XQuery is used to express global queries, which makes it easy to query XML. Each global query is then decomposed into sub-queries over the local ontologies.
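The query decomposition in point (6) can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the record does not describe. A plain Python dictionary stands in for the OWL global-to-local ontology mapping, and the source names (`hr_db`, `crm_xml`), the concept and property names, and the `doc("….xml")` URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalTerm:
    source: str   # hypothetical data-source name
    concept: str  # element name for the concept in that source's XML view
    prop: str     # element name for the property in that source's XML view


# Hypothetical mapping: (global concept, global property) -> local terms.
# In the thesis this role is played by OWL ontology mappings.
MAPPING = {
    ("Person", "name"): [
        LocalTerm("hr_db", "employee", "full_name"),
        LocalTerm("crm_xml", "customer", "name"),
    ],
}


def decompose(concept, prop, mapping=MAPPING):
    """Rewrite one global (concept, property) request into one
    XQuery string per data source, using local-ontology terms."""
    sub_queries = {}
    for term in mapping.get((concept, prop), []):
        sub_queries[term.source] = (
            f'for $x in doc("{term.source}.xml")//{term.concept} '
            f"return $x/{term.prop}"
        )
    return sub_queries


# One global request fans out into a sub-query for each mapped source.
sub_queries = decompose("Person", "name")
```

Each sub-query would be sent to the wrapper of the corresponding source, and the results merged under the global term. A request with no mapping yields an empty dictionary in this sketch; how the thesis handles that case is not stated.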

【Key words】 Data integration; XML; Web Services; Ontology
  • 【Classification number】 TP311.52
  • 【Times cited】 4
  • 【Downloads】 357