Study on Some Key Techniques for Database Grid Based on P2P Framework

Original title (Chinese): 基于P2P框架的数据库网格中若干关键技术的研究

【Author】 Wang Guangqi (王广奇)

【Advisor】 Yu Ge (于戈)

【Author information】 Northeastern University (东北大学), Computer Software and Theory, 2008, Ph.D.

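【Abstract (translated from the Chinese original)】 With the development of information technology, the volume of information in every industry is growing explosively, and this includes many publicly available database resources. Geographically dispersed users want to access and use these rich data resources transparently and on demand, for example in scientific fields such as high-energy physics and biological computing, in e-commerce, and in deep Web data querying. A data grid is a system that manages, accesses, and shares massive, distributed, heterogeneous data resources over a wide-area network. The database grid concept arose as public database resources became abundant. It is a kind of data grid whose main resources are databases, and it can support the applications above well. On the one hand, the efficient processing capability of the grid environment can be used to integrate massive data and to make effective use of the many existing database resources. On the other hand, the efficient data management capability of database management systems can provide strong support for managing database resources, optimizing the integration of distributed data, and analyzing and processing massive data within the grid. This dissertation therefore studies several key techniques that support database grids, mainly a grid architecture based on the P2P model, a P2P supporting framework, resource management, dynamic data integration, resource replica management, and data summarization.

Research and development on database grids is still at an early stage. In recent years the most representative work has been the protocols and middleware for accessing databases in grid environments defined by the DAIS working group (Database Access and Integration Services Working Group) of the Global Grid Forum (GGF), together with grid systems that process databases for specific applications. Most existing work targets a specific domain and builds the database grid on a static environment. There is little discussion of how to adapt to the uncertainty of database resources in a grid environment, and no reports were found of a database grid with full support for dynamic data integration. Although database processing techniques in grid environments have much in common with those of existing multidatabase, parallel database, and distributed database systems, the uncertainty (dynamics) of data resources in a grid means that existing techniques are insufficient for building a database grid environment with such uncertainty. This uncertainty poses new challenges for data resource management, resource query processing, data integration, transaction scheduling, and massive data analysis in database grids.

Based on the OGSA-DAI (Open Grid Services Architecture Data Access and Integration) specification, and adopting a service-oriented approach and the P2P model, this dissertation studies several key techniques for building a multi-domain database grid system that supports dynamic data integration. The goal is to use the efficient processing capability of the grid to provide a good enabling environment for the effective management, dynamic data integration, and analysis of distributed, autonomous, heterogeneous database resources, and to serve users transparently and on demand.

For the framework supporting the database grid, centralized resource management has certain limitations, which has led to the Peer-to-Peer (P2P) architecture. The P2P approach avoids a single point of failure and scales well. To support multi-domain resource management, a P2P framework suited to database grids, MultiChord, is proposed on the basis of the Chord structure. It effectively supports resource management across multiple domains.

For schema integration and query processing, given the autonomy and heterogeneity of resources in the database grid, a global schema ontology is defined and an ontology-based query processing strategy is proposed to improve query efficiency and to enable queries over heterogeneous sources. Query processing is divided into global and local query processing. The global stage performs semantic transformation, query rewriting, and query execution plan generation based on the ontology. Local query processing resolves data-level conflicts and performs semantic expansion of queries.

For execution optimization, to reduce data transmission cost, an execution optimization strategy based on keyword filtering is proposed. Following the idea that "the able should do more", it dynamically coordinates the amount of data each resource transmits, which effectively improves the efficiency of obtaining data resources in the data grid.

For replica management, because nodes in a P2P system may join and leave at any time, a replica management strategy based on multiple root nodes and multi-point maintenance is proposed to keep the P2P-based data grid system robust. It effectively ensures the robustness of the system and improves the efficiency of resource acquisition.

For data summarization, to extract as much knowledge as possible from massive query results, a distributed data summarization strategy based on cluster analysis is proposed. The data integration layer ensures that the integrated data satisfies the user's schema requirements, and the cluster analysis layer presents the summary data to the user for intuitive data analysis.

Finally, on the basis of the proposed techniques, DS-Grid, a multi-domain database grid system for the dynamic integration of database resources, is designed and implemented. Data resources are wrapped as Grid Services based on DAI, the MultiChord framework is built on JXTA, the XML database management system Shunsaku serves as the XML data repository, and the main programming language is Java. The system is a subsystem of "Research on service grid technology supporting flexible integration of enterprise business for the CIMS domain", a subproject of a National 863 Program project. Testing and evaluation show that it is relatively efficient and meets the expected goals well.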

【Abstract】 With the development of information technologies, the amount of information is increasing explosively, including publicly available database resources. Geographically distributed users want to access those data sources, for example in scientific fields such as high-energy physics and biological computation, in electronic commerce, and in deep Web data query applications. A data grid is a WAN-based system for managing, accessing, and sharing such resources. A database grid is a type of data grid whose main resources are databases, and it supports these applications well. On the one hand, it uses the high processing capability of the grid to integrate large amounts of data effectively and to exploit the many existing database resources. On the other hand, it uses the data management capability of database management systems to support database resource management, distributed data integration, and further analysis and processing. The dissertation therefore studies several related key techniques, including a grid architecture based on the P2P model, a P2P supporting framework, resource management, data integration, and data replica management.

At present, research and development on database grids is just beginning. In recent years, the representative works have been the protocols and middleware for accessing databases in grid environments developed by the Database Access and Integration Services Working Group of the GGF (Global Grid Forum), and grid systems for processing databases in specific applications. Existing work is mainly oriented toward a specific domain and based on static environments. There is little discussion of how to adapt to the uncertainty of database resources, and there are few reports of database grids that support dynamic data integration well. Although database processing technologies in grid environments are quite similar to those of multi-database systems, parallel database systems, and distributed database systems, the existing work is insufficient for supporting database grid environments with uncertainties. Those uncertainties pose challenges for data resource management, resource query processing, data integration, transaction scheduling, and massive data analysis in database grids.

The dissertation studies key techniques of database grids for multiple domains and dynamic data integration, based on the OGSA-DAI specification, service-oriented ideas, and the P2P model. It aims to provide an enabling environment for distributed, heterogeneous database resource management and data source integration, and to provide transparent services for users, by making use of the high processing capability of computer grids.

Regarding the framework that supports database grids, the P2P model is used to overcome the limitations of centralized resource management, since P2P avoids a single point of failure and has good extensibility. To support resource management across multiple domains, a P2P framework for database grids, MultiChord, is proposed on the basis of the Chord structure. It provides efficient resource management over multiple domains.

In schema integration and query processing, to handle the heterogeneity and autonomy of the resources in a database grid, a global schema ontology is defined and an ontology-based query processing strategy is proposed. It divides query processing into two parts, global and local query processing. The former implements semantic transformation, query rewriting, and execution planning based on the ontology, while the latter deals with data-layer conflict resolution and query extension.

In execution optimization, to decrease the data transmission cost effectively, a keyword-based execution optimization method is proposed. It follows the idea that "the able should do more" to dynamically schedule the amount of data each resource transmits, which effectively improves the efficiency of retrieving data in a database grid.

In replica management, to cope with peers joining or leaving a P2P system at any time, a replica management strategy based on multiple-root maintenance is proposed. It effectively ensures the robustness of the system and improves the efficiency of resource acquisition.

In data summary, to obtain knowledge effectively from the massive amount of information, a clustering-based data summary method is proposed. The data integration level ensures that the integrated data satisfies the schema requirements of users, and the clustering analysis level presents the summary data to the users for intuitive data analysis.

Finally, based on the proposed key techniques, the dissertation designs and implements DS-Grid, a database grid system for dynamic database integration and multiple-domain applications. It wraps data resources as Grid services based on DAI, constructs the MultiChord P2P model on JXTA, adopts the XML DBMS Shunsaku as the XML repository, and is coded in Java. The system is a subsystem of a National 863 Program project, "CIMS oriented service grid for flexible integration of enterprise business". Testing and evaluation show that the system achieves high performance and meets its expected goals.

  • 【Online publication submitter】 Northeastern University (东北大学)
  • 【Online publication year and issue】 2011, Issue 06