
数据网格副本管理关键技术研究

The Research on Key Technologies of Replica Management for Data Grid

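【Author】 Sun Haiyan (孙海燕)

【Advisor】 Zou Peng (邹鹏)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2005, PhD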

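【Abstract (Chinese version)】 Advances in information technology have led government, business, education and research, health care, and other sectors to conduct their operations on the information service platform provided by the Internet. Because the Internet lacks effective support for data sharing and collaborative problem solving, many information systems in these domains suffer from resources partitioned by department, scattered information, data that is hard to find, and application systems that are isolated from one another and difficult to interconnect.

Data Grid targets heterogeneous wide-area network environments and gives users an integrated architecture for accessing, storing, transferring, managing, and serving distributed, heterogeneous, massive data. It is an effective way to achieve data sharing and collaborative problem solving over wide-area networks, and it can provide effective management and sharing of the large volumes of data found in government, business, education and research, health care, and similar domains.

Data Grid systems for these application domains typically have the following characteristics: a P2P structure, limited network communication capability, and limited storage capability at each node. We call this kind of network environment a "low-end computing environment". Replication is widely used in Data Grid systems to improve system performance, shorten data response time, and reduce network bandwidth consumption. Introducing replication raises the problem of replica management, and the efficiency of replica management directly affects system performance. Replica management is currently a widely studied topic. To improve the performance of Data Grid systems running in low-end computing environments, this thesis studies key replica management technologies suited to such environments.

With system performance, availability, and scalability as its goals, and with attention to the characteristics of low-end Data Grid systems, this thesis investigates replica management techniques for low-end Data Grid systems. Its main contributions are:

1. It proposes the NLPR Data Grid replication model, which establishes a unified description of the various replica management problems in Data Grid systems and simplifies their solution. It proposes SCRMSA, a replica management service framework based on service composition, which offers an open, transparent, and flexible implementation approach for Data Grid replica management services across application domains. On this basis, it proposes the Data Grid system management model DGRMSM, which manages Data Grid replicas in a transparent, scalable, and open manner.

2. It introduces the concept of a "Storage Alliance" and, on that basis, proposes SADDRES, a two-tier dynamic replica creation strategy based on Storage Alliances. Between Storage Alliances, data replicas are created through caching; within a Storage Alliance, data is distributed according to users' access history.

3. Building on a "Storage Alliance ring" structure, it proposes the hierarchical replica location and selection mechanism SAHRLSM, which mainly comprises the replica catalog SARRC, the replica location algorithm SAHRLM, and the replica selection algorithm SAHRSM. It also implements mechanisms for managing the dynamics of grid nodes and Storage Alliances and for maintaining the consistency of the replica catalog. The mechanism offers load balancing, high reliability, and good scalability.

4. To manage the consistency of grid replicas while hiding the differences among grid applications, it introduces the replica consistency management service RCMS, which provides a distributed, efficient, and flexible replica consistency management mechanism for Data Grid systems. Based on the characteristics of low-end Data Grid systems, it proposes VVRCOMS, a view- and version-based replica consistency management strategy. The strategy manages replicas through a version mechanism and achieves good system performance while guaranteeing the consistency of users' views.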

【Abstract】 With the advance of information technology, many business systems have been developed on the Internet in domains as diverse as government, business, and education. The combination of large datasets, geographically distributed users and resources, cooperative access to heterogeneous resources, and resource sharing over wide-area networks places stringent demands that no traditional data management technology satisfies.

Data Grid provides a mechanism for effective sharing of distributed resources and transparent remote access to heterogeneous data over wide-area networks. It can be applied to manage and share large datasets effectively in domains such as government, business, and education.

The Data Grid systems applied in the domains mentioned above have the following characteristics: P2P topology, limited communication capability, and limited storage capability. We call a computing environment with these characteristics a "Low-end Computing Environment", and a Data Grid system running on it a "Low-end Data Grid". In distributed systems, data replication is a well-known and widely accepted technique for reducing data response time and network bandwidth consumption. However, replication in Data Grid brings a series of replica management problems that greatly affect system performance, and replica management has become a hot topic in Data Grid research. The performance requirements of the Low-end Data Grid are the driving force behind this thesis.

The main contributions of the thesis in theoretical, technical, and practical aspects are as follows:

1. To give a uniform description of replication technology in Data Grid, the thesis presents a replication model, NLPR, on which replica management mechanisms can be clearly described. A replica management service implementation framework, SCRMSA, is proposed as a solution for open, transparent, and flexible replica management services in different fields. Based on SCRMSA, the thesis puts forward a transparent, scalable, and flexible Data Grid replica management system model, DGRMSM.

2. The thesis introduces a new concept, the Storage Alliance, on which a dynamic replica creation strategy, SADDRES, is based. SADDRES creates replicas through caching among Storage Alliances, and optimizes the data distribution within a Storage Alliance according to users' access history.

3. Replica location and selection is a key issue for data management and access in Data Grid. To provide load-balanced, high-performance, and highly scalable replica management, the thesis explores a hierarchical replica location and selection mechanism, SAHRLSM. SAHRLSM includes the SARRC replica catalog, the SAHRLM replica location mechanism, and the SAHRSM replica selection mechanism. It also provides a series of methods for managing the dynamics of nodes, the dynamics of Storage Alliances, and the consistency of the SARRC replica catalog.

4. Achieving replica consistency across different Data Grid applications is a difficult task. The thesis introduces a replica consistency management service, RCMS, to provide distributed, efficient, and flexible replica consistency management. Considering the characteristics of the Low-end Data Grid, the thesis proposes a view- and version-based replica consistency management strategy, VVRCOMS. VVRCOMS uses a version mechanism to manage replicas and achieves high system performance while ensuring that users' views remain consistent.

5. To evaluate the mechanisms above, the Data Grid simulation tool OptorSim is adopted. By extending OptorSim, we implement the replica management system StarRMS, which includes all the mechanisms we propose. The experimental results show that our replica management mechanisms are correct and effective. By adjusting the environment variables of OptorSim, we test the performance of StarRMS in different computing environments. The results will be a valuable reference for future applications of StarRMS.
