Research of Data Management and Transfer Model Based on Multi-Replica in Grid Environment (original title: 网格环境下基于多Replica的数据管理与传输模型的研究)
【Author】 Guo Lantu (郭兰图);
【Author Information】 China University of Petroleum, Computer Application Technology, 2007, Master's degree
【Abstract】 With its strong data-sharing and collaboration capabilities, the data grid meets the needs of data-intensive tasks such as high-energy physics and climate modeling. However, node failures and sudden network changes occur frequently in the dynamic and complex grid environment, so neither the speed nor the stability of grid data transfer can be guaranteed, and this has become a "bottleneck" restricting the application of grid technology. Replica technology is a key technology of the data grid. It creates local copies of remote data, which reduces network latency and bandwidth consumption, and it also gives rise to a mode of grid resource sharing in which multiple replicas coexist. This mode offers an opportunity to solve the transfer problem, so research on data transfer based on multiple replicas has become an important approach to improving the speed and stability of grid data transfer.
With the goal of improving the speed and stability of data transfer in the grid environment, this thesis uses the Globus Toolkit middleware and studies how to integrate Replica technology into data transfer. The main work is as follows:
(1) Analysis of grid data management and its Replica technology: the thesis summarizes grid data management and Replica technology, and analyzes the replica location and selection algorithms involved in the work.
(2) Study of grid data transfer mechanisms: the influence of different resource sharing modes and different transfer protocols on grid data transfer is compared and analyzed.
(3) Experimental analysis of the transfer performance of the GridFTP protocol: experiments were carried out on GridFTP parallel transfer and striped transfer, and the performance analysis further demonstrates the significance of the research topic.
(4) Proposal of MRT, a data transfer model based on multiple replicas, and its algorithms: the thesis proposes the MRT model and defines its elements and the mappings between them; it designs a regionalized, multi-level replica location strategy for the model; drawing on probabilistic prediction methods, it designs a heuristic dynamic task allocation algorithm on the basis of a heuristic algorithm; finally, it analyzes the complexity of the strategy and the algorithm.
(5) Design and implementation of a test system for the model: the system is designed and implemented at both the overall and the module level, and experiments on the model's performance were carried out on this test system.
Theoretical analysis and experimental results show that the MRT model effectively improves the speed and stability of data transfer, and that the improvement is most evident when transferring large files.
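The abstract does not give the details of the MRT task allocation algorithm, so the following is only an illustrative sketch of the general idea of splitting one file transfer across several replicas. It assumes a simple greedy heuristic (assign each block to the replica server with the earliest projected finish time) and an exponential moving average for refreshing throughput estimates; the function names `allocate_blocks` and `update_estimate`, the server names, and all parameters are hypothetical and are not taken from the thesis or from Globus Toolkit.

```python
def update_estimate(old_rate, observed_rate, alpha=0.5):
    """Blend a previous throughput estimate with a new observation.

    Assumption: an exponential moving average stands in for the
    probabilistic prediction step, whose details the abstract omits.
    """
    return (1 - alpha) * old_rate + alpha * observed_rate


def allocate_blocks(file_size, block_size, throughput):
    """Assign byte ranges of one file to replica servers.

    throughput maps a (hypothetical) server name to its estimated
    rate in bytes per second. Each block goes to the server that
    would finish it earliest given the work already assigned to it.
    Returns a dict: server name -> list of (offset, length) tuples.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Servers with a non-positive estimate are treated as unavailable.
    usable = {s: r for s, r in throughput.items() if r > 0}
    if not usable:
        raise ValueError("no usable replica server")

    servers = sorted(usable)               # fixed order makes ties deterministic
    busy_until = {s: 0.0 for s in servers}  # projected time each server is free
    plan = {s: [] for s in servers}

    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        # Pick the server with the earliest projected finish for this block.
        best = min(servers, key=lambda s: busy_until[s] + length / usable[s])
        plan[best].append((offset, length))
        busy_until[best] += length / usable[best]
        offset += length
    return plan


if __name__ == "__main__":
    rates = {"replica-a": 40.0, "replica-b": 10.0}
    for server, blocks in allocate_blocks(100, 10, rates).items():
        print(server, blocks)
```

In a dynamic setting, the estimates would be refreshed with `update_estimate` as blocks complete and the remaining blocks re-planned, so that a slow or failed replica receives less of the remaining work.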
【Key words】 Grid; Data Transfer; Replica; Transfer Protocol; Task Allocation;
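- 【Online Publication Contributor】 China University of Petroleum 【Online Publication Year/Issue】 2008, Issue 03
- 【Classification Number】 TP311.52
- 【Citation Count】 3
- 【Download Count】 116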