数据网格QoS保障与资源优化关键技术研究

Research on QoS Guarantee and Resource Optimization Key Technologies in Data Grid

【Author】 Qu Mingcheng (曲明成)

【Supervisor】 Yang Xiaozong (杨孝宗)

【Author information】 Harbin Institute of Technology, Computer System Architecture, 2011, PhD

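【Abstract (translated from the Chinese)】 The grid is an advanced information technology infrastructure that aims to integrate the computing, storage, communication, and information resources widely distributed across the Internet and to offer users a virtual, unified, and transparent computing environment. The data grid, a branch of grid computing, has attracted great attention from academia. A data grid is an integrated architecture for the distributed management, analysis, and use of large-scale data sets over a wide area. It provides secure, reliable, and efficient data transfer, access, storage, and replica management, and offers a unified interface to different storage systems, making data-intensive high-performance computing and scientific research possible. Ian Foster pointed out that one of the most basic features of a grid is "providing exceptional quality of service (QoS)". To give a data grid high QoS, the many sources of instability in the network and the grid nodes must be overcome; resource (capacity) reservation, replica deployment, buffering mechanisms, parallel data transfer, and data storage and recovery are the main means of doing so and are active research topics. Storing and transferring massive data wastes large amounts of network transmission, storage, and node resources unnecessarily, causing the acceptance rate of grid services to fall sharply at peak times and degrading overall QoS. Most current research concentrates on improving one aspect of service quality and pays less attention to optimized resource scheduling, so QoS is guaranteed at the expense of somewhat lower overall resource utilization in the grid system. Starting from the principle that "guaranteeing QoS is the foundation and optimizing resource use is the goal", this thesis studies in depth how to optimize the resources used while guaranteeing data grid QoS. It decomposes the most basic functions of a data grid (data storage and data transfer), at the level of service QoS, into five main sub-services (transfer, storage, caching, node selection, and resource reservation) and applies a specific technique to each to achieve both QoS guarantee and resource optimization. Specifically:
(1) Deploying multiple replicas improves data reliability and service bandwidth and reduces network load, and parallel transfer algorithms built on multiple replicas greatly increase transfer speed and so guarantee the QoS of data services; deploying several complete replicas, however, consumes a great deal of storage space and network transmission. This thesis first proposes a distributed storage model for data. The model uses storage space much more economically (storage optimization) and has P-integrity: the data remain complete when any P nodes fail. Based on the storage model, a parallel transfer scheduler is given; at double-replica redundancy, the scheduler adapts to large speed differences between nodes. On top of the scheduler, a parallel transfer algorithm is given which, with suitably chosen parameters, reaches the speed of parallel transfer from multiple complete replicas.
(2) To guarantee reliable data storage, a data grid should be capable of dynamic data recovery based on parallel transfer. While storage use is optimized, the basic QoS indicators of data storage, reliability and availability, must be guaranteed, and the ease of use of the data must also be considered. Building on the distributed storage model and combining node failure behavior, the dynamic recovery process, and a data exchange center strategy with the proposed scheduler, the parallel transfer algorithm, and the Poisson distribution theorem, this thesis proposes a dynamic data recovery model. The model has a lower probability of data failure than double-replica storage and is easier to use than erasure-coding strategies.
(3) Data caching is a main strategy commonly used by network applications to overcome network instability. Given the massive volume of data and the limited resources, the buffer size in a caching service needs to be configured optimally, taking into account many factors, including the failure behavior of data source nodes, the set of nodes participating in the service, the transfer speed of each node, the task's constraint on data failure time, and the requirement on overall failure. By introducing a finite buffer model and taking the data consumer's perspective, this thesis derives a service failure model on the basis of multi-replica storage and parallel transfer. The model captures the quantitative relationships among the parameters that affect service failure. Simulation experiments were carried out, and comparing the model's theoretical values with the experimental values gave good results, achieving optimized configuration of buffers and service nodes and guaranteed QoS for the caching service.
(4) An important problem in multi-replica parallel data transfer is how to choose node resources sensibly while meeting QoS constraints such as service reliability and transfer time. This thesis proposes two models that make a combined decision over factors such as the transfer speed, reliability, transfer distance, network state, and bandwidth constraints of grid nodes and output the optimal set of service nodes. The models pursue several optimization goals at once: sensible use of node resources, lower network load, lower tolerance of service requests, a higher acceptance rate and service quality for the grid at peak times, and minimum cost per service.
(5) Resource reservation is a basic precondition for data grid tasks to complete successfully, and the acceptance rate of reservation requests directly affects service QoS. Resource capacity gives a better macroscopic description of the quantity and usage of resources and strongly supports reservation services. Allocating resource capacity sensibly reduces capacity fragmentation and raises the throughput and acceptance rate of the grid at peak times, achieving both resource optimization and QoS guarantee. This thesis proposes resource capacity reservation strategies based on parallel speedup and on a four-tuple method. Compared with earlier mechanisms, they let the grid system make active decisions from the combined information in reservation requests and apply certain capacity transformations to the requests, which further optimizes resource use, reduces capacity fragmentation, and raises the service acceptance rate at peak times.
Guaranteeing QoS and optimizing resources for individual services raises the overall resource utilization of the grid system, which in turn raises the acceptance rate of service requests and the QoS of individual services at peak times.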
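
The P-integrity property in item (1) can be made concrete with a small sketch. The abstract does not specify the layout used by the thesis's storage model, so the placement below (each block copied to P + 1 consecutive nodes) is only an assumed stand-in that happens to satisfy the property; the function names `place_blocks`, `survives`, and `has_p_integrity` are hypothetical.

```python
from itertools import combinations


def place_blocks(num_blocks, num_nodes, p):
    """Assumed layout: copy each block to p + 1 consecutive nodes."""
    if p + 1 > num_nodes:
        raise ValueError("need at least p + 1 nodes")
    placement = {node: set() for node in range(num_nodes)}
    for block in range(num_blocks):
        for k in range(p + 1):
            placement[(block + k) % num_nodes].add(block)
    return placement


def survives(placement, failed, num_blocks):
    """True if every block is still held by at least one live node."""
    alive = set()
    for node, blocks in placement.items():
        if node not in failed:
            alive |= blocks
    return len(alive) == num_blocks


def has_p_integrity(placement, num_blocks, p):
    """Brute-force check over every way of failing exactly p nodes."""
    return all(
        survives(placement, set(failed), num_blocks)
        for failed in combinations(placement, p)
    )


placement = place_blocks(num_blocks=12, num_nodes=6, p=2)
print(has_p_integrity(placement, 12, 2))  # True: any 2 failures are tolerated
print(has_p_integrity(placement, 12, 3))  # False: nodes 0, 1, 2 hold every copy of block 0
```

The brute-force check is exponential in the number of nodes and is meant only to state the property precisely, not to scale.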

【Abstract】 The grid is an advanced information technology infrastructure. It aims to integrate the widely distributed computing, storage, communication, and information resources of the Internet and to provide users with a virtual, unified, and transparent computing environment. The data grid, a branch of grid computing, has attracted great attention from academia. A data grid is an integrated architecture for managing, analysing, and using distributed data sets over a wide area. It provides secure, reliable, and efficient data transmission, access, storage, and replica management, and offers a unified interface to different storage systems, which makes data-intensive high-performance computing and scientific research possible. Ian Foster pointed out that one of the grid's basic features is "to provide exceptional quality of service (QoS)". To guarantee high QoS, a data grid must overcome many uncertainties in the network and the grid nodes. Resource (capacity) reservation, replica deployment, buffering, parallel transmission, and data storage and recovery are currently the primary means of solving such problems and are active research topics. Mass data storage and transmission waste network transmission capacity, storage, and node resources unnecessarily, which sharply reduces the acceptance rate of grid services at peak hours and lowers overall QoS. Most current research focuses on improving some aspect of service quality and gives less consideration to optimal resource scheduling, so QoS is ensured only at a considerable cost in resources. Guided by the principle that "guaranteeing QoS is the basis and optimizing resources is the goal", this thesis studies in depth how to guarantee QoS while using grid resources effectively. It divides the basic functions of a data grid (data storage and data transfer) into five sub-services at the service level of QoS (transmission, storage, caching, node selection, and resource reservation) and applies a specific strategy to each, achieving the dual objective of QoS guarantee and resource optimization. Specifically:
(1) Multi-replica deployment improves data reliability and service bandwidth and decreases network load. Parallel transmission algorithms based on multiple replicas further increase transmission speed and guarantee the QoS of data services, but multiple replicas waste storage space and network transmission capacity. The thesis first proposes a distributed storage model that uses storage space far more economically (storage optimization) and has the property of "P integrity": when any P nodes fail, the complete data can still be obtained from the remaining nodes. A parallel transmission scheduler is then built on the storage model; with double redundancy, the scheduler adapts to large differences in transmission speed among replica nodes. Based on the storage model and the scheduler, a parallel transmission algorithm is proposed which, with reasonably configured parameters, matches the transmission performance of algorithms based on full replicas.
(2) To guarantee the reliability of data storage, a data grid should be capable of dynamic data recovery based on parallel transmission. While storage use is optimized, the basic QoS of data storage (reliability and availability) must be guaranteed and ease of use must also be considered. A dynamic data recovery model (DDRM) is proposed based on the storage model, the scheduler, the parallel transmission algorithm, node failure, the dynamic recovery process, and a data exchange center strategy. DDRM has a lower data failure probability than double-replica storage and is easier to use than erasure-code strategies.
(3) Data buffering is a key strategy for overcoming network instability. Given the massive data volumes and limited resources, the buffer size in a cache service should be optimized, taking the following factors into account: the failure probability of service nodes, the set of service nodes, the transmission speed of each service node, the task's constraint on failure time, and the failure probability of the service as a whole. By introducing a limited-buffer model and taking the data consumer's perspective, a service failure model is proposed based on the parallel transmission mode. The model captures the quantitative relationships among the parameters that affect service failure. In simulation experiments, the model's theoretical values are compared with the experimental values with good results, and the objectives of buffer optimization, service node optimization, and QoS guarantee for caching services are achieved.
(4) An important problem in parallel transmission based on multiple replicas is how to select replica nodes while meeting QoS constraints such as service reliability and transmission time. Two node selection models are proposed. They take as input the transmission speed, reliability, transmission distance, network status, bandwidth, and other properties of the nodes and output the optimal set of service nodes. In this way several optimization objectives are achieved together: rational use of node resources, lower network load, lower tolerance of service requests, a higher acceptance rate at peak times, and minimum cost per service.
(5) Resource reservation is a basic premise for the successful completion of grid tasks, and the acceptance rate of reservation requests directly affects QoS. Resource capacity represents the amount and status of resources from a macro point of view and provides strong support for resource reservation services. Reasonable allocation of resource capacity reduces capacity fragmentation, raises throughput and acceptance rate at peak times, and achieves the dual goals of resource optimization and QoS guarantee. Two resource capacity reservation strategies, based on parallel speedup and on a four-tuple, are put forward. Compared with previous work, they let the grid system make active decisions according to the comprehensive information in reservation requests and apply certain resource capacity transformations to those requests, further optimizing the use of resource capacity.
In summary, the most basic functions of a data grid (data storage and data transmission) are divided into five detailed services: transmission, storage, buffering, node selection, and resource capacity reservation. A specific strategy is adopted for each service to achieve the objectives of QoS guarantee and resource optimization.
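
To illustrate why a scheduler that adapts to speed differences between nodes matters for parallel transmission (item 1), the sketch below compares a pull-based assignment, where the next block goes to whichever node becomes free first, with a static equal split. This is not the scheduler proposed in the thesis, whose design the abstract does not describe. It assumes every node can serve every block and that node speeds are constant; `dynamic_schedule` and `static_schedule` are hypothetical names.

```python
import heapq


def dynamic_schedule(num_blocks, block_size, speeds):
    """Give each block to the node that becomes free first.

    Returns (finish_time, blocks_served_per_node).
    Assumes any node can serve any block and speeds stay constant.
    """
    # Heap of (time the node becomes free, node index).
    heap = [(0.0, node) for node in range(len(speeds))]
    heapq.heapify(heap)
    counts = [0] * len(speeds)
    finish = 0.0
    for _ in range(num_blocks):
        free_at, node = heapq.heappop(heap)
        done = free_at + block_size / speeds[node]
        counts[node] += 1
        finish = max(finish, done)
        heapq.heappush(heap, (done, node))
    return finish, counts


def static_schedule(num_blocks, block_size, speeds):
    """Split the blocks equally; the slowest node sets the finish time."""
    share = num_blocks / len(speeds)
    return max(share * block_size / s for s in speeds)


speeds = [10.0, 1.0]  # illustrative speeds in MB/s, one fast and one slow node
finish, counts = dynamic_schedule(110, 1.0, speeds)
print(counts, finish, static_schedule(110, 1.0, speeds))
```

With these illustrative speeds the fast node ends up serving most of the blocks, and the pull-based schedule finishes well before the equal split, which must wait for the slow node to deliver half of the data.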