服务网格动态维护研究

Research on Dynamic Maintenance for Service-oriented Grid

【Author】 Qi Li (齐力)

【Advisor】 Jin Hai (金海)

【Author information】 Huazhong University of Science and Technology, Computer System Architecture, 2008, PhD

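【Abstract (translated from Chinese)】 With the rapid development of grid technology, more and more businesses need to provide high-quality, uninterrupted services. While a system is running, however, reconfiguration and the maintenance and updating of resources are unavoidable. For traditional services, maintenance inevitably degrades service quality or even interrupts the service. In a distributed environment, system heterogeneity and network latency make such maintenance and updating even harder to coordinate efficiently. Moreover, incorrect maintenance may render the entire system unavailable. The focus of this research is therefore how to perform dynamic management and maintenance in a distributed system, or even a grid system, that is already running normally, so as to satisfy users' service demands to the greatest possible extent. The key points are how to raise the availability of the infrastructure itself by improving the existing infrastructure, and how to ensure, from the perspective of system structure, that critical applications complete their services as quickly as possible.
First, the design of HAND, a highly available dynamic deployment infrastructure, is introduced from the infrastructure perspective. Six criteria for achieving highly available service and maintenance are proposed. Service-level and container-level deployment approaches are then presented, and their effects on availability and correctness in a dynamic grid are analyzed. Experiments covering HAND's implementation steps, scale tests, and correctness tests demonstrate the high availability that HAND brings to dynamically changing grid service containers.
Second, the focus moves up to the architecture level: how a dynamic maintenance system can preserve scalability and usability when the scale of the grid and the interdependencies among resources and software components become very complex. An architecture named Cobweb Guardian is proposed. It consists of three layers of execution units of different granularity and provides three grouped maintenance algorithms to avoid or reduce the negative effects of deployment dependencies, invocation dependencies, and environment dependencies. Interdependent system services were tested on the ChinaGrid Support Platform. The results show that the proposed dependency-aware maintenance mechanism gives the grid system higher throughput and runtime availability.
Third, building on the proposed infrastructure and architecture, a distributed asynchronous maintenance strategy is described. Its aim is to reduce the impact of grid heterogeneity and emergent fault events on the dynamic maintenance system. A three-tier asynchronous maintenance model is introduced, and an analysis of the time-sequence relations during maintenance shows that the core algorithm keeps both the logic of the original application and the logic of the maintenance itself unaffected, while greatly improving maintenance efficiency and availability. The asynchronous strategy is implemented on the ChinaGrid Support Platform, and a practical image processing application is deployed onto three heterogeneous clusters as an example. The experiments show that the strategy improves the effectiveness of heterogeneous systems and that the system can handle a sufficient range of resource changes and maintenance complexity in both homogeneous and heterogeneous environments.
Finally, from the perspective of applying dynamic maintenance, the thesis explores how to combine it effectively with two mainstream trends in grid computing (virtual workspace services and interoperation) to ease the resource scarcity that can arise when large numbers of virtual organization users access the grid infrastructure. The work is based on the Grid Programming Environment platform developed by ChinaGrid in cooperation with Intel, with Globus Toolkit 4.0 from the U.S. Argonne National Laboratory and Unicore 6.0 from Germany's D-Grid as reference platforms. Integer programming is used to find an optimized combination of dynamic maintenance, virtual workspace services, and interoperation, so that user requests can continue to be served when a single grid middleware cannot serve them. The experiments use the image processing application GridBean. Simulation results based on collected real data show that the proposed coordination model effectively resolves resource scarcity in a single grid and keeps job execution efficient.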

【Abstract】 With the rapid development of grid technology, more and more businesses need to provide high-quality, nonstop services. However, reconfiguration, resource maintenance, and upgrading operations are necessary for these critical businesses. With traditional maintenance technologies, these operations inevitably degrade the quality of service and cause service interruptions. In a distributed environment, a number of problems (including system heterogeneity and network latency) make such maintenance and upgrading work difficult to coordinate efficiently. Moreover, incorrect maintenance work can even make the whole system unavailable. Therefore, the motivation of this thesis is how to implement a dynamic maintenance mechanism in a running distributed system, or even a grid system, so as to maximize the utilization of grid resources and satisfy the demands of users. Another focus of the thesis is how to improve the availability of existing grid infrastructures. In addition, from the view of the system architecture, the proposed dynamic maintenance technology also aims to ensure that critical applications are completed as soon as possible.
First, a highly available dynamic deployment infrastructure (HAND) is proposed in the infrastructure layer. Six criteria are derived from practical experience. Following these criteria, two deployment approaches, service-level and container-level, are introduced, and their correctness and availability are analyzed. Evaluation results from micro-benchmarks, service scale tests, and correctness tests show that HAND can guarantee high availability in a dynamic grid infrastructure.
Second, the focus moves to the architecture layer. The thesis concentrates on how to guarantee scalability and usability as the number of physical nodes and the dependencies among grid services grow very large. A new architecture named Cobweb Guardian is proposed to resolve this problem. It consists of three executing units of different granularity. In addition, it provides three group maintenance algorithms to eliminate or reduce the negative effects of deployment, invocation, and environment dependencies. The interdependent system services in the ChinaGrid Support Platform are used to evaluate its efficiency. The results show that the proposed maintenance architecture brings high throughput and availability at runtime.
Third, a distributed asynchronous maintenance strategy is proposed on top of the proposed infrastructure and architecture. The motivation is to reduce the effects of grid heterogeneity and emergent faults during dynamic maintenance. By introducing a three-tier asynchronous model and analyzing the time sequence during maintenance, the proposed kernel algorithms guarantee that the business logic of upper-layer applications and the maintenance logic do not disturb each other. In addition, efficiency and availability are considerably improved. The asynchronous strategy is implemented in the ChinaGrid Support Platform, and a practical image processing application is successfully deployed onto three heterogeneous clusters. The evaluation results demonstrate that the proposed strategy improves efficiency and tolerates substantial changes in resources and maintenance complexity in both homogeneous and heterogeneous environments.
Finally, from the view of the application layer, the thesis discusses how to efficiently combine dynamic maintenance with two important technologies (virtual workspace service and interoperation) to resolve the conflicts caused by a rapidly growing number of VO users and the resulting resource scarcity in the grid. The work builds on ChinaGrid and the Grid Programming Environment for CGSP (developed in cooperation with Intel Corp.), with Globus Toolkit 4.0, developed by U.S. ANL, and Unicore 6.0, developed by the German D-Grid, adopted as reference platforms. Using integer programming, the dynamic maintenance, virtual workspace service, and interoperation technologies are orchestrated effectively and efficiently. In this way, users can complete their job requests even when resources in the local grid are inadequate. Image processing GridBeans are used for the evaluation. The results show that the proposed orchestration model can efficiently resolve resource scarcity in grids and guarantee efficient job execution.
