Research on Service-oriented Autonomic Recovery System Architecture and Technologies

Original title: 面向服务的自律恢复系统体系结构及其实现技术研究

【Author】 Dong Xikun (董玺坤)

【Supervisor】 Wang Huiqiang (王慧强)

【Author information】 Harbin Engineering University, Computer Application Technology, 2011, Ph.D.

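【Abstract (translated from the Chinese)】 As business logic grows more complex, enterprises demand more of the IT that supports it. Because enterprises span regions and require fast system response, most IT systems are built as distributed systems. To hide the increasingly complex internals of IT systems and to separate business from IT, so that people in each field can concentrate on their own domain, researchers proposed the concept of a service. The heterogeneity and complexity of service-oriented distributed systems make manual management and maintenance increasingly difficult, and traditional reliability theory and techniques fall short of practical needs. Avoiding service failure, or minimizing its impact, has therefore become the key to improving service dependability. Failures cannot be avoided entirely, but fast recovery from failure can reduce their impact as far as possible. A flexible automatic recovery method is wanted to address these problems, and autonomic recovery technology emerged in response. Recovery techniques have existed for a long time, and autonomic computing was proposed some time ago, but research on applying autonomic recovery to service-oriented computing remains scarce. How to build a framework model for a distributed environment with high-availability requirements, how to monitor random events in real time, and how to recover the system quickly from failure all need further study. Based on these problems and on concrete project requirements, this thesis aims to improve service recovery capability and studies the autonomic recovery model, the monitoring model, and the recovery techniques for service-oriented computing.

First, drawing on the characteristics of services and of autonomic recovery technology, the thesis builds a layered autonomic recovery model framework. The service layer, which implements the concrete business, is the target system; the central layer monitors and analyzes; the recovery layer then carries out recovery, forming the loop structure typical of autonomic recovery. Each layer is analyzed and designed in detail, its functional modules and workflow are determined, and key issues such as the definition and quantification of dependability and communication between components are studied, providing theoretical support for implementing autonomic recovery. Finally, the model is formally described and analyzed with PEPA.

Second, information collection is the basis on which the whole self-recovery system works: all analysis and recovery actions rely on the information it gathers. After an in-depth analysis of the state of monitoring technology, the thesis proposes, from the perspective of files, a comprehensive monitoring model based on multiple file attributes, together with the related algorithms, and implements it. Events are classified under three kinds of file attribute (static attributes, dynamic internal attributes, and dynamic external attributes), and events of each kind are monitored with a corresponding algorithm.

Third, to meet the needs of autonomic recovery, the thesis proposes a recursive microreboot method that is cross-language and does not depend on a specific server. Existing microreboot approaches depend on a specific environment and require the system to be componentized, which limits recovery performance. In fact, the response time of each service depends on the response time of every microrebootable element inside its components. To improve recovery performance, the recovery granularity must be refined further so that the microreboot targets the point of failure more precisely. Treating a microreboot as re-execution at the micro level, the thesis proposes a microreboot method that improves system performance by controlling microrebootable elements, and implements automatic modification of executable files to give them microreboot capability.

Fourth, as a practical and effective recovery scheme, software hot-swapping is studied in depth. The thesis proposes a new software hot-swapping method, states the conditions that a software structure must satisfy for the method to apply, and gives the algorithm flow. Unlike existing methods, it maintains a relatively simple intermediate interface table, which makes the software structure clearer and easier to manage. Experimental results show that the method can tolerate destructive behavior such as attacks on the software and unexpected modification of program files and can take corresponding protective measures, which strengthens the system's self-protection; it uses only one global agent, which reduces system overhead.

Finally, taking a web service, the EShop online shopping system, as a case study, the thesis presents the concrete implementation of the proposed autonomic recovery system model and techniques. Drawing on the MVC architecture of the case, the recovery structure and workflow of the autonomic recovery system are mapped onto the actual application, providing a demonstration and guidance for applying the proposed system to other services.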
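
The escalation idea behind recursive microreboot can be sketched as follows. This is a minimal Python illustration of the general technique (re-execute the smallest failing element, and move up to its enclosing element only if that does not help), not the method developed in the thesis. The `Element` class, the `recover` function, and the example element names are hypothetical.

```python
from typing import Callable, List, Optional


class Element:
    """A restartable unit; `parent` is the enclosing unit to escalate to."""

    def __init__(self, name: str, action: Callable[[], str],
                 parent: Optional["Element"] = None) -> None:
        self.name = name
        self.action = action
        self.parent = parent


def recover(element: Element, log: List[str], retries: int = 1) -> str:
    """Re-execute `element`; if it keeps failing, escalate to its parent."""
    for _ in range(retries):
        try:
            result = element.action()
            log.append(f"re-executed {element.name}")
            return result
        except Exception:
            log.append(f"{element.name} failed again")
    if element.parent is None:
        raise RuntimeError(f"no recovery possible above {element.name}")
    # Coarser granularity: hand the problem to the enclosing element.
    return recover(element.parent, log, retries)


# Hypothetical example: a query that keeps failing inside a page handler.
def broken_query() -> str:
    raise ValueError("stale connection")


def rebuild_page() -> str:
    return "page rebuilt"


page = Element("page", rebuild_page)
query = Element("query", broken_query, parent=page)
log: List[str] = []
outcome = recover(query, log)
```

Here the fine-grained element is tried first, so a successful re-execution never touches the enclosing element; only a persistent failure pays the cost of the coarser restart.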

【Abstract】 As business logic becomes more complicated, enterprises' requirements for IT support become more demanding. Most IT systems adopt a distributed structure because enterprises operate across regions and require fast response. To separate IT from business and to hide the increasingly complex internal implementation of IT systems, researchers put forward the concept of a service. Because service-oriented distributed systems are complex and heterogeneous, manual management and maintenance become more difficult, and traditional dependability theory and techniques cannot meet actual needs well. How to avoid service failure, or to minimize its impact, therefore becomes the key issue in improving the dependability of services. It is impossible to avoid service failure entirely, but fast recovery from failure can reduce its impact. Autonomic recovery techniques came into being in response to the expectation of solving these problems.

Because of the distributed environment and the demand for high availability, the construction of a framework, real-time monitoring of random events, and fast recovery from failure remain to be studied and discussed in depth. The paper therefore takes the improvement of service recovery ability as its target and studies the autonomic recovery model, the monitoring model, and recovery technology in depth.

First, an autonomic recovery framework model with a hierarchical structure is constructed based on the features of services and of autonomic recovery technology. The service layer is taken as the target system, the central layer is responsible for monitoring and analysis, and the recovery layer is responsible for carrying out recovery. The detailed design, function modules, and workflow of each layer are then determined, and key issues such as communication between components and the definition and quantification of dependability are studied, which provides theoretical support for implementing autonomic recovery.
Lastly, the model is formally described and analyzed with PEPA.

Second, because the information collection part is the basis of the whole autonomic recovery system, all analysis and recovery actions rely on the information it provides. The paper therefore puts forward a comprehensive monitoring model, and related algorithms, based on multiple attributes of files. In this model all events are classified under three file attributes (static attributes, dynamic internal attributes, and dynamic external attributes), and each kind of event is monitored with a corresponding algorithm. Experimental results show that the proposed monitoring model has better monitoring performance than existing monitoring mechanisms.

Third, to meet the needs of autonomic recovery, the paper proposes a recursive microreboot method that is language-independent and does not rely on a specific server. The performance of existing microreboot methods is limited by their dependence on the environment and their requirement of modularization. In fact, the response time of each service depends on the response time of each microrebootable element inside the service. To improve recovery performance, the recovery granularity needs to be refined so that rebooting is more precise. Regarding a microreboot as microscopic re-execution, the paper proposes a microreboot method that improves performance by controlling microrebootable elements. Experimental results show that the proposed method has better recovery performance and fewer restrictions than existing ones.

Fourth, the paper proposes a novel software hot-swapping method and describes its structure and algorithm. Unlike existing methods, it maintains an intermediate interface table with a simple structure.
Experimental results show that the method can tolerate attacks and unexpected modification while also meeting upgrade needs; because it contains only one global agent, system cost is reduced.

Finally, taking a web service as an experimental case, the paper presents the implementation of the proposed autonomic recovery model and technology. Taking the MVC structure of the case into account, the recovery structure and workflow are deployed on a practical application, which provides a demonstration and a guide for applying the autonomic recovery system to other services.
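
To illustrate the idea of an intermediate interface table used with one global agent, the following minimal Python sketch resolves every call through a name-to-implementation table, so that rebinding a name swaps the implementation while callers keep running. It is an illustration under stated assumptions, not the algorithm from the thesis: `InterfaceTable`, `GlobalAgent`, and the two `price_*` functions are hypothetical names, and the agent is assumed to act as a simple call dispatcher.

```python
from typing import Any, Callable, Dict


class InterfaceTable:
    """Maps a stable interface name to the implementation currently bound to it."""

    def __init__(self) -> None:
        self._table: Dict[str, Callable[..., Any]] = {}

    def bind(self, name: str, impl: Callable[..., Any]) -> None:
        # Rebinding an existing name is the hot swap: callers keep using the name.
        self._table[name] = impl

    def lookup(self, name: str) -> Callable[..., Any]:
        return self._table[name]


class GlobalAgent:
    """Single entry point (assumed role): resolves each call via the table."""

    def __init__(self, table: InterfaceTable) -> None:
        self._table = table

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Lookup happens at call time, so a swap takes effect on the next call.
        return self._table.lookup(name)(*args, **kwargs)


# Hypothetical implementations of the same interface, in whole cents.
def price_v1(cents: int) -> int:
    return cents


def price_v2(cents: int) -> int:
    return cents - 1000  # new version applies a fixed discount


table = InterfaceTable()
agent = GlobalAgent(table)

table.bind("price", price_v1)
before = agent.call("price", 10000)

table.bind("price", price_v2)  # swap without restarting the caller
after = agent.call("price", 10000)
```

Because callers hold only the interface name and the single agent, replacing a damaged or outdated implementation needs one table update rather than changes at each call site.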

【Key words】 service; autonomic recovery; monitor; microreboot; hot-swap