基于软件体系结构的容错机制动态配置技术研究

Research on Software Architecture-Based Dynamic Reconfiguration of Fault Tolerance Mechanisms

【Author】 李军国

【Advisor】 梅宏

【Author Information】 Peking University, Computer Software and Theory, 2009, PhD

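【Abstract (translated from the Chinese 摘要)】 Software-implemented fault tolerance is one of the main methods for ensuring software availability and reliability. It detects errors in functional components at run time and restores erroneous states to normal, so that a failure in one component does not prevent the whole software system from providing correct service to users. An inherent characteristic of fault tolerance techniques is that they are specific to a particular software system: a technique must match the system's fault assumptions, application domain, execution environment, and system characteristics. This specificity limits the ability of fault tolerance techniques to adapt to changes in environment and requirements.

As a new form of software on the Internet, Internetware runs in an open, dynamic, and changeable environment and is built from heterogeneous components, including third-party components. This "openness" of execution environment and building blocks gives the run-time behavior of Internetware a certain degree of "variability". When fault tolerance techniques are used in Internetware, the conflict between this variability and the limited adaptability of fault tolerance techniques becomes prominent. To keep Internetware highly available and reliable when changes in the external execution environment or upgrades of internal components alter the software's fault assumptions, fault tolerance requirements, or application-specific constraints, one feasible approach is to add fault tolerance capability at run time, as needed, to components that lack it, or to adjust a component's existing capability (by removing it or replacing it with another fault tolerance technique).

This thesis calls such adjustment the dynamic configuration of fault tolerance mechanisms. On this basis, it studies two key problems that dynamic configuration of fault tolerance mechanisms for Internetware must solve: (1) how to clearly separate the functional part of the software from the fault-tolerant part and characterize the relationship between the two, so that dynamic configuration of fault tolerance mechanisms becomes possible; (2) how to ensure the correctness and effectiveness of the configuration result. To solve these problems, the thesis establishes a software architecture-based technical framework for the dynamic configuration of fault tolerance mechanisms. Its main features and contributions are as follows.

(1) The problem of dynamically configuring fault tolerance mechanisms is interpreted from the perspective of software architecture, and each fault tolerance mechanism applicable to Internetware is specified as an architectural style that supports fault tolerance (a fault-tolerant style). A fault-tolerant style makes explicit the structure and behavior of the mechanism and its effect on application components, and serves as the core knowledge in the dynamic configuration process. A formal model of fault-tolerant architecture is also given to support verification of the configuration result.

(2) A model checking-based method for selecting fault-tolerant styles is proposed, which solves the problem of choosing a suitable fault-tolerant style for Internetware. The basic idea is to abstract each fault-tolerant style as a computation model for model checking and to abstract the fault tolerance requirements and application-specific constraints as fault-tolerant properties. By automatically checking whether the computation model of each style satisfies the given properties, the method finds a fault-tolerant style that meets the given fault tolerance requirements without violating the application-specific constraints.

(3) A method for automatically generating fault-tolerant configurations based on model merging is proposed. Using the dependencies among application components, the method determines the set of components affected by the dynamic configuration; it compares the elements of a fault-tolerant style instance with the elements of the application architecture to obtain the matching between them; and, based on this matching, it uses model transformation to merge the style instance and the application architecture automatically, generating the fault-tolerant configuration. This automated merging helps ensure that the fault tolerance mechanism is applied correctly, and the merged result can be used directly for effectiveness verification.

(4) A middleware support framework for the dynamic configuration of fault tolerance mechanisms is designed and implemented. In this framework, the component container is extended into a fault-tolerant "sandbox" that serves as the basic unit of fault tolerance management. Combinations of interceptors corresponding to different fault tolerance mechanisms are added to the sandbox at run time and perform dynamic configuration and fault handling under the control of the fault tolerance management service. Through the use of runtime software architecture, the fault-tolerant configuration generated in the architecture planning phase can guide dynamic adjustment at the middleware layer. The framework hides fault tolerance details from applications, provides transparent fault tolerance support and transparent dynamic configuration of fault tolerance mechanisms, and is well suited to current mainstream middleware.

The JEE application ECperf is used as a case study throughout the thesis and demonstrates the effectiveness of the main methods.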
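The dependency-based scoping step in contribution (3) can be illustrated with a small sketch. The abstract does not state how the affected set is computed, so the following assumes one plausible reading: the affected components are the reconfigured component plus every component that transitively depends on it. The function name `affected_components` and the component names are hypothetical and are not taken from the thesis or from ECperf.

```python
from collections import deque


def affected_components(depends_on, target):
    """Return `target` plus every component that transitively depends on it.

    `depends_on` maps a component name to the set of components it uses.
    This is an assumed scoping rule, not the thesis's actual algorithm.
    """
    # Invert the dependency map: for each component, who depends on it.
    dependents = {}
    for component, used in depends_on.items():
        for dep in used:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first walk over the inverted edges, starting at the target.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical application architecture used only for illustration.
EXAMPLE_DEPENDENCIES = {
    "WebFrontend": {"OrderService"},
    "OrderService": {"InventoryService", "CustomerService"},
    "InventoryService": set(),
    "CustomerService": set(),
}
```

Under this assumption, reconfiguring a leaf component affects every component on a dependency path that leads to it, while reconfiguring a top-level component affects only that component.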

【Abstract】 Software-implemented fault tolerance (FT) is an effective way to achieve high availability and reliability. Tolerating faults in software takes two successive steps: error detection identifies the presence of an error, and recovery transfers abnormal states into normal ones. The effectiveness of a fault-tolerant mechanism depends on its fitness for an application context, including the fault assumption, application domain, and execution environment. This dependence makes fault-tolerant mechanisms inflexible when the environment or user requirements change. As a kind of Internet-scale software, Internetware runs in an open and dynamic environment and is composed of various components, including third-party ones. This openness means that its behavior may change continuously. When fault-tolerant mechanisms are applied to Internetware, a formerly effective mechanism is very likely to stop working once the fault assumption, fault-tolerance requirements, or application-specific constraints change because of changes in the execution environment or component upgrades.

Reconfiguring fault-tolerant mechanisms for Internetware at runtime is a promising way to maintain high availability and reliability at all times. The reconfiguration includes adding a fault-tolerant mechanism to a non-fault-tolerant component, removing an existing mechanism from a component, or switching between different mechanisms for a component. This thesis focuses on this problem of dynamic reconfiguration of fault-tolerant mechanisms for Internetware.

There are two major challenges in solving the problem. The first challenge is to separate the software's functional parts clearly from its fault-tolerant parts and to specify the relationships between the two explicitly. Otherwise, it is hard to modify the fault-tolerant parts without harming the functional parts.
The second challenge is to ensure the correctness and effectiveness of the dynamic reconfiguration of fault-tolerant mechanisms. In this thesis, we present a Software Architecture (SA)-based approach to meet both challenges.

First, to depict the relationship between fault-tolerant mechanisms and application components, we specify the fault-tolerant mechanisms suitable for Internetware as a special kind of architectural style, fault-tolerant styles, which explicitly capture the mechanisms' structural characteristics, behavioral characteristics, and interactions with application components. The available fault-tolerant styles are also classified and organized for reuse. In addition, a formal model of fault-tolerant SA is given to enable validation of the reconfiguration. The formal model covers the fault-tolerant styles and the dependencies among components, and forms the theoretical foundation of the dynamic reconfiguration of fault-tolerant mechanisms.

Second, to select the most suitable of several fault-tolerant styles, we use model checking to obtain solid evidence. As a preprocessing step, the behavioral models of the fault-tolerant styles are automatically translated into a model checker's verification model, and the fault-tolerance requirements and application-specific constraints are specified as fault-tolerant properties. Model checking then verifies whether each candidate style satisfies the required properties. The satisfied properties and constraints serve as evidence for the selection.

Third, to avoid human mistakes in dynamic reconfiguration and to lighten the maintainers' burden, we provide automatic generation of the reconfiguration operations in middleware. As the first step, the scope of to-be-modified components in an application is identified automatically with the help of dependency information provided by the SA.
Then the elements of a fault-tolerant style instance are matched with those of the application by comparing the style instance with the application's architecture. In the last step, a model transformation technique automatically merges the style instance and the application architecture and generates the desired fault-tolerant configuration. The merged fault-tolerant SA can be verified for effectiveness and correctness.

Finally, we present a framework that supports dynamically reconfigurable FT in middleware. It consists of a fault-tolerant sandbox design and an FT management service. The fault-tolerant sandbox extends a generic component container and acts as the unit of reconfiguration. Different fault-tolerant styles are implemented as different combinations of container interceptors in the sandboxes. The sandbox supports dynamic loading and unloading of interceptors to implement the dynamic reconfiguration of fault-tolerant mechanisms. The FT management service coordinates dynamic reconfiguration and recovery. With the help of Runtime Software Architecture, the fault-tolerant configuration generated in the SA-level planning is mapped to a sequence of middleware operations and executed by the framework. The framework provides transparent FT and dynamically reconfigurable FT for applications, and works well in PKUAS and JBoss.

A JEE application, ECperf, is used as a case study throughout the thesis. It shows the effectiveness of the proposed approach.
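The sandbox idea, in which interceptors are loaded into and unloaded from a container at runtime, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation and not the PKUAS or JBoss interceptor API: the class names `Sandbox` and `RetryInterceptor`, the `invoke` signature, and the retry-based recovery are all hypothetical.

```python
class RetryInterceptor:
    """Hypothetical FT interceptor: re-invokes the call when it raises."""

    def __init__(self, attempts=3):
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self.attempts = attempts

    def invoke(self, call_next, *args):
        last_error = None
        for _ in range(self.attempts):
            try:
                return call_next(*args)
            except Exception as exc:  # error detection: any raised exception
                last_error = exc
        raise last_error  # recovery failed after all attempts


class Sandbox:
    """Hypothetical container wrapping one component with an interceptor chain."""

    def __init__(self, component):
        self._component = component
        self._interceptors = []

    def load(self, interceptor):
        """Add an interceptor at runtime (innermost position)."""
        self._interceptors.append(interceptor)

    def unload(self, interceptor):
        """Remove a previously loaded interceptor at runtime."""
        self._interceptors.remove(interceptor)

    def call(self, *args):
        # Rebuild the chain on every call so load/unload take effect at once.
        handler = self._component
        for interceptor in reversed(self._interceptors):
            handler = self._wrap(interceptor, handler)
        return handler(*args)

    @staticmethod
    def _wrap(interceptor, call_next):
        return lambda *args: interceptor.invoke(call_next, *args)


def make_flaky(failures):
    """Build a test component that fails `failures` times, then doubles its input."""
    state = {"left": failures}

    def component(x):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient fault")
        return x * 2

    return component
```

The caller only ever uses `Sandbox.call`, so adding or removing `RetryInterceptor` changes the fault-tolerant behavior without touching the component or its clients, which is the transparency property the abstract describes.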

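  • 【Online Publication Contributor】 Peking University
  • 【Online Publication Year/Issue】 2010, Issue 07
  • 【Classification Number】 TP311.52
  • 【Citation Count】 6
  • 【Download Count】 755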