Modeling and Analyzing of Autonomic Dependability for Distributed Mission-Critical Systems

【Author】 吕宏武 (Lü Hongwu)

【Advisor】 王慧强 (Wang Huiqiang)

【Author Information】 Harbin Engineering University, Computer Application Technology, 2011, Ph.D.

【Abstract】 Distributed mission-critical systems place high demands on dependability. However, as the network environment deteriorates, Threats to Dependability (TtD) such as deliberate attacks, random system faults and accidents inevitably occur. They lead to deviation from the mission, interrupted operation, software failure and system crashes, and can even cause leaks of confidential information or harm to people and property, so the threats facing distributed mission-critical systems are increasingly severe. At the same time, driven by parallel, ubiquitous and mobile computing, systems keep growing in scale, their architectures become more complex and their goals more diverse, so the management complexity of the system itself has become a major challenge. Traditional dependability-enhancement methods usually improve dependability at the cost of greatly increased system complexity. They no longer suit this situation, and a new fine-grained dependability approach for distributed mission-critical systems is urgently needed. Against this background, Autonomic Computing (AC), with its characteristic of "managing technology with technology", offers self-management capability, and its self-healing and self-protection properties carry dependability characteristics. It is therefore regarded as a new way to achieve dependability, which has given rise to autonomic dependability. Research on autonomic dependability is still at an early stage, however: most implementations are rule-based, and models of autonomic dependability, formal models in particular, have received little study. This makes it hard to analyze the influencing factors from a global perspective and to guide further research and development.

To address these problems, this thesis first proposes a formal model of autonomic dependability for distributed mission-critical systems, which responds to TtD at different levels and provides a theoretical basis for further study. On this basis, the three core properties of autonomic dependability, namely self-reflection, self-healing and self-destruction, are studied separately to analyze the key factors that affect the autonomic dependability of distributed mission-critical systems and to provide a reference for refining and improving the model. The main contents are organized as follows.

Firstly, a formal model of autonomic dependability for distributed mission-critical systems is proposed based on SM-PEPA (Semi-Markov Performance Evaluation Process Algebra). Starting from a study of the concept of autonomic dependability and its core properties, an autonomic dependability model is built in which self-tolerance, self-healing and self-destruction are applied, under the control of the self-reflection mechanism, to respond to threats of different severity. The model is then described in the formal language SM-PEPA, which allows action transition rates to follow general distributions and so makes the model more widely applicable. A method for quantifying autonomic dependability is presented from a steady-state probability perspective, and an autonomic dependability index is used to analyze how each property affects autonomic dependability. This model is the basis for the subsequent chapters.

Secondly, a layered self-reflection method for distributed mission-critical systems is proposed. Drawing on existing results in autonomic computing, an Autonomic Feedback Control (AFC) loop is designed, and system components are made autonomic by adding AFC loops to them. Context-awareness and self-awareness are then combined into a two-layer self-reflection framework, in which local self-reflection reduces the latency and overhead of self-reflection for each Autonomic Element (AE), and global self-reflection ensures that system-level dependability goals are met. Because existing self-reflection models, which are described in natural language or diagrams, lack a basis for theoretical derivation, the π-calculus is used to describe the self-reflection framework formally, so that the mechanism can be checked at a higher level of abstraction and flaws in the framework reduced. The self-reflection mechanism provides the self-management foundation for the subsequent work on self-healing and self-destruction.

Thirdly, a self-healing analysis method based on the fluid-flow approximation of PEPA (Performance Evaluation Process Algebra) is proposed. To deal with the complexity of analyzing self-healing in distributed mission-critical systems, the method is built on a study of self-healing requirements and representative recovery architectures, and it takes into account both the autonomic control process and the joining and leaving of components. Converting the PEPA model into Ordinary Differential Equations (ODEs) avoids the state-space explosion that affects traditional Markov process analysis. Experimental results show that, compared with traditional state-based self-healing analysis, the method keeps the model solution time within linear time and works well when the number of components is large.

Lastly, a self-destruction method inspired by apoptosis is proposed. Following the process of apoptosis in biological cells and building on the self-reflection structure, a self-destruction structure that combines active and passive modes is designed to realize software self-destruction in distributed mission-critical systems. An MRSPN (Markov Regenerative Stochastic Petri Net) is then used to describe the self-destruction process, providing a means of quantitative analysis. The method is independent of platform and programming language and can be widely used as the ultimate safeguard for autonomic dependability. Analysis of the key model parameters and the experimental results show that reducing heartbeat failures and the self-destruction execution time increases the autonomic dependability of the system.
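
The fluid-flow idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's model: the two-state component (working or failed), the shared pool of repairers, the rate values and the function name `fluid_self_healing` are all assumptions chosen for illustration. The point it shows is that the fluid approximation tracks the number of components in each local state with one ODE per state, so the number of equations does not grow with the number of components.

```python
def fluid_self_healing(n_components, n_repairers, fail_rate, repair_rate,
                       t_end=50.0, dt=0.01):
    """Integrate a toy fluid model of self-healing with fixed-step RK4.

    Hypothetical model (not taken from the thesis): each working component
    fails at fail_rate, and failed components are repaired in cooperation
    with a pool of n_repairers, each repairing at repair_rate.
    Returns the (working, failed) populations at time t_end.
    """
    def deriv(working, failed):
        # A shared action proceeds at the pace of the slower participant,
        # so the repair flux is bounded by min(failed, n_repairers).
        repair_flux = repair_rate * min(failed, n_repairers)
        fail_flux = fail_rate * working
        return repair_flux - fail_flux, fail_flux - repair_flux

    working, failed = float(n_components), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(working, failed)
        k2 = deriv(working + 0.5 * dt * k1[0], failed + 0.5 * dt * k1[1])
        k3 = deriv(working + 0.5 * dt * k2[0], failed + 0.5 * dt * k2[1])
        k4 = deriv(working + dt * k3[0], failed + dt * k3[1])
        working += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        failed += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return working, failed


if __name__ == "__main__":
    # Two ODEs are solved whether there are 1,000 components or 1,000,000.
    print(fluid_self_healing(1000, 10, fail_rate=0.001, repair_rate=0.5))
```

A state-based Markov analysis of the same toy system would need one state per possible count of failed components, whereas the sketch above always integrates two equations. The populations are real-valued approximations of the expected counts, not exact probabilities.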
