
安全关键实时通信协议研究

The Research on Safety-Critical Real-time Communication Protocols

【Author】 Li Chanjuan (李婵娟)

【Supervisors】 Nicholas McGuire; Zhou Qingguo (周庆国);

【Author information】 Lanzhou University, Applied Mathematics, 2011, PhD

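【Abstract, translated from the Chinese】 In the post-PC era, real-time computing and communication technologies are widely used in safety-critical systems in aerospace, defense, transportation, nuclear power and healthcare. As safety-critical systems become distributed and integrated, the communication network has become a core component that decisively shapes how such systems are built and verified. On the one hand, although many fieldbuses exist, they were not developed with safety as a design principle; their safety is usually argued only after the implementation is complete. Safety is an emergent property: it cannot be added after deployment and must run through the entire development process. Fieldbuses are therefore rarely seen in safety-critical applications. On the other hand, the time-triggered model and time-triggered protocols are widely accepted in the safety-critical domain for their predictability, temporal composability and fault tolerance, yet research on their poor flexibility and low average-case resource utilization has never stopped and has produced many improved protocols, such as Byteflight and FlexRay. These protocols improve flexibility to some degree, but they weaken safety properties such as fault tolerance and leave untouched the safety-related problems that time-driven protocols are complex to design and hard to verify formally in full.

Taking global time, the foundation of time-triggered protocols, as its starting point, this thesis re-examines the design philosophy of such protocols and proposes a new safety-critical real-time protocol, NOP (Node Order Protocol). NOP builds a distributed system around the concept of node order, and this concept divides existing protocols into two classes: node-ordered and non-node-ordered. Existing node-order protocols such as TTP (Time-triggered Protocol) construct node order on top of global time. NOP takes the opposite approach: it is based on the transmission order of nodes, and global time can be implemented on top of that order. NOP thus removes the communication protocol's dependence on a global clock, establishes arbitration and transmission control mechanisms that do not depend on one, and realizes the transmission semantics of a time-triggered protocol in an event-triggered manner. More importantly, NOP retains the safety of time-triggered protocols while improving flexibility and resource utilization.

Within the IEC 61508 framework, the thesis applies functional-safety management throughout the design, implementation and verification of NOP, chiefly through risk management and complexity control. It describes the semantic design of NOP in detail, with emphasis on the error detection and diagnosis mechanisms under the event-triggered model. Unlike time-triggered protocols such as TTP, NOP makes no assumptions about a node's potential failure modes or its application environment and relies entirely on the protocol itself for error detection and diagnosis. This design raises the protocol's fault coverage and improves its composability. Verification is introduced early in the design to assure quality at every stage of the development life cycle. It combines the traditional safety assessment method FMEA (Failure Modes and Effects Analysis) with formal model checking to form a combined static and dynamic verification framework.

Improving performance is also a design goal of NOP. Performance is evaluated through both theoretical analysis and prototype testing. The results show that protocol latency has a definite upper bound when the network load peaks, and that the protocol performs best at that point, with deterministic latency and small latency jitter. Notably, throughput is also highest at peak load: on 10 Mbps Ethernet, network utilization reaches 95%. This further confirms that NOP, unlike traditional event-triggered protocols, is designed for the worst case: when nodes have enough messages to send, it can drive the network to saturation, which existing time-triggered and event-triggered protocols cannot do.

The thesis analyzes and defines the interfaces of NOP in the value domain and the time domain. When nodes can be composed according to their worst-case temporal behavior, NOP achieves temporal composability just as time-triggered protocols do. NOP's design is also independent of the underlying physical network, and its safety is provided entirely by the protocol itself, which aids composability, particularly with COTS (Commercial Off-the-Shelf) networks such as Ethernet. More importantly, although NOP is based on the event-triggered model, it removes time-related sources of non-determinism and makes protocol state transitions deterministic, so it can guarantee determinism among redundant units. Active redundancy strategies, such as triple modular redundancy, can therefore be implemented on NOP.

NOP uses a new design approach to combine the strengths of time-triggered and event-triggered protocols: it inherits the safety properties of the former and the flexibility and efficient resource use of the latter. These advantages make NOP applicable to safety-critical applications with implicit node-order requirements, and a potential replacement for time-triggered protocols with poor flexibility and low bandwidth utilization.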

【Abstract】 Real-time computing and communication technology is now widely deployed in safety-critical systems in aerospace, defense, transportation, nuclear power stations and health care. The time-triggered architecture (TTA) and the time-triggered protocol (TTP) have been widely accepted in safety-critical systems because of their predictability, temporal composability and replica determinism. However, research on improving the flexibility and average resource utilization of TTP is still ongoing and has produced several new protocols such as Byteflight and FlexRay, which clearly shows that more flexibility and better utilization are needed for wider acceptance. Although these protocols improve on the flexibility of TTP, they compromise its safety capabilities and do not address the safety-related issues of protocol design complexity and formal validation. In addition, TTP cannot be deployed on top of COTS networks such as Ethernet independently of TTA, because its fault hypothesis depends on the system architecture. The same issue remains in Byteflight and FlexRay.

In this thesis, a new real-time communication protocol, the Node Order Protocol (NOP), is proposed, based on the concept of node order. From the node order perspective, existing protocols can be classified into two categories: non-node-ordered and node-ordered. The most important design principle of NOP is the inversion of the relationship between global time and the communication protocol layer. Existing node order protocols all rely on global time to establish the transmission order, whereas the NOP design swaps these two layers and makes the transmission order the basis of global time. NOP thus establishes arbitration and transmission control mechanisms that are independent of global time, realizing time-triggered communication semantics in an event-triggered way. Most importantly, NOP maintains the same safety capability as time-triggered protocols while offering better flexibility and resource utilization.

The NOP design is described in detail, with emphasis on the error detection and diagnosis mechanisms based on the event-triggered model. Unlike time-triggered protocols such as TTP, NOP detects and diagnoses errors by itself, without assumptions about the failure modes of a faulty node or about the system model, which both enhances its fault coverage and improves its composability. Safety assessment is integrated in the early phase of protocol design to validate the correctness of the design and reduce the possibility of design faults. In this procedure, Failure Mode and Effects Analysis (FMEA) is complemented by model checking to form a static and dynamic analysis framework. The core of the procedure is controlling and reducing complexity.

For performance evaluation, theoretical analysis and prototype testing are carried out in parallel. The analysis results show that the worst-case response time is bounded even at peak network load, when each node has a ready message of maximum size to transmit in its turn. More importantly, in this scenario protocol latency shows less jitter, and network throughput reaches its maximum, about 9.5 Mbps on 10 Mbps Ethernet. This result further confirms that NOP, unlike traditional event-triggered protocols such as CAN, is designed for the worst case. At the same time, it outperforms time-triggered protocols by saturating the network where possible.

The interface of NOP in the value domain and in the time domain is analyzed and derived from the worst-case latency. When nodes can be integrated based on their worst-case temporal behavior, NOP supports temporal composability. Moreover, the NOP design is independent of the underlying network, which facilitates composability with COTS networks such as Ethernet. NOP can also guarantee replica determinism in the event-triggered model by eliminating non-deterministic time-related factors. The consistent order of input messages and observed events (e.g. timeout events) guarantees a consistent protocol state among correct nodes within a given interval. Active redundant systems, such as Triple Modular Redundancy (TMR), can therefore be implemented on top of NOP.

NOP uses a new design principle that combines the features of time-triggered and event-triggered protocols, preserving the safety features of the former with the flexibility and efficient resource utilization of the latter. These merits make NOP a potential option for safety-critical systems with a node order requirement, such as TMR redundant systems, which at present adopt time-triggered protocols with low flexibility or low bandwidth utilization.
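
The worst-case scenario named in the abstract (every node holds a ready maximum-size message and transmits in its turn) lends itself to a back-of-envelope bound. The sketch below is not the thesis's analysis. It assumes an idealized round in which each node sends exactly one frame in a fixed order, with no retransmissions and no protocol control traffic. The function names, the 4-node example and the use of standard Ethernet framing figures (1518-byte maximum frame, 8-byte preamble, 12-byte inter-frame gap) are illustrative assumptions; only the 9.5 Mbps and 10 Mbps figures come from the abstract.

```python
def worst_case_round_time(num_nodes, max_frame_bits, bandwidth_bps,
                          per_frame_overhead_bits=0):
    """Upper bound (seconds) on one full round in an idealized node-order
    schedule where every node sends one maximum-size frame in its turn.

    Assumption: no retransmissions, no idle gaps beyond the stated
    per-frame overhead, and no protocol control traffic.
    """
    if num_nodes <= 0 or bandwidth_bps <= 0:
        raise ValueError("num_nodes and bandwidth_bps must be positive")
    bits_per_turn = max_frame_bits + per_frame_overhead_bits
    return num_nodes * bits_per_turn / bandwidth_bps


def utilization(throughput_bps, bandwidth_bps):
    """Fraction of the raw link bandwidth carried as throughput."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    return throughput_bps / bandwidth_bps


if __name__ == "__main__":
    # Figures reported in the abstract: about 9.5 Mbps on 10 Mbps Ethernet.
    print(f"utilization: {utilization(9.5e6, 10e6):.0%}")

    # Hypothetical 4-node network; frame and overhead sizes are the
    # standard Ethernet values, not parameters taken from the thesis.
    max_frame_bits = 1518 * 8       # maximum untagged Ethernet frame
    overhead_bits = (8 + 12) * 8    # preamble/SFD plus inter-frame gap
    bound = worst_case_round_time(4, max_frame_bits, 10e6, overhead_bits)
    print(f"round-time bound for 4 nodes: {bound * 1e3:.3f} ms")
```

Under these assumptions the bound grows linearly with the number of nodes, which is the sense in which a fixed transmission order keeps latency predictable at saturation. A real protocol would add terms for error detection, timeouts and membership handling.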

  • 【Online publication contributor】 Lanzhou University
  • 【Online publication issue】 2012, Issue 06