
非相似容错计算机系统设计及其验证技术研究

Research on Techniques of Dissimilar Fault Tolerant Computer

【Author】 Han Wei (韩炜)

【Advisor】 Gao Deyuan (高德远)

【Author information】 Northwestern Polytechnical University (西北工业大学), Computer Applications, 2002, PhD

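【Chinese abstract】 Fault-tolerant computers take two forms: similar redundancy and dissimilar redundancy. A similar-redundancy computer copies one version of the software onto each of several identical computers. Because every copy is the same version, and software faults in that version are hard to avoid, replicating the software also replicates its faults. Under certain triggering conditions, all redundant channels may then produce the same erroneous result.

NVFT is China's first dissimilar-redundancy fault-tolerant computer system, developed under the author's direction. Drawing on this engineering background, this thesis presents the first systematic study in China of the techniques and topics involved in a dissimilar-redundancy fault-tolerant computer (NVFT) built on the principle of design diversity. The work covers the system, hardware, and software design of the dissimilar-redundancy computer; its synchronous/asynchronous design; its communication design; and the design of the cross-check points, cross-check vectors, and monitoring and voting algorithms between software versions. Finally, a reliability analysis of NVFT is carried out, and the fault injector and flight tests, both very important in verifying and testing fault-tolerant systems, are described.

The basic diversity principle adopted in the NVFT design is to set up several different design teams that, working from the same specification, independently design, develop, test, and verify the hardware and software of each redundant channel.

Design diversity is generally realized in one of two ways: N-version programming or recovery blocks. Building on multi-computer fault-tolerant architectures, this thesis proposes a new NR fault-tolerant computer architecture, analyzes it, and makes a first attempt at applying it in NVFT.

The most important requirement on the specification is that it be correct and complete. The specification of a dissimilar-redundancy computer must state explicitly the cross-check points, cross-check vectors, and other requirements needed to implement diverse programming.

A dissimilar fault-tolerant system can run synchronously or asynchronously. This thesis analyzes the characteristics of both modes in detail and proposes a time-tag mechanism that compares and monitors input variables between channels under asynchronous operation.

Cross-check voting, including the placement of cross-check points and the choice of voting vectors, is one of the key design elements of a dissimilar-redundancy computer system. The cross-check voting design is embodied in the flight-control software developed in this project.

Reliability and performance analysis of the dissimilar fault-tolerant system is one of the main focuses of this thesis, which proposes analyzing the reliability of the NVFT system with fault trees and Markov processes. Reliability figures for the NVFT system are obtained from the project's experimental data.

Fault injection is a powerful tool for studying fault-tolerant systems, but software fault injection is relatively hard to implement. This thesis proposes a new approach that implements software fault injection on top of a hardware fault injector, which is of great significance for verifying fault-tolerant software.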

【Abstract】 Fault-tolerant computers can be divided into two categories: similar-redundancy and dissimilar-redundancy computers. A similar-redundancy computer runs a copy of the same software on each redundant computer. Because every computer uses the same copy, and errors within the software are very hard to avoid, the same error is copied into each computer. If that error is triggered under some condition, the software in all of the computers fails in the same way at the same time.

NVFT (N Version Fault Tolerance) is the first dissimilar-redundancy computer system in China, developed under the direction of the author. Based on this engineering practice, this paper summarizes the principles and research behind the project. The work includes the dissimilar-redundancy computer architecture, hardware and software design, synchronous and asynchronous operation, communication design, the design of software cross-check points and voting vectors, and the design of the software voting and monitoring algorithms. A software reliability analysis using our experimental data is also presented, and the concepts and methods of fault insertion and flight testing for the dissimilar computer are described.

The principle of diversity design is to have multiple design teams independently design the individual redundant computers according to the same specification. Combining the advantages of N-version programming and recovery blocks, this paper presents a new fault-tolerant structure, the NR fault-tolerant computer architecture, together with its analysis and implementation.

There are several key points in developing an N-version programming software redundancy system:

1. The system specification must be correct, complete, and well defined. It should specify the cross-check points and cross-check vectors in detail.
2. The fault-tolerant system can operate synchronously or asynchronously. Asynchronous operation is the better choice for suppressing common noise signals. This paper gives a time-tag mechanism for monitoring input signals.
3. In the control law design of the project, the three design teams use different algorithms to implement the specification of the cross-check points and voting vectors.
4. Software reliability analysis is an open-ended subject. This paper uses fault trees and Markov chains, based on real experimental data, to analyze the reliability of the NVFT system.
5. For the verification of a fault-tolerant system, a fault inserter was developed in our project, and a new idea for software fault injection is presented. This is very important for demonstrating the system.
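As a rough illustration of voting and monitoring at a cross-check point, the sketch below compares one cross-check vector reported by each channel. It is not the NVFT algorithm, which the abstract does not specify: the mid-value (median) voter, the function name `cross_check_vote`, the channel names, and the `tolerance` threshold are all assumptions made for this example.

```python
from statistics import median


def cross_check_vote(vector_by_channel, tolerance):
    """Vote one cross-check vector element by element across channels.

    vector_by_channel: dict mapping a channel name to its list of floats
                       (hypothetical data layout, assumed for this sketch).
    tolerance: largest allowed deviation from the voted value (assumed).
    Returns (voted_vector, suspect_channels).
    """
    channels = list(vector_by_channel)
    lengths = {len(vector) for vector in vector_by_channel.values()}
    if len(lengths) != 1:
        raise ValueError("all channels must report vectors of equal length")

    voted = []
    suspects = set()
    for i in range(lengths.pop()):
        values = [vector_by_channel[ch][i] for ch in channels]
        mid = median(values)  # mid-value select: an assumed voter choice
        voted.append(mid)
        # Monitoring: flag any channel that strays too far from the voted value.
        for ch, value in zip(channels, values):
            if abs(value - mid) > tolerance:
                suspects.add(ch)
    return voted, sorted(suspects)


# Three dissimilar versions: channel C disagrees on the second element.
dissimilar = {"A": [1.00, 2.00], "B": [1.01, 2.02], "C": [1.02, 9.00]}
print(cross_check_vote(dissimilar, tolerance=0.05))

# Identical copies sharing one fault: every channel agrees, so nothing is flagged.
similar = {"A": [5.0], "B": [5.0], "C": [5.0]}
print(cross_check_vote(similar, tolerance=0.05))
```

The second call shows why voting alone cannot catch a fault replicated across identical copies: the channels agree with one another even when the shared value is wrong, which is the situation that dissimilar redundancy is intended to avoid.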
