
The Design and Implementation of High-Reliable 8051 and Reliability Estimation

【Author】 赖鑫 (Lai Xin)

【Advisor】 王苏峰 (Wang Sufeng)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2008, Master's degree

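【Summary】 Over the past 20 years, as computer technology has come into widespread use, many applications have required computers to run stably and reliably over long periods, and the reliability of the microprocessor, the core of a computer system, has therefore drawn wide attention. Radiation and electromagnetic interference are currently the main causes of microprocessor failure, and the impact of the single event effects they produce on microprocessor reliability is the focus of current research on high-reliability microprocessor design. Among single event effects, a single event upset (SEU) does not damage the logic circuit but can change the state of its signals, disrupting circuit operation and causing faults. SEUs are incidental, sudden, and random, which makes them the main target of protection in single-event-effect hardening of high-reliability microprocessors. An SEU can cause different faults in different functional units of a microprocessor, leading to different processor failure modes. Because the functional units work on different principles, different reliability enhancement techniques are applied to them. This thesis first analyzes the environment in which single event effects arise and the mechanism that produces them. It then discusses their impact on microprocessors, in particular on sequential and combinational circuits. Registers in a microprocessor fail easily when hit by an SEU, and triple modular redundancy (TMR) can harden them. In conventional TMR, however, the three registers may sample a faulty value at the same instant, so the register still fails. This thesis applies enhanced space-time TMR to registers, improving the reliability of sequential circuits and the fault tolerance of combinational circuits at the same time. Enhanced space-time TMR combines temporal and spatial redundancy; it improves on ordinary space-time TMR, which hardens non-feedback circuits, by adding double-edge-triggered registers that harden feedback circuits. For the ALU, a Berger code checker is added to HR8051 to monitor its operations. The checker uses the functional mapping relations of the various arithmetic operations to detect errors during computation. For the memory and register file, an EDAC (error detection and correction) unit is added to detect and correct errors on reads and writes. Control flow checking together with context saving and restoration hardens the control unit. A safe state machine hardens the state machine that controls MDU operations. Faults are then injected into the resulting hardened microprocessor, HR8051, to analyze its behavior in the presence of faults and the effectiveness of each technique. The fault injection results show that space-time TMR masks SEUs in combinational logic and on clock lines well when the fault duration does not exceed the phase difference between the three clocks. They also show that moderately increasing the phase difference improves the effectiveness of space-time TMR up to an optimum value, beyond which the improvement declines. When the fault lasts longer than the phase difference, two of the clocks sample the faulty value at the same time, which in feedback circuits leads to a long-lasting fault state. Finally, the thesis applies SystemVerilog assertions to fault checking and, combined with fault injection, examines at the system level how the reliability enhancement techniques affect circuit reliability. Using the fault injection results and the concrete implementation of HR8051, a Markov analysis, under some assumptions and simplifications about the behavior of HR8051 under SEU events, analyzes the reliability of a hot-standby system.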
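The masking condition stated above (a transient no longer than the clock phase difference corrupts at most one of the three samples) can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions, not the HR8051 circuit: time is an abstract integer, the three registers sample at `0`, `skew`, and `2 * skew`, and all function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three register copies."""
    return (a & b) | (a & c) | (b & c)


def sample_with_skew(good, bad, fault_start, fault_len, sample_times):
    """Value captured by each register.

    Assumption: the combinational output equals `bad` during the half-open
    interval [fault_start, fault_start + fault_len) and `good` otherwise.
    """
    fault_end = fault_start + fault_len
    return [bad if fault_start <= t < fault_end else good for t in sample_times]


def st_tmr_output(good, bad, fault_start, fault_len, skew):
    """Voted output when the three copies are clocked `skew` time units apart.

    skew == 0 models conventional TMR, where all copies sample together.
    """
    sample_times = [0, skew, 2 * skew]
    samples = sample_with_skew(good, bad, fault_start, fault_len, sample_times)
    return majority(*samples)


if __name__ == "__main__":
    good, bad = 0x5A, 0x5B  # hypothetical correct and corrupted bus values
    for fault_len in (1, 2, 3):
        voted = st_tmr_output(good, bad, fault_start=0, fault_len=fault_len, skew=2)
        print(fault_len, "masked" if voted == good else "not masked")
```

With `skew=2`, a transient of length 1 or 2 reaches only the first sample and is voted out, while a transient of length 3 reaches two samples and wins the vote. With `skew=0` every copy captures the same corrupted value, which is the weakness of conventional TMR described above.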

【Abstract】 In recent years, the use of computer systems has increased rapidly, and most applications require them to work steadily and reliably. This trend has raised serious concerns about the reliability of the microprocessor, which is the heart of the computer system. Radiation and electromagnetic interference are two typical causes of microprocessor faults. The Single Event Effects (SEE) they cause are the focus of current high-reliability microprocessor design techniques. A Single Event Upset (SEU), one kind of SEE, does not damage the processor circuit, but it can change the circuit's logic state. As a result, the circuit works incorrectly and failures follow. An SEU is a transient effect and occurs randomly, so it has become the main concern of SEE mitigation in high-reliability microprocessor design. SEUs can cause different kinds of faults and lead to microprocessor malfunction. The functional units of a microprocessor operate on different principles, so different reliability-improving techniques are applied to them. This thesis first analyzes the environment in which SEUs arise and how they arise, and then analyzes how they affect a microprocessor, especially its sequential and combinational logic circuits. Registers in a microprocessor malfunction easily under SEU attack. Triple Modular Redundancy (TMR) can improve their reliability, but in traditional TMR the three registers may sample the same faulty value at the same time, so the register still fails. Enhanced ST-TMR (EST-TMR), which extends the Space-Time TMR (ST-TMR) technique with double-edge-triggered registers, is used to improve the fault tolerance of both combinational and sequential logic circuits. For the ALU, a Berger code detector is added to the microprocessor to monitor its operation. The detector uses the internal function mapping relationships of the different arithmetic and logic operations to detect errors during operation. An EDAC error detector and corrector improves the reliability of the register file and memory during reads and writes. Control flow checking and context saving and restoring are used to harden the control unit. A safe state machine hardens the state register that controls operation of the Multiply and Division Unit (MDU). Fault injection is applied to measure how effective the reliability-improving techniques are and how faults affect microprocessor operation. The results indicate that when the fault duration is shorter than the phase difference between the three clocks, enhanced ST-TMR masks almost all SEUs in combinational logic and on clock lines. Slightly enlarging the phase difference improves the effect, but there is an optimum value, beyond which the effect degrades. When the fault duration is longer than the phase difference, two registers sample the faulty value, which leaves a feedback circuit in a fault state for a long time. Finally, this thesis applies SystemVerilog assertions to fault checking and, together with fault injection, checks the effect of the reliability-improving techniques on the circuits. A Markov analysis, combined with the fault injection results and the implementation of HR8051, analyzes the behavior of a hot backup system under assumptions made to simplify the analysis of fault events.
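As an illustration of the kind of "function mapping relationship" a Berger code detector can exploit, the sketch below predicts the Berger check symbol (the number of 0 bits) of an adder result from the check symbols of its operands and the adder's carries. The abstract does not state which relations HR8051 uses; the relation here follows from the full-adder identity s_i = a_i + b_i + c_i - 2·c_{i+1} summed over all bit positions, and the 8-bit width, function names, and `inject_mask` fault model are assumptions made for the example.

```python
WIDTH = 8  # assumed data-path width for the example


def ones(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def add_with_carries(a: int, b: int, cin: int = 0):
    """Ripple-carry addition of two WIDTH-bit values.

    Returns (sum, internal, cout), where `internal` packs the carries
    c_1 .. c_{WIDTH-1} (bit i holds c_i) and cout is c_WIDTH.
    """
    carry, total, internal = cin, 0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # this is c_{i+1}
        if i < WIDTH - 1:
            internal |= carry << (i + 1)
    return total, internal, carry


def predict_sum_zeros(a, b, cin, internal, cout):
    """Predicted Berger check symbol (zero count) of the sum."""
    zeros_a, zeros_b = WIDTH - ones(a), WIDTH - ones(b)
    return zeros_a + zeros_b - WIDTH - cin + ones(internal) + 2 * cout


def alu_add_checked(a, b, cin=0, inject_mask=0):
    """Add, optionally flip result bits (hypothetical fault), then check."""
    total, internal, cout = add_with_carries(a, b, cin)
    total ^= inject_mask  # models an upset on the result bus only
    ok = (WIDTH - ones(total)) == predict_sum_zeros(a, b, cin, internal, cout)
    return total, ok


if __name__ == "__main__":
    print(alu_add_checked(0x3C, 0x0F))                    # fault-free
    print(alu_add_checked(0x3C, 0x0F, inject_mask=0x04))  # one flipped bit
```

In this model a single flipped result bit always changes the zero count by one, so the comparison fails and the error is flagged. Faults that strike the carry chain itself would need separate treatment and are outside this sketch.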
