
基于时序等价性检查的电路软错误系统级可靠性分析方法研究

Research on Sequential Equivalence Checking Based System-level Soft Error Reliability Analysis of Circuits

【Author】 Zhu Dan

【Advisor】 Li Sikun

【Author information】 National University of Defense Technology, Electronic Science and Technology, 2010, PhD

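【Abstract (translated from Chinese)】 With the rapid development of integrated circuit design and manufacturing technology, soft errors induced by environmental factors such as space radiation and noise interference seriously threaten design reliability. To meet design goals for performance, power, area, and reliability at the same time, a circuit can usually be given only selective soft error protection, and soft error reliability analysis is what determines how effective that selective protection is. System-level soft error reliability analysis is carried out early in the design process. It is more efficient, and it provides guidance for fault-tolerant design earlier, which avoids design rework. It is a research focus shared by industry and academia.

Existing methods for system-level reliability analysis of digital circuits fall into two main classes: methods based on fault simulation and methods based on formal techniques. Fault simulation is the most widely used, but it can hardly cover the input space and the fault space completely, so it is incomplete. Formal methods guarantee complete results, but existing ones rely mainly on model checking and theorem proving, both of which require considerable experience and expert support, and theorem proving also requires manual intervention. Compared with other formal verification techniques, sequential equivalence checking is simple in principle and easy to understand and use. This thesis therefore introduces sequential equivalence checking into system-level soft error reliability analysis and studies in depth the theory and methods of system-level soft error reliability analysis of circuits based on sequential equivalence checking. The main contributions are as follows:

1. A soft error vulnerable-spot screening method based on an error propagation model and sequential equivalence checking is proposed. The method first extracts a behavioral model of soft error propagation from the circuit, then uses this model to check the equivalence of the faulty circuit and the original circuit, identifying the sequential elements that are sensitive to soft errors. Experimental results show that the method not only screens out all soft error vulnerable spots in the circuit precisely, but can also check the effectiveness of fault-tolerant logic.

2. It is proved for the first time that the soft error immunity of a general circuit comes mainly from nodes that are partially immune to soft errors. It is also proved that the reliability of a circuit and its components varies not only with the input distribution but also dynamically over time. These results provide a theoretical basis for new reliability analysis methods that fully exploit a circuit's own immunity to guide soft error protection effectively.

3. A run-time soft error reliability ranking method and an approximate soft error reliability ranking method for sequential elements are proposed. The run-time method predicts offline, and exactly, the run-time soft error reliability ranking of the sequential elements in a circuit from the input distribution and the initial state distribution. The approximate method gives engineers preliminary guidance on fault-tolerant design when the input distribution is unknown. Experimental results show that both methods guide selective protection effectively. At the same fault-tolerance cost, guidance from the run-time method yields higher reliability, while the approximate method can analyze larger circuits.

4. A high-level sequential equivalence checking method based on two-dimensional decomposition is proposed, where the two dimensions are space and time. Slicing is first used to decompose the verification target in the space dimension. Logic cutpoints are then inserted dynamically while the slices are checked for equivalence, which decomposes the problem in the time dimension. Experiments show that the method effectively alleviates the memory explosion problem.

5. SEC-HSERA (Sequential Equivalence Checking based Hybrid Soft Error Reliability Analyzer), a framework for system-level soft error reliability analysis of circuits based on sequential equivalence checking, is designed and implemented. The framework integrates the vulnerable-spot screening method, the run-time reliability analysis method, and the approximate reliability analysis method proposed in this thesis, together with a two-dimensional decomposition guidance module for sequential equivalence checking. The SEC-HSERA prototype was applied to soft error reliability analysis of the decoder circuit in the 32-bit embedded microprocessor Estar2. Based on the analysis results, the sequential elements of the decoder were selectively protected against soft errors, achieving 90.4% error coverage at a cost of 22.5% in power and 0.59% in area.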

【Abstract】 As technology scales, soft errors induced by environmental factors such as cosmic radiation and random noise severely threaten system reliability. To meet the design goals of reliability, performance, power consumption, and area simultaneously, a design can only be protected selectively, and reliability evaluation is critical to selective protection. System-level soft error reliability analysis of circuits is carried out early in the design process, which is more efficient and provides earlier guidance for soft-error-tolerant design, avoiding rework. It has become a common research focus of industry and academia.

Most existing approaches to system-level reliability evaluation of circuits fall into two categories: simulation-based approaches and approaches based on formal techniques. Simulation-based approaches are the most widely used, but they are incomplete because they cannot cover the input space and the fault space completely. Formal approaches are complete, but most conventional ones are based on property checking and theorem proving, both of which require experience and expert support. Moreover, theorem proving needs manual intervention.

Compared with other formal verification techniques, sequential equivalence checking (SEC) is simple, easy to understand, and easy to use. This thesis therefore introduces SEC into system-level reliability evaluation of circuits and focuses on the theory and approaches of SEC-based system-level reliability evaluation. The major innovative achievements are as follows:

1. A soft error reliability evaluation approach guided by fault propagation characteristics and SEC is proposed. For scalability, a fault propagation sequential dependence graph (SDG) is proposed to extract the soft error propagation characteristics. In this approach, equivalence checking is localized to the parts of the circuit affected by soft error propagation. Experimental results indicate that the proposed approach not only locates all the soft error vulnerable spots, but can also check the effectiveness of the protection logic in circuits.

2. It is theoretically proved that a circuit's intrinsic immunity to soft errors stems mainly from circuit components with partial immunity, and that the soft error reliability of a circuit at runtime varies with the input distribution as well as with time. These conclusions provide a theoretical basis for research on reliability evaluation approaches that make full use of a circuit's intrinsic immunity to guide selective soft error hardening.

3. A runtime soft error reliability sorting approach and an approximate soft error reliability sorting approach are proposed. Given the input distribution and the initial state distribution, the runtime approach exactly predicts the runtime reliability sorting of sequential units by offline analysis. The approximate approach helps engineers make a preliminary vulnerability estimate, and thus the necessary tradeoff between reliability and other design targets, even when a good workload estimate for the design is unavailable. Experimental results show that both approaches guide the deployment of soft error protection mechanisms efficiently. At the same cost, the runtime approach achieves higher reliability than the approximate one, while the approximate approach can analyze larger-scale circuits.

4. A two-dimensional (2D) decomposition SEC approach is proposed. Here, "2D" means the space dimension and the time dimension. The approach first builds the verification model for the slice of a single output variable at a time. Then, during the equivalence checking of the corresponding slices, logic cutpoints are dynamically inserted to split the verification problem in the time dimension. Promising experimental results demonstrate that this approach reduces the storage explosion of SEC.

5. An SEC-based Hybrid Soft Error Reliability Analyzer (SEC-HSERA) is designed and developed. It integrates all the proposed soft error reliability evaluation approaches: the soft error vulnerable spot selection approach, the runtime soft error reliability sorting approach, the approximate reliability sorting approach, and the 2D decomposition SEC approach. The SEC-HSERA prototype system is applied to analyze the reliability of the decoder in a 32-bit embedded microprocessor. Guided by the results from SEC-HSERA, 90.4% error coverage was achieved with an extra 22.5% in power consumption and 0.59% in area.
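The screening idea in item 1 (compare a fault-injected circuit against the original by equivalence checking) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four-flip-flop circuit (three voted copies of a toggle bit plus one unprotected bit), the function names, and the explicit-state product-machine search. It is not the SDG-based method of the thesis or the SEC-HSERA implementation, and a real SEC engine would work symbolically rather than by enumeration.

```python
def majority(a, b, c):
    """Bitwise majority vote of three bits."""
    return (a & b) | (a & c) | (b & c)

def step(state, x):
    """Hypothetical circuit: returns (next_state, output) for input bit x.

    Flip-flops 0..2 hold three copies of a toggle bit behind a voter;
    flip-flop 3 holds an unprotected toggle bit.
    """
    a, b, c, r = state
    v = majority(a, b, c)
    next_state = (v ^ x, v ^ x, v ^ x, r ^ x)
    return next_state, (v, r)

INPUTS = (0, 1)          # all values of the single input bit
INIT = (0, 0, 0, 0)      # assumed reset state

def reachable(init):
    """All fault-free states reachable from init under any input sequence."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for x in INPUTS:
            nxt, _ = step(s, x)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def equivalent(good, bad):
    """Product-machine search: True if no input sequence makes the
    outputs of the two copies differ when started from (good, bad)."""
    seen, frontier = {(good, bad)}, [(good, bad)]
    while frontier:
        g, b = frontier.pop()
        for x in INPUTS:
            g_next, g_out = step(g, x)
            b_next, b_out = step(b, x)
            if g_out != b_out:
                return False
            if (g_next, b_next) not in seen:
                seen.add((g_next, b_next))
                frontier.append((g_next, b_next))
    return True

def vulnerable_flops(init, n_flops):
    """Indices of flip-flops where a single bit flip in some reachable
    state can become visible at the outputs."""
    states = reachable(init)
    result = []
    for i in range(n_flops):
        for s in states:
            flipped = tuple(bit ^ int(j == i) for j, bit in enumerate(s))
            if not equivalent(s, flipped):
                result.append(i)
                break
    return result

print(vulnerable_flops(INIT, 4))
```

Because every input value and every reachable state is explored, the search is complete for this toy model, which is the property the abstract contrasts with fault simulation. In this circuit the voter masks any single flip in the three redundant copies, so only the unprotected flip-flop is reported, which also shows how such a check can confirm that protection logic is effective.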
