Research on Techniques of Soft Error Analysis and Mitigation in Nanometer Scale Integrated Circuits

【Author】 Sun Yan (孙岩)

【Advisors】 Hao Yue (郝跃); Zhang Minxuan (张民选)

【Author information】 National University of Defense Technology, Electronic Science and Technology, 2010, Ph.D.

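【Abstract (translated from the Chinese original)】 As microelectronic manufacturing processes advance, the feature size of integrated circuits keeps shrinking and has now entered the nanometer era. Nanometer-scale integrated circuits offer excellent performance, rich functionality, and low power consumption, and are widely used in electronic communications, computers, aviation, aerospace, military, and consumer electronics equipment. However, steadily falling supply voltages, ever higher operating frequencies, continually shrinking node capacitances, and rapidly growing chip complexity make these circuits increasingly sensitive to their environment. When a circuit is disturbed by high-energy particles or external noise, its internal state may be corrupted and the circuit may produce an error. Because such errors are transient, random, and recoverable, they are called "soft errors". At nanometer-scale process nodes, soft errors are the main cause of integrated circuit failure. Frequent soft errors lower the reliability of integrated circuits and seriously affect system stability. As processes advance, the soft error problem becomes more prominent: it can cause data errors and execution errors, and can even crash the entire system. For integrated circuits used in aviation, aerospace, and military equipment in particular, where the environment is complex and reliability requirements are high, soft errors are one of the most severe design challenges and call for special protective measures.

Targeting nanometer-scale processes, this thesis studies soft error analysis and mitigation techniques for integrated circuits in depth. The objects of study are mainly components or circuit structures that strongly influence the overall soft error rate of an integrated circuit and that are either difficult to protect or expensive to protect. The goal is not to eliminate soft errors in nanometer integrated circuits completely, but to mitigate their impact on reliability as far as possible at low cost, with an emphasis on efficient, low-overhead protection techniques. The main results are as follows.

1. Soft error analysis and mitigation for content-addressable memories (CAMs). The CAM is one of the components of an integrated circuit most sensitive to soft errors, yet its structure rules out traditional fault-tolerance methods such as error protection codes. The thesis analyzes the mechanism and profile of soft errors in CAMs, studies a hardening method that applies stable structures to CAM cells, and proposes two reliable cell structures based on dual-cell redundancy, the dual-cell feedback and dual-cell keeping reliable CAM cells, as well as an error protection mechanism based on the dual-cell structure, the discard mechanism. To address the limits of CAM cell hardening, the thesis also proposes a soft-error-tolerant CAM structure based on a three-valued match line, which can detect any single-bit error in the CAM and is simple and efficient. Experiments show that the proposed techniques effectively mitigate soft errors in CAMs at small overhead.

2. Efficient soft error mitigation for sequential circuits. Sequential circuits are an important part of integrated circuits, and at nanometer-scale process nodes they become very sensitive to soft errors. Traditional redundancy-based hardening can solve the soft error problem in sequential circuits, but its hardening efficiency is low and its area overhead is large. The thesis analyzes and compares the reliability of registers with different structures and designs a low-overhead reliable dynamic register, DMTS-DR. It also proposes a sensitive-register replacement technique based on a greedy algorithm: replacing only the sensitive registers that contribute most to the circuit's soft error rate with redundant registers is enough to reduce that rate effectively. Because the greedy algorithm sometimes fails to reach the overall optimum of reliability and overhead, the thesis further proposes a reliability-overhead-optimal heuristic replacement algorithm. Experiments show that the proposed techniques achieve a good trade-off between reliability and area overhead and are efficient.

3. Soft error susceptibility analysis and optimization for dynamic circuits. Dynamic circuits have outstanding performance advantages and are widely used in high-speed circuit design. However, they are very sensitive to soft errors, which severely limits their use, and little research has so far addressed the problem. The thesis analyzes the soft error sensitivity of dynamic circuits and derives analytical models of the critical charge for the most sensitive cases. Because this analytical model is computationally complex, the thesis applies suitable approximations to obtain a simplified linear model, which can be used for manual analysis of soft error susceptibility and also reduces the computation required by automated analysis tools. Guided by the simplified model, the thesis proposes five susceptibility optimization techniques for dynamic circuits and analyzes and compares them. Experiments show that the proposed model achieves high accuracy and that the optimization techniques alleviate, to some extent, the high soft error rate of dynamic circuits.

4. Low-overhead soft error mitigation for arithmetic units. Arithmetic units are core elements of microprocessors and integrated circuits, and their correctness directly affects system reliability. At nanometer-scale process nodes, soft errors in arithmetic units can no longer be ignored, and traditional protection methods for arithmetic units incur large overheads. To address this, the thesis proposes using the inherent redundant resources that exist in abundance in arithmetic units for fault tolerance, thereby reducing overhead, and studies concrete implementation techniques using a parallel adder as an example. It proposes STPA, a parallel adder structure that exploits the adder's inherent hardware and time redundancy to tolerate soft errors, and studies an efficient soft error evaluation method for this structure based on fault injection. Experiments show that the proposed technique fully exploits the circuit's inherent redundant resources and mitigates soft errors in arithmetic units well at very small overhead.

These results explore feasible methods for analyzing and mitigating soft errors in nanometer integrated circuits and provide a theoretical and practical foundation for further improving their reliability.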
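The sensitive-register replacement idea in contribution 2 can be illustrated with a minimal sketch. This is not the thesis's algorithm: the `Register` fields, the `residual` parameter, and the stopping rule (a target fractional reduction in soft error rate) are all assumptions made for illustration, and the per-register soft error contributions are taken as given inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    name: str
    ser: float         # assumed per-register contribution to the circuit SER
    extra_area: float  # assumed area cost of swapping in a redundant register


def greedy_replace(registers, target_reduction, residual=0.0):
    """Harden registers in order of decreasing SER contribution.

    target_reduction: fraction of the total SER to remove (0..1).
    residual: fraction of a register's SER that remains after hardening
              (hypothetical parameter; 0.0 means hardening removes it all).
    Returns (chosen registers, remaining SER, total extra area).
    """
    total = sum(r.ser for r in registers)
    goal = total * (1.0 - target_reduction)
    chosen = []
    current = total
    for reg in sorted(registers, key=lambda r: r.ser, reverse=True):
        if current <= goal:
            break  # target reached; leave the remaining registers unhardened
        chosen.append(reg)
        current -= reg.ser * (1.0 - residual)
    return chosen, current, sum(r.extra_area for r in chosen)


if __name__ == "__main__":
    regs = [
        Register("r0", 50.0, 2.0),
        Register("r1", 30.0, 1.0),
        Register("r2", 15.0, 1.0),
        Register("r3", 5.0, 1.0),
    ]
    chosen, remaining, area = greedy_replace(regs, target_reduction=0.75)
    print([r.name for r in chosen], remaining, area)
```

Ranking by soft error contribution alone ignores area cost, which is one way a greedy choice can miss the best reliability-overhead trade-off; the thesis's second heuristic targets that trade-off and is not reproduced here.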

【Abstract】 With technology scaling, microelectronic manufacturing has entered the nanometer era. Because of their excellent performance and low power consumption, nanometer-scale integrated circuits are widely used in electronic communications, computers, aerospace, military, and consumer electronics devices. On the other hand, owing to sharply reduced supply voltages, increasing operating frequencies, continually decreasing node capacitances, and rapidly growing chip complexity, nanometer-scale integrated circuits are becoming more and more sensitive to their environment. When a circuit is hit by high-energy particles or disturbed by external noise, its internal state can be corrupted, which can in turn cause a circuit error. Because this type of error is transient, random, and recoverable, it is called a "soft error". In nanometer technologies, soft errors are the main cause of integrated circuit failure. Frequent soft errors make systems unstable and seriously degrade the reliability of integrated circuits. Worse, the soft error problem becomes more serious as technology scales down and can cause data corruption, execution errors, and, in the worst case, system crashes. It therefore deserves close attention, especially in aviation, aerospace, and military applications, which demand high reliability while operating in harsh environments; there, soft errors are one of the most serious design challenges and require special consideration. This thesis studies techniques for soft error analysis and mitigation in nanometer-scale integrated circuits. It focuses primarily on units or structures that are difficult or costly to protect and that have a significant influence on the system soft error rate.
The purpose of this thesis is not to eliminate soft errors in nanometer-scale integrated circuits completely, but to mitigate them at low cost and thereby improve system reliability, with an emphasis on high efficiency and low overhead. The contributions of this thesis are as follows:
1. Soft error analysis and mitigation techniques for content-addressable memories (CAMs). The CAM is one of the units in an integrated circuit most sensitive to soft errors, but traditional fault-tolerance techniques such as error-correcting codes cannot be applied to it because of its special structure. To address this problem, after analyzing the soft error mechanism and profile of CAMs, this thesis introduces a method that uses stable structures to harden CAM cells. It also introduces two reliable CAM cells based on dual-cell redundancy, the Dual Cell Feedback Reliable CAM and the Dual Cell Keeping Reliable CAM, together with the Ignore Mechanism, a fault protection scheme based on the dual-cell CAM structure. To overcome the limitations of CAM cell hardening, this thesis proposes a simple but effective soft-error-tolerant structure that is based on a triple-value match line and can detect any single-bit error in the CAM. Experimental results show that the proposed techniques effectively mitigate the soft error problem at very low overhead.
2. Cost-effective soft error mitigation techniques for sequential circuits. Sequential circuits are an important component of integrated circuits, and in nanometer technologies they become very sensitive to soft errors. Although traditional redundancy-based hardening techniques can mitigate soft errors effectively, they also incur considerable area overhead. This thesis first investigates the reliability of different register structures and then designs a soft-error-immune register, DMTS-DR.
To mitigate soft errors at the circuit level, a greedy sensitive-register replacement algorithm is proposed. Its main idea is to replace the most sensitive registers with redundant structures. Because the greedy algorithm sometimes yields a sub-optimal solution, this thesis proposes a further heuristic algorithm. Experimental results show that the proposed techniques achieve a good trade-off between reliability and area overhead.
3. Soft error analysis and vulnerability optimization techniques for dynamic circuits. Dynamic circuits, owing to their outstanding performance, are widely used in high-speed circuit design. However, they are very sensitive to soft errors, which severely limits their application. This thesis analyzes the soft error sensitivity of dynamic circuits and develops an analytical critical charge model for some of the most vulnerable cases. Because the accurate analytical model is too complex to evaluate, the thesis applies an approximation to obtain a simplified linear model that can be used both for manual vulnerability analysis and in automated CAD tools. Based on this model, five techniques are designed to reduce the soft error vulnerability of dynamic circuits, and each is evaluated carefully. Experimental results demonstrate that the proposed model is highly accurate and that the optimization techniques mitigate soft errors in dynamic circuits.
4. Low-overhead soft error mitigation techniques for arithmetic units. Arithmetic units are core components of both microprocessors and integrated circuits, and their correctness directly affects system reliability. In nanometer technologies, soft errors in arithmetic units can no longer be ignored. Because the area overhead of traditional mitigation techniques is too large, this thesis introduces the idea of exploiting inherent redundant resources to mitigate soft errors.
After investigating the inherent redundancy in parallel adders, the thesis designs a soft-error-tolerant adder, STPA. An efficient soft error estimation method based on fault injection is also proposed for this adder. Experiments show that the proposed structure fully exploits the circuit's inherent redundant resources and effectively mitigates the soft error problem in arithmetic units with only small overhead. In summary, this thesis provides effective solutions to soft error problems in nanometer-scale circuits and lays a theoretical and practical foundation for improving their reliability.
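As a rough illustration of what fault-injection-based soft error estimation involves, the sketch below flips one internal signal of a plain ripple-carry adder and counts how often the output changes. It is a generic exhaustive experiment, not the STPA structure or the thesis's evaluation method; the adder model, the choice of the per-stage generate signal as the injection site, and the function names are assumptions for illustration.

```python
from itertools import product


def ripple_add(a, b, width, flip_generate_at=None):
    """Bit-level ripple-carry addition of two unsigned integers.

    If flip_generate_at is a stage index, the generate signal of that stage
    is inverted, modelling a single transient fault (assumed fault site).
    """
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y
        propagate = x ^ y
        if flip_generate_at == i:
            generate ^= 1  # injected fault
        result |= (propagate ^ carry) << i       # sum bit uses the carry-in
        carry = generate | (propagate & carry)   # carry-out of this stage
    result |= carry << width  # final carry-out becomes the top result bit
    return result


def generate_fault_error_rate(width):
    """Inject one fault per run over all inputs and stages.

    Returns the fraction of runs whose result differs from the fault-free sum.
    """
    runs = 0
    wrong = 0
    for a, b in product(range(1 << width), repeat=2):
        golden = ripple_add(a, b, width)
        for stage in range(width):
            runs += 1
            if ripple_add(a, b, width, flip_generate_at=stage) != golden:
                wrong += 1
    return wrong / runs


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, generate_fault_error_rate(width))
```

For a 2-bit adder this enumeration gives 30 erroneous runs out of 32: the two remaining faults are logically masked because the stage's carry-out is already 1 through the propagate path. Exhaustive enumeration grows as 4^width, so wider adders would need random sampling of inputs and fault sites instead.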
