
Key Techniques of On-chip Trace Debug and Fault Detection for Embedded Multi-core Processor

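【Author】 Hu Xiao

【Supervisor】 Chen Shuming

【Author Information】 National University of Defense Technology, Microelectronics and Solid-State Electronics, 2007, Ph.D.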

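【Chinese Abstract】 With the rapid growth of the embedded-system industry, efficient development and debugging tools are receiving increasing attention. On-chip trace debugging uses dedicated hardware to record a processor's run-time information non-intrusively and in real time. It yields highly trustworthy information, requires no code changes, and does not disturb the system's run-time state. It can therefore effectively address the difficulty of debugging today's highly integrated embedded systems with strict real-time requirements, and it has become an important research direction in embedded debugging in recent years. In aerospace, military and similar applications, processor reliability is critical, so on-line fault-tolerance mechanisms are needed to handle hardware faults of various kinds. Error detection is the first step of fault tolerance. To meet the cost and power constraints of embedded systems, it is important to develop error-detection methods with low hardware overhead and low performance loss.

This dissertation studies on-chip trace debugging and error detection. It first discusses the debug model of embedded processors and analyzes in depth the principles, advantages and implementation models of on-chip trace. It then studies three topics: on-chip information collection and compression, the on-chip transmission structure for trace data streams, and trace-assisted debugging and tuning. Building on this work, a multi-core on-chip trace debug system was designed and implemented to validate the conclusions. The dissertation also proposes a control-flow error detection method that meets the low-overhead fault-tolerance requirements of embedded processors. The main results are as follows:

(1) Improvements and innovations in the collection and compression of on-chip trace information, aimed at higher compression ratios and greater flexibility:
- a long/short string encoding that compresses conditional-branch messages effectively;
- branch-output configuration bits that give a flexible tradeoff between what is collected and how much data is output;
- an event trace that effectively assists program tuning, with an encoding that trades precision against data volume;
- a NOP config instruction for non-intrusive configuration of, and access to, the trace functions.

(2) A lazy queue-scheduling algorithm for multi-core trace data-stream transmission, constrained by both a service-request threshold and a minimum service granularity. The service-request threshold of each queue controls the queue-length distribution. The minimum service granularity and lazy service switching reduce queue-switching overhead. Experimental results show that the algorithm makes full use of buffer capacity according to the configured queue priorities and effectively reduces overflow in every buffer queue. Its implementation cost is reasonable and it scales well.

(3) A new application of on-chip trace: a method that combines code layout with instruction prefetching. On-chip trace non-intrusively captures the program's execution path with timing information. Prefetches are placed by exploiting the periodic behavior of program execution, and function-level code layout is performed with the goal of increasing prefetch slack. Prefetch instructions are executed in idle VLIW functional units. Experimental results show that, compared with instruction prefetching or code layout applied alone, the method reduces instruction-cache misses more effectively.

(4) A control-flow error detection method based on signature monitoring, designed for the low-overhead fault-tolerance needs of embedded systems and exploiting features of VLIW processors. Signature instructions have weak position constraints: they may be placed in any free instruction slot or NOP position within a certain range, which reduces performance loss and code-size overhead. Dynamic signature-correction instructions adjust the expected signature at run time according to the contents of the branch register. Compared with hardware methods this widens fault coverage, and compared with software methods it reduces performance loss. The method detects 15 types of control-flow errors as well as bit flips in instruction codes, with high fault coverage and low hardware overhead.

(5) A debug model for embedded processors based on the state set of storage elements, together with an in-depth analysis of the working mechanism, inherent advantages and implementation models of on-chip trace. A fairly complete on-chip trace debug system was implemented. A case study of multi-core programs shows that it effectively assists debugging and tuning.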
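
Point (1) mentions a long/short string encoding for conditional-branch messages but does not give its format. The Python sketch below illustrates the general idea under assumed parameters only. Taken/not-taken outcomes are packed behind a marker bit into one of two packet types:
- an 8-bit short packet holding up to 6 outcomes;
- a 32-bit long packet holding up to 30 outcomes.

The encoder picks whichever packet type costs fewer bits. The packet widths, the marker-bit layout, and the names `encode_segment` and `decode` are hypothetical and are not the dissertation's actual design.

```python
# Hypothetical packet widths; the dissertation's real field sizes are not given.
SHORT_BITS = 8                # 1 type bit ("0") + 7-bit field
LONG_BITS = 32                # 1 type bit ("1") + 31-bit field
SHORT_MAX = SHORT_BITS - 2    # 6 outcomes: the field also holds one marker bit
LONG_MAX = LONG_BITS - 2      # 30 outcomes


def _pack(outcomes, width, type_bit):
    """Build one packet; the leading marker '1' lets the decoder recover the count."""
    field = 1
    for taken in outcomes:
        field = (field << 1) | int(taken)
    return type_bit + format(field, f"0{width - 1}b")


def encode_segment(outcomes):
    """Encode taken/not-taken outcomes buffered between two flush points."""
    packets, i = [], 0
    while i < len(outcomes):
        remaining = len(outcomes) - i
        if remaining > 3 * SHORT_MAX:
            # 19+ pending outcomes would need 4+ short packets (>= 32 bits),
            # so a single long packet costs no more.
            take, width, type_bit = min(remaining, LONG_MAX), LONG_BITS, "1"
        else:
            take, width, type_bit = min(remaining, SHORT_MAX), SHORT_BITS, "0"
        packets.append(_pack(outcomes[i:i + take], width, type_bit))
        i += take
    return "".join(packets)


def decode(stream):
    """Inverse of encode_segment on a '0'/'1' string."""
    outcomes, pos = [], 0
    while pos < len(stream):
        width = LONG_BITS if stream[pos] == "1" else SHORT_BITS
        field = int(stream[pos + 1:pos + width], 2)
        count = field.bit_length() - 1          # outcome bits below the marker
        outcomes.extend(bool((field >> k) & 1) for k in range(count - 1, -1, -1))
        pos += width
    return outcomes


history = [True] * 25
bits = encode_segment(history)   # one 32-bit long packet instead of five short ones
assert decode(bits) == history
```

The marker bit removes the need for a separate length field. The long/short choice is a simple cost comparison, so short runs between flush points (for example at an indirect branch) stay cheap, while long runs amortize the header.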

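Point (2) describes the lazy queue-scheduling algorithm only in outline. The cycle-level Python model below is one possible reading of it. Every rule in it is an assumption rather than a detail taken from the dissertation:
- a queue requests service once its length reaches its threshold;
- after a switch, the output port serves at least `min_grain` words before it may switch again;
- the port stays lazily on the current queue while that queue is non-empty and no other queue is requesting;
- each switch costs `switch_cost` idle cycles.

`TraceQueue` and `LazyScheduler` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceQueue:
    threshold: int      # service-request threshold (words); lower = served sooner
    capacity: int       # buffer size (words)
    length: int = 0
    arrived: int = 0
    dropped: int = 0

    def push(self, words):
        self.arrived += words
        accepted = min(words, self.capacity - self.length)
        self.dropped += words - accepted     # overflow = lost trace data
        self.length += accepted

    def requesting(self):
        return self.length > 0 and self.length >= self.threshold


class LazyScheduler:
    """One output port shared by several trace-source queues; drains one word per cycle."""

    def __init__(self, queues, min_grain, switch_cost):
        self.queues = queues
        self.min_grain = min_grain      # words served before a switch is allowed
        self.switch_cost = switch_cost  # idle cycles per queue switch
        self.current = None
        self.served = 0                 # words served since the last switch
        self.stall = 0
        self.switches = 0
        self.sent = 0

    def cycle(self, arrivals):
        for q, words in zip(self.queues, arrivals):
            q.push(words)
        self._serve()

    def _drain(self, i):
        self.queues[i].length -= 1
        self.served += 1
        self.sent += 1

    def _serve(self):
        if self.stall:
            self.stall -= 1
            return
        cur = self.current
        others = [i for i, q in enumerate(self.queues) if i != cur and q.requesting()]
        cur_ready = cur is not None and self.queues[cur].length > 0
        if cur_ready and (self.served < self.min_grain or not others):
            self._drain(cur)            # lazy: keep serving the current queue
        elif others:
            # Switch to the requester that is furthest above its threshold.
            best = max(others, key=lambda i: self.queues[i].length - self.queues[i].threshold)
            self.current, self.served = best, 0
            self.switches += 1
            if self.switch_cost:
                self.stall = self.switch_cost - 1   # this cycle is the first lost cycle
            else:
                self._drain(best)
        # Otherwise the port idles: queues below their threshold wait.


rng = random.Random(7)
queues = [TraceQueue(4, 32), TraceQueue(12, 32), TraceQueue(20, 32)]
sched = LazyScheduler(queues, min_grain=8, switch_cost=2)
for _ in range(5000):
    sched.cycle([int(rng.random() < 0.35) for _ in queues])
```

In this model the two knobs can be tuned separately:
- Lowering a queue's threshold makes it request service earlier, which acts as a priority.
- Raising `min_grain` reduces the number of switches at the cost of longer waits for the other queues.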
【Abstract】 With the flourishing development of the embedded-system industry, increasing attention is being paid to efficient development and debugging tools. On-chip trace uses dedicated hardware to record a processor's run-time information non-intrusively. It requires no code changes, yields highly trustworthy information, and does not affect the run-time behavior of the system. It can therefore overcome the debugging obstacles of today's highly integrated embedded systems with strict real-time requirements, and it has become an important direction in debug technology in recent years. In aerospace and military applications, processor reliability is a critical issue, so fault-tolerant mechanisms are required to handle hardware failures. Error detection is the first step in fault tolerance. To meet the cost and power constraints of embedded systems, it is important to study error-detection methods with low hardware cost and low performance loss.

In this dissertation, on-chip trace debug and on-line error detection are studied. We first discuss the debug model of embedded processors and give an in-depth analysis of the principles, advantages and realizations of the on-chip trace technique. We then study three aspects: the collection and compression of run-time information, the transmission structure of trace data, and debugging and optimization assisted by trace. A multi-core on-chip trace system was built to verify these conclusions. This dissertation also presents a control-flow error detection method that meets the low-cost fault-tolerance requirements of embedded processors. The main contributions are as follows:

(1) We propose several improvements and innovations in trace information collection and compression to improve the compression ratio and flexibility:
- a long/short string encoding that compresses the trace data of conditional branches effectively;
- branch-output configuration bits that achieve a flexible tradeoff between trace contents and data volume;
- an event trace that assists optimization effectively, with an encoding that achieves a good tradeoff between accuracy and volume;
- a NOP_config instruction that configures the on-chip trace hardware non-intrusively.

(2) Scheduling the combined traffic of multi-source trace data is one of the key issues affecting trace-data transmission performance. By analyzing the features of trace traffic combination, we propose a lazy scheduling algorithm based on a service threshold and a minimum service granularity. A configurable service threshold for each queue constrains the queue-length distribution. Lazy scheduling and a configurable minimum service granularity reduce switching overhead. Simulation results show that the algorithm controls the overflow rate of each queue effectively and makes full use of the buffer capacity according to the assigned queue priorities. The algorithm scales well at reasonable hardware cost.

(3) Extending the application of on-chip trace, we propose a method that combines code layout with instruction prefetching. On-chip trace non-intrusively provides the program execution path with timestamps. The method exploits the periodic (phase) behavior of the execution path: function-level code layout is performed to maximize prefetching intervals, and prefetch operations are executed by unoccupied functional units in the VLIW architecture. Simulation results show that instruction-cache misses are reduced substantially more than with code layout or instruction prefetching applied alone.

(4) Focusing on low-cost fault detection for embedded systems and on the characteristics of the VLIW architecture, we propose a hybrid control-flow checking method (V-CFC) based on signature monitoring. Signature instructions with weak position constraints supply the redundant control-flow information. They can be executed in unoccupied instruction slots or in place of NOP instructions, minimizing the overhead on processor performance and code size. Dynamic Offset Signature Instructions adjust the expected signature at run time by reading the branch registers. V-CFC therefore extends the error-detection scope compared with hardware methods and reduces the performance loss compared with software methods. V-CFC detects bit flips in instruction codes and 15 types of control-flow errors, with high error-detection coverage, low performance loss and low hardware cost.

(5) A debug model of embedded processors is established on the state set of storage elements, and an in-depth analysis is given of the principles, inherent advantages and realization models of on-chip trace. An on-chip trace system was built on the multi-core processor YHFT-QDSP. A case study of a multi-core program shows that the system assists debugging and optimization effectively.
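
Point (3) relies on a timestamped execution path captured by on-chip trace. The sketch below illustrates only the trace-analysis step. It assumes the trace has already been reduced to function-entry events with cycle stamps, and it selects prefetch points as follows:
- for each function, find a recurring successor;
- keep it only if its shortest observed gap is at least an assumed cache-miss latency.

A prefetch issued on entry to the first function could then complete before the second starts. The code-layout pass and the placement of prefetch instructions into idle VLIW units are not modeled. `prefetch_plan`, `miss_latency` and `min_count` are hypothetical names.

```python
from collections import Counter


def prefetch_plan(trace, miss_latency, min_count=2):
    """trace: function-entry events as (cycle, function) pairs, in time order.

    For each function, return the most frequent successor whose transition
    recurs at least `min_count` times (periodic behavior) and whose shortest
    observed gap is >= `miss_latency`. A prefetch issued at the first
    function's entry then has enough slack to hide the miss.
    """
    counts = Counter()
    min_gap = {}
    for (t0, f0), (t1, f1) in zip(trace, trace[1:]):
        if f0 == f1:
            continue
        key = (f0, f1)
        counts[key] += 1
        min_gap[key] = min(min_gap.get(key, t1 - t0), t1 - t0)

    plan = {}
    for (f0, f1), n in counts.items():
        if n < min_count or min_gap[(f0, f1)] < miss_latency:
            continue  # too rare, or too little slack to hide the miss
        if f0 not in plan or n > counts[(f0, plan[f0])]:
            plan[f0] = f1
    return plan


# Three periods of a hypothetical loop: main -> fir -> fft -> out -> main ...
trace = []
for k in range(3):
    base = 1000 * k
    trace += [(base, "main"), (base + 5, "fir"), (base + 300, "fft"), (base + 320, "out")]
plan = prefetch_plan(trace, miss_latency=100)
```

Using the minimum gap rather than the average is a conservative choice: a transition is kept only if every observed instance left enough time to hide the miss.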

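Point (4) monitors control flow with signatures and corrects the expected signature according to the branch register. The Python model below illustrates that principle in the style of the published CFCSS technique, which uses XOR signature differences. This is an assumption, not the dissertation's exact instruction semantics. The model works as follows:
- each block has a unique compile-time signature;
- the correction at the end of a block is one of two precomputed XOR offsets, selected by the branch-register bit;
- the run-time signature is compared with the signature of the block actually entered.

Weak-position scheduling into VLIW slots is not modeled. `Block`, `offset_table` and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    sig: int                        # compile-time signature, unique per block
    on_false: Optional[str] = None  # successor if branch register == 0 (or unconditional)
    on_true: Optional[str] = None   # successor if branch register == 1; None = unconditional


def offset_table(cfg):
    """Per block, the XOR corrections a dynamic-offset instruction would encode,
    keyed by branch-register value (computed at compile time)."""
    table = {}
    for name, b in cfg.items():
        succ = {0: b.on_false, 1: b.on_true if b.on_true is not None else b.on_false}
        table[name] = {c: b.sig ^ cfg[s].sig for c, s in succ.items() if s is not None}
    return table


def run(cfg, entry, branch_bits, fault_step=None, fault_target=None):
    """Walk the CFG; optionally force an illegal jump to `fault_target` at `fault_step`."""
    table = offset_table(cfg)
    bits = iter(branch_bits)
    G, cur, step = cfg[entry].sig, entry, 0
    while cfg[cur].on_false is not None:        # a block without successors is the exit
        b = cfg[cur]
        cond = next(bits) if b.on_true is not None else 0
        G ^= table[cur][cond]                   # G now holds the expected successor signature
        nxt = b.on_true if cond else b.on_false
        if step == fault_step:
            nxt = fault_target                  # injected control-flow error
        if G != cfg[nxt].sig:                   # signature check at block entry
            return "detected", nxt
        cur, step = nxt, step + 1
    return "ok", cur


cfg = {
    "A": Block(0b0001, on_false="B", on_true="C"),
    "B": Block(0b0010, on_false="D"),
    "C": Block(0b0100, on_false="D"),
    "D": Block(0b1000),
}
```

Because the offset is selected by the branch register itself, a fault that corrupts the branch register goes undetected in this model. The program then consistently follows a wrong but legal successor, so the signatures still match.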