

Research on Media Enhanced Digital Signal Processor Core Design for System-On-Chip

【Author】 Li Dongxiao (李东晓)

【Advisor】 Yao Qingdong (姚庆栋)

【Author Information】 Zhejiang University, Communication and Information Systems, 2004, Ph.D.

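【Abstract (translated from the Chinese)】 The key problem in designing and developing a media system-on-chip (SoC) is how to make the most effective use of the hardware resources available on the silicon die, so as to provide a single-chip solution with a high performance/cost ratio for the target multimedia applications. A bus-interconnected heterogeneous architecture consisting of one or more instruction-set processor cores, one or more dedicated hardware IP cores, and one or more on-chip memories is a reasonable choice for a media SoC. With the support of the National 863 Program, we carried out research on the design of a media-enhanced digital signal processor core for SoCs. This thesis, as part of the results, focuses on media-processing enhancement of the processor core's instruction set architecture (ISA), the design and optimization of the processor core's microarchitecture, and system bus design and media data stream scheduling.

For the ISA of the media digital signal processor core, based on an analysis of the characteristics of media processing algorithms, this thesis proposes a media enhancement extension to a basic instruction set architecture compatible with MIPS-I. The extension improves the media processing performance of a single-issue processor by supporting SIMD sub-word parallel operations, dedicated media instructions, and special handling of operation results. An initial set of instruction functions was generated by borrowing ideas from the Intel MMX/SSE/SSE2 media extension instruction sets, and the media ISA was then further optimized through interaction with commonly used media processing kernel algorithms. The result is a novel backward-compatible media enhancement extension of the MIPS-I-compatible base instruction set. In hardware, the media-enhanced instruction set is supported by means such as splittable datapaths, which yields a significant improvement in media processing performance at very small additional hardware cost.

For the microarchitecture of the core, based on a detailed analysis of CPU pipeline hazards and processor exceptions, this thesis proposes and implements a pipeline control scheme based on a finite state machine, and optimizes processor performance in two respects: raising the clock frequency and lowering the CPI. To keep the pipeline clock frequency from being limited by the long execution time of certain complex arithmetic instructions, while still achieving a throughput of one arithmetic instruction per cycle, this thesis proposes a scalable super-pipelining extension of the EX stage, and proposes and implements a switching control scheme with a high performance/cost ratio. For a single-issue processor, the fundamental way to lower the CPI is to reduce pipeline stalls through software and hardware techniques. This thesis constructs a RAW dependency loop model to analyze RAW hazards on register operands in the pipeline, and proposes a "dynamic" data forwarding (bypass) strategy that minimizes the interlock stalls caused by RAW data hazards in complex pipelines. Theoretical analysis and measured results both show that the "dynamic" data forwarding mechanism effectively reduces the average CPI increment caused by RAW interlocks.

Bus design and media data stream scheduling are critical issues in real-time media SoC design. Taking the design of VCD and HDTV decoder SoCs as concrete cases, this thesis examines data stream scheduling strategies in software and hardware implementations of MPEG-1/2 video decoding. After analyzing the timing parameters of video bitstream input, decoding, and video display, it proposes a group of 3 frames as the scheduling granularity for software decoding, together with two decoding scheduling strategies, one based on start deadlines and one based on completion deadlines, which balance processor performance requirements against data buffering requirements. It also proposes a hybrid two-level bus arbitration strategy that combines static time-division-multiplexed scheduling with dynamic fixed-priority arbitration. By partitioning bus time into slices and statically scheduling the DMA transfers of media data streams in step with the decoding flow, the scheme allocates and uses bus bandwidth effectively and reduces hardware costs such as on-chip data buffers.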
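To make the idea of SIMD sub-word parallelism concrete, the following Python sketch models four independent 8-bit additions packed into one 32-bit word, once with wrap-around and once with unsigned saturation. The lane width, the function names `padd8_wrap` and `padd8_sat_unsigned`, and the choice of saturation as an example of special result handling are illustrative assumptions; the abstract does not list the actual instructions of the extended ISA.

```python
# Software model of sub-word parallel addition on a 32-bit word.
# Assumption: four unsigned 8-bit lanes per word; names are hypothetical.

MASK32 = 0xFFFFFFFF
LOW7 = 0x7F7F7F7F   # low 7 bits of every lane
TOP1 = 0x80808080   # top bit of every lane


def padd8_wrap(a: int, b: int) -> int:
    """Add four 8-bit lanes with wrap-around.

    Adding only the low 7 bits of each lane guarantees that no carry
    crosses a lane boundary, which mimics a datapath whose carry chain
    is split between sub-words. The top bit of each lane is then
    restored with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return (low ^ ((a ^ b) & TOP1)) & MASK32


def padd8_sat_unsigned(a: int, b: int) -> int:
    """Add four 8-bit lanes, clamping each lane to 0xFF on overflow."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= min(lane_sum, 0xFF) << shift
    return result
```

For example, `padd8_sat_unsigned(0x01FF7F80, 0x01010180)` gives `0x02FF80FF`, because the two lanes that overflow clamp to `0xFF`, whereas `padd8_wrap` on the same operands gives `0x02008000`.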

【Abstract】 How to use the hardware resources available on the silicon die effectively is a key issue in developing media system-on-chip (SoC) designs with a high performance/cost ratio. A shared-bus heterogeneous architecture consisting of one or more instruction set processor cores, one or more dedicated hardware IP cores, and one or more on-chip memories usually provides a good solution. The research presented in this thesis mainly concerns processor core design for media SoCs.

A backward-compatible media enhancement extension to a MIPS-I compatible ISA is presented. Based on an analysis of the inherent characteristics of media application algorithms, the basic MIPS-I compatible ISA is extended to support sub-word parallel SIMD operations, special result handling, and dedicated media instructions. The extension is physically realized in the processor core and improves media processing performance effectively (2-4x) with negligible additional hardware cost (2.7%).

A centralized control scheme based on a finite state machine (FSM) is presented to supervise CPU pipeline activity, and several techniques are discussed for shortening the clock period and lowering the CPI (cycles per instruction) of the pipeline. To remove the clock frequency limit imposed by the long execution time of some complex instructions while still achieving single-cycle throughput, a scalable super-pipelining extension technique is presented, together with a pipeline switching mechanism with a high performance/cost ratio. For a single-issue processor architecture, the fundamental way to reduce CPI is to decrease pipeline stalls by exploiting available software or hardware techniques. A RAW (read after write) dependency loop model is developed to analyze the RAW hazards of register operands in a complex pipeline. Based on this model, a "dynamic" data forwarding policy is proposed to reduce the pipeline stalls caused by RAW data hazards. Theoretical analysis and practical experiments both show that the dynamic data forwarding strategy effectively reduces the average CPI increment caused by RAW data hazards.

Bus design and media data stream scheduling are key issues in real-time media SoC development. Data scheduling policies for MPEG-1/2 video decoding are discussed for both software and hardware implementations. Two scheduling policies with a granularity of 3 frames are proposed to achieve a good trade-off between processing demands and on-chip buffer demands in a software decoding implementation. A 2-level hybrid arbitration scheme, based on static time-division-multiplexed scheduling and dynamic fixed-priority arbitration and combined with synchronization control, is introduced to use the bus bandwidth effectively and lower on-chip buffer demands in a media SoC.
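One possible reading of the 2-level hybrid arbitration scheme can be sketched in Python as follows: a static slot table reserves bus time slices for media DMA streams, and a dynamic fixed-priority arbiter resolves every cycle that the slot owner does not claim. The slot table, the master names, the priority order, and the fallback rule for unused slots are assumptions made for illustration; the abstract does not specify them.

```python
# Sketch of a two-level bus arbiter: static TDM first, fixed priority second.
# All master names and the table contents below are hypothetical.

from typing import Optional, Sequence, Set


def arbitrate(
    slot_table: Sequence[Optional[str]],
    priority: Sequence[str],
    cycle: int,
    requests: Set[str],
) -> Optional[str]:
    """Return the master granted the bus in this cycle, or None if idle."""
    # Level 1: static time-division multiplexing. The slot owner wins
    # whenever it is actually requesting the bus.
    owner = slot_table[cycle % len(slot_table)]
    if owner is not None and owner in requests:
        return owner
    # Level 2: dynamic fixed-priority arbitration among the remaining
    # requesters (assumed here to also reuse slots their owner leaves idle).
    for master in priority:
        if master in requests:
            return master
    return None


# Example configuration: two reserved slots and two free slots per frame.
SLOT_TABLE = ["video_in_dma", "display_dma", None, None]
PRIORITY = ["cpu", "display_dma", "video_in_dma"]
```

With this configuration, `arbitrate(SLOT_TABLE, PRIORITY, 0, {"cpu", "video_in_dma"})` grants `video_in_dma` because slot 0 is reserved for it, while the same request set in cycle 2 would be granted to `cpu` by fixed priority.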

  • 【Online Publication Contributor】 Zhejiang University
  • 【Online Publication Year and Issue】 2004, Issue 03

