
Research on the Design and Implementation Techniques of Embedded Heterogeneous Multiprocessor

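【Author】 Yue Hong (岳虹)

【Advisors】 Wang Zhiying (王志英); Dai Kui (戴葵)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2006, PhD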

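【Abstract (translated from Chinese)】 The growth of embedded applications demands embedded microprocessors that combine high performance, low power, a scalable structure, low cost, and a short design cycle. Embedded microprocessor architecture and design methodology therefore face great challenges. Under current integrated circuit process technology, research on the design and key techniques of embedded heterogeneous multi-core processors, built on application-specific custom processor design techniques, is an important direction in this field, and its in-depth study has both theoretical and practical significance.

In its study of embedded heterogeneous multi-core processor architecture, this thesis draws on application-specific custom processor design techniques and proposes an embedded heterogeneous multi-core processor architecture based on custom processor cores, aiming at the best design tradeoff among real-time performance, design flexibility, cost, and power consumption. Taking multimedia applications as an example, the thesis also focuses on the core techniques for designing and implementing this architecture, chiefly the construction of a design and development environment, the characterization of application programs, instruction-set customization, and the design of custom functional units. On this basis, a high-performance embedded heterogeneous multi-core processor chip for multimedia applications was designed and implemented, which validates the research.

The main results of the thesis are:

1. A scalable embedded heterogeneous multi-core processor architecture based on custom processor cores is proposed. It integrates a high-performance general-purpose embedded processor core with multiple custom processor cores that can be tailored to specific applications. The custom cores, based on the transport triggered architecture, are highly scalable, regular, and modular, and their hardware can be designed and implemented automatically in a hierarchical manner.

2. For the proposed architecture, the thesis proposes an architecture-retargetable simulation technique, an instruction-set customization algorithm, and an automatic hardware generation technique, and builds a corresponding design and development environment on them. The environment effectively shortens the design cycle and strongly supports the design, implementation, testing, and verification of such processors. Using this environment, the characteristics and workloads of multimedia applications were analyzed quantitatively, yielding statistical conclusions that guide the design of embedded heterogeneous multi-core processors for multimedia applications.

3. A distributed DCT/IDCT custom functional unit architecture based on a parallel adder array is proposed, designed, and implemented. It uses dynamic ranging and data partitioning to convert multiplications into table lookups and additions, which, combined with simple shifts, produce the final result. Only a small number of narrow adders, shifters, and small ROMs are therefore needed to perform the DCT/IDCT while still keeping high precision. The structure is also regular, which favors efficient hardware implementation.

4. To address the computational characteristics and special computing needs of multimedia applications, subword-parallel instructions and elementary function instructions are proposed and customized, and the functional units that support them are designed and implemented: a subword-parallel ALU, a multi-mode subword-parallel multiplier, and an elementary function unit based on the CORDIC algorithm. These custom functional units substantially improve the real application performance of the multimedia-oriented embedded heterogeneous multi-core processor, achieving high execution performance for a small chip-area overhead.

5. Building on the above work, the EHMP-01 chip, an embedded heterogeneous dual-core processor for multimedia applications, was designed and implemented. The thesis systematically studies the key techniques in its design and implementation, including microarchitecture design, memory system design, peripheral interface design, logic design and VLSI implementation, and chip testing and verification. The processor was taped out in a 0.18 µm process; the total chip area is 4.8 × 4.8 mm², and the clock frequency can reach 300 MHz. At 300 MHz, dynamic power is only 670 mW. Operation of the fabricated chip shows that it works stably and reliably. The successful tape-out of the EHMP-01 embedded heterogeneous dual-core processor effectively validates the architecture, design method, and key techniques proposed in the thesis.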
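The third result replaces multiplications with table lookups, additions, and shifts. The thesis's actual circuit is not given in the abstract, so the following is only a minimal sketch of classic distributed arithmetic for one row of an 8-point DCT-II. The names `dct_row`, `quantize`, `build_rom`, and `da_dot`, the 14-bit fixed-point format, and the unsigned 8-bit inputs are all assumptions made for illustration.

```python
import math

FRAC_BITS = 14  # fixed-point fraction width (an assumption for this sketch)

def dct_row(k, n=8):
    """Coefficients of row k of an orthonormal n-point DCT-II."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]

def quantize(coeffs):
    """Round real coefficients to signed fixed-point integers."""
    return [round(c * (1 << FRAC_BITS)) for c in coeffs]

def build_rom(qcoeffs):
    """ROM entry m holds the sum of the coefficients selected by the set bits of m."""
    n = len(qcoeffs)
    return [sum(c for i, c in enumerate(qcoeffs) if (m >> i) & 1)
            for m in range(1 << n)]

def da_dot(rom, xs, bits=8):
    """Inner product of the fixed coefficients with unsigned `bits`-wide inputs,
    using only ROM lookups, shifts, and additions."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every input sample into one ROM address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Weight the looked-up partial sum by 2**b with a shift, then accumulate.
        acc += rom[addr] << b
    return acc

samples = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
rom = build_rom(quantize(dct_row(1)))
print(da_dot(rom, samples) / (1 << FRAC_BITS))
```

With eight inputs the single ROM has 256 entries. A common variant splits the inputs into two groups of four, which needs only two 16-entry ROMs plus one extra addition; whether the thesis's data partitioning works this way is not stated in the abstract.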

【Abstract】 The evolution of embedded applications requires advanced embedded microprocessors (EMPs) to offer high performance, low power, architectural scalability, low design cost, and a short design cycle (time-to-market). The architecture and design methodology of EMPs hence face great challenges. Under current integrated circuit manufacturing processes, research on the design and implementation techniques of embedded heterogeneous multiprocessors, based on the design methodology of the Application Specific Instruction-set Processor (ASIP), is an important area of EMP research. Its in-depth study has great theoretical and practical significance.

In this thesis, we apply the ASIP design methodology to the design of embedded heterogeneous multiprocessors and propose a new embedded heterogeneous multi-ASIP processor architecture that seeks the best tradeoff among real-time performance, design flexibility, design cost, and energy consumption. Taking multimedia applications as a practical example, we concentrate on the design and implementation techniques of the new multi-ASIP processor architecture, including design space exploration, application characteristics analysis, instruction customization, and the design of customized function units. Based on this work, we developed EHMP-01, a high-performance embedded heterogeneous dual-core processor for multimedia applications.

The primary innovations of this thesis can be summarized as follows:

1. We propose a scalable embedded heterogeneous multi-ASIP processor architecture. This architecture, which combines one high-performance general-purpose embedded processor with multiple ASIPs, can be scaled and customized for different applications. The ASIPs, implemented on the transport triggered architecture, are highly scalable and, thanks to their regular modular design, can be generated automatically.

2. We propose an automatic implementation methodology for the heterogeneous multi-ASIP processor and establish a design and performance evaluation environment. The environment supports the effective design, implementation, testing, and verification of the heterogeneous multi-ASIP processor. Using this environment, we quantitatively analyzed multimedia application characteristics and workloads to obtain statistics that guide the design of multi-ASIP processors for multimedia applications.

3. We propose a new distributed DCT/IDCT architecture based on parallel adders. The architecture uses dynamic ranging and data partitioning to transform multiplications into table lookup, add, and shift operations. The hardware therefore needs only a small number of low-cost adders, shifters, and ROMs while still ensuring high accuracy. Its regular structure also simplifies hardware implementation.

4. Targeting the special computational demands of multimedia applications, we propose a customized design for subword-parallel instructions and elementary function instructions, and we design and implement the corresponding function units: customized subword-parallel function units and elementary function units based on the CORDIC algorithm. These function units give the embedded heterogeneous multi-ASIP processor a high speedup on multimedia applications at low area cost.

5. Based on the above studies, we designed and implemented EHMP-01, an embedded heterogeneous dual-core SoC chip. The thesis discusses the design of the microarchitecture, memory subsystem, and peripheral interfaces, and covers the logic design, VLSI design, verification, and testing of the chip in full. EHMP-01 was implemented in a 0.18 µm process. The die area is about 4.8 × 4.8 mm², and the chip can operate at 300 MHz with an average power dissipation of 670 mW. The silicon implementation of the EHMP-01 processor verifies the effectiveness and correctness of the design methodology and the key techniques for embedded heterogeneous multi-ASIP processors proposed in this thesis.
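The abstract names CORDIC as the basis of the elementary function units without giving details. As a rough illustration of why CORDIC suits small hardware, here is a minimal rotation-mode sketch that computes sine and cosine. The function name `cordic_sin_cos`, the 16 iterations, and the use of floating point in place of fixed-point registers are assumptions for readability, not the EHMP-01 design.

```python
import math

ITERATIONS = 16  # assumed iteration count; each one adds roughly one bit of precision

# Arctangent table atan(2**-i); in hardware this would be a small ROM.
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a constant gain; starting from
# x = 1/GAIN cancels it, so no final multiplication is needed.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sin_cos(angle):
    """Return (sin, cos) of an angle in [-pi/2, pi/2] by rotation-mode CORDIC."""
    x, y, z = 1.0 / GAIN, 0.0, angle
    for i in range(ITERATIONS):
        # Rotate toward the remaining angle z.
        d = 1.0 if z >= 0 else -1.0
        # Scaling by 2**-i models a right shift on fixed-point hardware,
        # so each step needs only shifts, additions, and one table lookup.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return y, x

s, c = cordic_sin_cos(0.5)
print(s, c)
```

The same shift-and-add loop, run in vectoring mode or with hyperbolic micro-rotations, yields other elementary functions such as arctangent, square root, and exponentials, which is why one unit can serve several instructions.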
