Design of Configurable and Extensible Media Processor (可配置可扩展媒体处理器设计)

【Author】 Liu Kunjie (刘坤杰)

【Advisor】 Yan Xiaolang (严晓浪)

【Author information】 Zhejiang University, Circuits and Systems, 2007, Ph.D.

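【Summary】 This dissertation studies processor architecture and SoC design for multi-standard video, and proposes a configurable and extensible media processor. The design is built around an embedded RISC processor and a video processing engine, supplemented by DMA and other key IP modules, and interconnected through the AMBA bus and dedicated channels to form a tightly coupled, heterogeneous multi-core programmable video processing SoC platform. The work explores three areas.

1) Design of a configurable and extensible embedded RISC processor, in particular the cache and MMU of the memory subsystem. Based on the 32-bit embedded CK-Core instruction set architecture, the configurable and extensible embedded RISC processor CK520 was designed and implemented. For the memory subsystem, an online-configurable cache based on way combining and a fully synthesizable MMU with a two-level TLB structure are proposed. The micro-architecture is parameterized and offers a set of configurable options, including instruction and data cache size, associativity and replacement policy, the number of entries at each TLB level of the MMU, and branch prediction. The data path can also be extended with custom acceleration instructions for a specific application, or a programmable accelerator can be attached through the coprocessor interface. CK520 therefore keeps the programmability of a RISC processor while gaining adaptability and efficiency for different applications through configuration and extension.

2) EDO-SIMD, a single-instruction multiple-data instruction set architecture with embedded data organization for video applications, and the design of a video processing engine based on it. Analysis of video applications and algorithms was used to summarize and extract a series of characteristics and algorithm kernels. In-depth analysis of these kernels found regular data organization patterns such as sub-matrix, linear sequential, butterfly interleaving, broadcast and delay skew. In traditional SIMD instruction set architectures the overhead of these data organizations is large and has become a bottleneck for video processing performance, so the EDO-SIMD instruction set architecture was designed for video processing. Unlike the media extensions of general-purpose processors, EDO-SIMD is not an extension of a RISC instruction set architecture but a standalone SIMD processor design optimized for video applications, combining the advantages of media extensions with many new features for efficient video processing. Its advantages include: programmability and flexibility to support continually emerging video standards; embedded data organization instructions that fuse data organization with computation for efficient video processing; a compact instruction set suited to low-cost, low-power embedded systems; and application-oriented extensibility, with configuration optimized for performance and cost. In the video processing engine, parameterized and modular design principles are applied to provide a scalable vector length: a 32-bit data ALU/MAC unit forms the 1-way module of the data path, and 1-, 2- or 4-way implementations can be selected according to the computing performance an application requires. Vector permutation networks at operand read and at on-chip data memory write-back implement the embedded data organization instructions. In the ALU and multiplier, bit gating and splitting are used to implement SIMD instructions at Byte, Half Word and Word precision. The on-chip data memory uses byte-addressable double data buffers, which support unaligned memory access and allow DMA to move data concurrently.

3) Design of the video processing SoC platform. Based on an SoC interconnect made up of the AMBA bus and dedicated channels between the main processing engines, with the embedded RISC processor and the EDO-SIMD video processing engine at its core and peripherals such as DMA and a memory controller, a basic heterogeneous multi-core video processing SoC platform prototype was built. The platform exploits data-level, instruction-level and task-level parallelism and delivers relatively high video processing performance. Using the dedicated channel, the RISC processor can efficiently schedule tasks on the video processing engine and configure and start DMA in a remote function call mode. The platform is configurable and extensible: the parameters of each part can be tuned to application requirements to obtain a high-performance, low-power solution.

The research approach, in which application, algorithm, architecture and VLSI implementation drive and validate one another, and the configurable, extensible, parameterized and modular design method run through the whole work and offer some reference value for embedded processor design and SoC development.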
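The gated-bit and split scheme for Byte, Half Word and Word SIMD operations can be illustrated in software. The following is a minimal sketch, not the dissertation's ALU: it assumes a 32-bit data path and models carry gating with a standard bit-masking trick, and the names `packed_add32` and `LANE_MSB` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Most significant bit of every lane for each lane width (hypothetical table).
LANE_MSB = {8: 0x80808080, 16: 0x80008000, 32: 0x80000000}


def packed_add32(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane by lane, with wrap-around inside each lane.

    Clearing the top bit of every lane before the add guarantees that no
    carry crosses a lane boundary, which plays the role of a gated carry.
    The top bit of each lane is then restored with an XOR.
    """
    msb = LANE_MSB[lane_bits]
    low = ((a & ~msb) + (b & ~msb)) & MASK32  # carries stay inside lanes
    return low ^ ((a ^ b) & msb)              # fix up each lane's top bit


if __name__ == "__main__":
    a, b = 0x0000FFFF, 0x00000001
    for width in (8, 16, 32):
        print(width, hex(packed_add32(a, b, width)))
```

The same adder serves all three precisions; only the mask that decides where carries are blocked changes.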

【Abstract】 Video applications are computationally intensive and stretch the capabilities of current embedded processors. This dissertation presents the architecture design and VLSI implementation of a configurable and extensible media processor that supports multi-standard video applications. In the design, an embedded RISC core and an EDO-SIMD video processing engine are integrated with DMA and other IP modules via the AMBA bus and a dedicated communication channel to form a high-performance, low-power heterogeneous multi-core platform. The research explores the following three aspects.

First, the design of a configurable and extensible embedded RISC processor, especially the cache and memory management unit (MMU) of the memory subsystem. A RISC processor, CK520, is designed based on the 32-bit CK-CORE instruction set architecture. For the memory subsystem, an online-configurable cache based on way combining and an MMU micro-architecture with a 2-level TLB are proposed. The micro-architecture and memory subsystem of CK520 are parameterized: instruction and data cache size, way associativity, replacement scheme, the size of the MMU TLBs, branch prediction and other parameters are configurable. Application-specific instructions such as MAC can be implemented by extending the data path of the basic core, while a programmable accelerator can be attached through the coprocessor interface. CK520 therefore provides traditional programmability as well as adaptability and efficiency through configuration and extension.

Second, the EDO-SIMD (embedded data organization SIMD) instruction set architecture and a video processing engine design for multi-standard video processing acceleration. The features of video applications and algorithms are summarized through analysis of benchmarking results for typical video applications. The operands were found to follow regular patterns such as matrix, sequential, butterfly, broadcast and delay-line skew addressing. In traditional SIMD media instruction set architectures, the overhead of organizing these operands is an obstacle to improving performance and hardware efficiency, so a SIMD instruction set architecture with embedded data organization (EDO-SIMD) is proposed. EDO-SIMD is not designed as an extension to a RISC CPU but as a standalone processor architecture optimized for video processing. The features of the EDO-SIMD ISA are: programmability and flexibility to support multi-standard video codecs; embedded data organization instructions and video-specific instructions to boost video processing performance; simplicity for low-cost, low-power embedded systems; and scalability and configurability to adapt to application requirements. In the micro-architecture of the video processing engine, a parametric and modular design methodology is applied to support a configurable vector length: a 32-bit data path is designed as 1 way, and the full data path can be tiled up to LMAX/4 ways according to application performance requirements. Vector permutation networks are inserted into the operand read ports and the result store path to support the embedded data organization instructions. In the ALU and multiplier design, gated bits and a split-module scheme are used to support operations at byte, half word and word precision. The on-chip data memory is a byte-addressable double data buffer structure, so that unaligned loads and stores are supported and the DMA engine can prepare data concurrently with computation.

Finally, the design and optimization of a video-specific SoC platform. In the video SoC platform, the embedded RISC processor and the EDO-SIMD video processing engine are integrated with DMA, LCDC and other IP, based on the AMBA SoC interconnect and a dedicated communication channel. The platform can exploit data-level, instruction-level and task-level parallelism. The RISC processor can schedule tasks on the video engine and configure and start DMA transactions efficiently in a remote procedure call mode. The platform can be configured and extended to meet application requirements and to yield a high-performance, low-power video system solution.

The research methodology, in which application, algorithm, architecture and VLSI implementation are considered together, and the configurable, extensible, parametric and modular design methodology are valuable for embedded processor and SoC design.
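The idea of embedding data organization in a SIMD instruction can be illustrated with the butterfly pattern. The sketch below is an assumption-laden software model, not the EDO-SIMD instruction encoding, which the abstract does not give: vectors are plain Python lists, and `butterfly_permute` and `butterfly_add_sub` are hypothetical names. It shows the operand permutation and the arithmetic being expressed as one operation instead of a separate shuffle followed by an add and a subtract.

```python
def butterfly_permute(v):
    """Swap the two halves of a vector so that lane i is paired with the
    lane half a vector away (the butterfly partner)."""
    half = len(v) // 2
    return v[half:] + v[:half]


def butterfly_add_sub(v):
    """Fused butterfly step: lower lanes receive sums, upper lanes receive
    differences, with the permutation applied while reading the operand."""
    half = len(v) // 2
    partner = butterfly_permute(v)  # stands in for the permutation network
    return [
        v[i] + partner[i] if i < half else partner[i] - v[i]
        for i in range(len(v))
    ]


if __name__ == "__main__":
    # For [a, b, c, d] the result is [a + c, b + d, a - c, b - d].
    print(butterfly_add_sub([1, 2, 3, 4]))
```

On a conventional SIMD extension the permutation would typically cost one or more extra shuffle instructions per step, which is the overhead the abstract describes as the bottleneck.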

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication issue】 2009, Issue 07