Research and Implementation of an H.264 Hardware Decoding IP Core for Handheld Devices

Research and Implementation of H.264 Hardware Decoder IP Core

【Author】 Bu Fan

【Supervisor】 Gu Meikang

【Author information】 Shanghai Normal University, Communication and Information Systems, 2009, Master's thesis

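【Abstract (translated from Chinese)】 H.264 is a new digital video compression standard developed jointly by ITU-T and ISO/IEC and is currently the most advanced compression standard. It has a very wide range of applications and suits different network environments and use cases, such as standard- and high-definition TV services, consumer electronics like mobile phones and digital cameras, and multimedia network video conferencing. Multimedia services are already widely used in the consumer field, but real-time software decoding of an H.264 video sequence requires the CPU to run at 300 MHz to 400 MHz, which increases power consumption. With advances in large-scale integrated circuit design, and because IC chips offer small area, high performance and low power consumption, implementing H.264 video decoding on a chip platform has broad application prospects and practical significance in the consumer field. The goal of this project is to design a video hardware decoding IP core for handheld devices that conforms to the H.264 standard and supports CIF pictures, Baseline profile at Level 3, and a decoding rate of 30 fps. After an overview of each technical stage of H.264 decoding, this thesis partitions the hardware decoder architecture into modules; gives the implementation details of the ping-pong buffer and, within the bitstream parsing module, of the master control state machine and the parsing of image reconstruction information; and designs the timing strategy for parallel computation and the overall hardware architecture of the decoder IP core.
The image reconstruction module performs most of the computation during decoding and therefore has the greatest impact on the area, performance and power consumption of the IP core, so this thesis studies and designs it in depth. Focusing on inverse quantization and inverse transform, inter prediction, and intra prediction, it proposes one hardware architecture for each of these three algorithm modules, each trading off area cost, performance and power consumption.
In the inverse quantization and inverse transform module, the inverse DCT matrix computation is analyzed in detail, and a storage matrix is used so that the one-dimensional and two-dimensional inverse DCT share computing resources. For inverse quantization, the six 4×4 lookup tables of scaling factors are merged according to coefficient position into 2×2 tables, reducing lookup-table storage. To balance performance and power, a timing structure is proposed in which several gated clocks form a computation pipeline, raising throughput while lowering the system's dynamic power consumption.
In the inter prediction module, whose computation is comparatively complex, an interpolation control module selects the data output and latches the results of the luma 6-tap filter, while the other computation modules run as a pipeline under its control signals. This lowers overall computational complexity; moreover, because the luma 6-tap filter and chroma interpolation use the same number of reference samples, the data lines are shared, saving system bandwidth. Because the interpolation control module makes the data flow regular, a 5×4 storage matrix is inserted into the linear computation in place of the large number of latches that the standard algorithm would otherwise need, which would greatly increase on-chip storage. Depending on which of the five interpolation cases applies, the first addend of the linear computation is stored row by row or column by column and, under control signals, combined directly with the second addend once it is computed, giving the final inter prediction value.
In the intra prediction module, the 17 intra prediction modes are analyzed. The non-Plane modes take five computational forms; to remove the considerable computational redundancy among them, the five forms are merged into a single computation pattern that covers all of them, which makes the hardware reconfigurable. For Plane mode, a hardware-oriented optimization is given: before each 4×4 block is predicted, a base value is computed, and all predicted values in the block are then obtained by summation indexed in the horizontal and vertical directions. This avoids the many multiplications of the original algorithm and reduces hardware area.
Finally, the thesis reports the logic resources occupied by the decoder IP core on an FPGA platform and tests it on several standard 4:2:0 video sequences of 300 frames each. The results show that with a 10 MHz clock the core decodes CIF video in real time at 30 fps.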
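The position-based merging of the rescaling tables can be illustrated with a short Python sketch. The factors below are the standard's rescale values v for m = QP mod 6; the 4×4 table for a given m holds only three distinct values, selected by whether the row and column indices are even or odd, so a 2×2 table indexed by (i mod 2, j mod 2) reproduces it exactly. The function names are hypothetical, and the sketch assumes flat scaling (no scaling matrices, as in Baseline profile) and ordinary 4×4 residual blocks rather than the separately handled DC coefficients.

```python
# H.264 rescale factors v[m] = (v0, v1, v2) for m = QP % 6:
#   v0 -> row and column index both even
#   v1 -> row and column index both odd
#   v2 -> all remaining positions
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def full_table(m):
    """Conventional 4x4 rescale table for one value of QP % 6 (16 entries)."""
    v0, v1, v2 = V[m]
    table = []
    for i in range(4):
        row = []
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                row.append(v0)
            elif i % 2 == 1 and j % 2 == 1:
                row.append(v1)
            else:
                row.append(v2)
        table.append(row)
    return table


def compact_table(m):
    """Merged 2x2 table (4 entries), indexed by (i % 2, j % 2)."""
    v0, v1, v2 = V[m]
    return [[v0, v2],
            [v2, v1]]


def dequant_4x4(levels, qp):
    """Rescale a 4x4 block of levels using only the 2x2 table.

    Assumes flat scaling and a regular (non-DC) 4x4 residual block:
    d[i][j] = (c[i][j] * v) << (qp // 6).
    """
    lut = compact_table(qp % 6)
    shift = qp // 6
    return [[(levels[i][j] * lut[i % 2][j % 2]) << shift for j in range(4)]
            for i in range(4)]
```

Storing 6 × 4 = 24 entries instead of 6 × 16 = 96 is where the lookup-table saving comes from; the address logic only needs the lowest bit of each coefficient index.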

【Abstract】 H.264 is a new-generation video coding standard developed jointly by ISO/IEC and ITU-T, and is currently the most advanced video coding standard. H.264 is widely applicable and can serve different network environments, for instance standard- and high-definition TV, cell phones and digital cameras, IP videophones and so on. At present, multimedia services are widely used in the consumer field. Decoding video sequences in software in real time requires the CPU to run at 300 MHz to 400 MHz, which greatly increases power consumption. With the fast development of large-scale integrated circuits, IC chips offer small area, high performance and low power consumption, so implementing an H.264 decoder as an integrated circuit has significant application prospects in the consumer field. The main objective of this thesis is to design an H.264 hardware decoder IP core for mobile devices in the consumer field. After analyzing the H.264 decoding algorithms in detail, this thesis divides the hardware architecture into three modules according to function, and gives detailed hardware implementations of the ping-pong buffer, the FSM and the image information parsing in the bitstream controller module. It then designs the parallel computing strategy and the overall architecture of the H.264 hardware decoder IP core. Because the image reconstruction module performs a large amount of computation, which strongly affects the area, performance and power consumption of the ASIC, this thesis studies and designs this module carefully. Three hardware architectures are proposed, for IQIT (inverse quantization and inverse transform), inter prediction and intra prediction. In the IQIT module, after analyzing the IDCT algorithm, this thesis proposes a hardware architecture in which the 1-D IDCT and the 2-D IDCT share one computing module.
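H.264 specifies a 4×4 integer approximation of the inverse DCT, evaluated as a row pass followed by a column pass. The Python sketch below (function names hypothetical) shows how a single 1-D butterfly can serve both passes when an intermediate 4×4 matrix holds the row results; it illustrates the resource-sharing idea only and is not the thesis's RTL.

```python
def idct_1d(d):
    """One 4-point pass of the H.264 inverse integer transform (butterfly form)."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct_4x4(block):
    """2-D inverse transform built from a single, reused 1-D unit.

    Pass 1 runs idct_1d on every row and stores the results in an
    intermediate 4x4 matrix; pass 2 reads that matrix column by column and
    feeds the same idct_1d again. In hardware, the intermediate matrix plays
    the role of the storage that lets one 1-D datapath serve both passes.
    """
    tmp = [idct_1d(row) for row in block]             # horizontal pass
    out = [[0] * 4 for _ in range(4)]
    for j in range(4):
        col = idct_1d([tmp[i][j] for i in range(4)])  # vertical pass, same unit
        for i in range(4):
            out[i][j] = (col[i] + 32) >> 6             # final rounding and scaling
    return out
```

Python's `>>` is an arithmetic (flooring) shift, which matches the right-shift the standard uses for negative intermediate values.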
In inverse quantization, this thesis reduces the size of the rescaling lookup tables by merging entries according to coefficient position, in order to reduce on-chip lookup-table resources. To balance performance and power consumption, a computation pipeline driven by gated clocks is presented; this timing structure increases computing throughput while reducing power consumption. In the inter prediction module, because the algorithm is complicated, this thesis adds an interpolation control module that selects the data output and registers the computation results; the other computation modules work under its control signals. This architecture simplifies inter prediction. Because the luma 6-tap filter and chroma interpolation use the same number of reference pixels, the data lines are shared to reduce system bandwidth. Based on the regular data stream produced by the interpolation control module, a 5×4 storage matrix replaces the large number of flip-flops required by the algorithm as given in the H.264 standard. The first addends of the linear computation are stored in the 5×4 storage matrix according to the interpolation position; once the second addends are obtained, they are added directly to the stored data. In the intra prediction module, which removes spatial redundancy within the current picture to improve coding efficiency, a reconfigurable hardware architecture is proposed that combines the operations shared by the different non-plane prediction modes. For plane mode prediction, a hardware-oriented optimization is proposed in which every intra predicted value is obtained from a base value indexed in the horizontal and vertical directions.
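The plane-mode optimization can be sketched in Python, assuming 16×16 luma plane prediction with 8-bit samples. `plane_reference` evaluates the standard's formula directly, with multiplications for every sample; `plane_incremental` computes one base value per 4×4 block and then fills the block using only additions of b horizontally and c vertically. All function names are hypothetical, and the thesis does not state how each block's base value is derived, so the incremental block-to-block update used here is an assumption.

```python
def clip1(x):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, x))


def plane_params(top, left, top_left):
    """Plane parameters a, b, c from the 16 upper, 16 left and corner samples."""
    def t(x):  # upper neighbour p[x, -1], with p[-1, -1] = top_left
        return top_left if x < 0 else top[x]

    def l(y):  # left neighbour p[-1, y], with p[-1, -1] = top_left
        return top_left if y < 0 else left[y]

    h = sum((k + 1) * (t(8 + k) - t(6 - k)) for k in range(8))
    v = sum((k + 1) * (l(8 + k) - l(6 - k)) for k in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return a, b, c


def plane_reference(top, left, top_left):
    """Direct evaluation: pred[y][x] = Clip1((a + b(x-7) + c(y-7) + 16) >> 5)."""
    a, b, c = plane_params(top, left, top_left)
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]


def plane_incremental(top, left, top_left):
    """Same prediction, one base value per 4x4 block, additions inside the block."""
    a, b, c = plane_params(top, left, top_left)
    pred = [[0] * 16 for _ in range(16)]
    block_row_base = a - 7 * b - 7 * c + 16   # unshifted value at x = 0, y = 0
    for by in range(4):
        base = block_row_base                 # base value of block (0, by)
        for bx in range(4):
            row_start = base
            for y in range(4):
                acc = row_start
                for x in range(4):
                    pred[4 * by + y][4 * bx + x] = clip1(acc >> 5)
                    acc += b                  # step right within the block
                row_start += c                # step down within the block
            base += 4 * b                     # next block to the right (4*b is a shift)
        block_row_base += 4 * c               # next row of blocks
    return pred
```

The two functions produce identical predictions; the incremental form only trades per-sample multiplications for adders and a few registers, which is the kind of area saving the plane-mode optimization targets.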
The intra prediction architecture reduces hardware area and also improves module utilization. On an FPGA platform, the IP core has passed RTL-level and gate-level simulation. It meets the quality and speed requirements of the H.264 Baseline profile, decoding 352×288 video at 30 fps with a 10 MHz clock.
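A rough check of the reported operating point, assuming macroblocks are decoded back to back with no idle cycles and ignoring frame-level overhead, gives the average clock budget per macroblock:

$$
\begin{aligned}
N_{\mathrm{MB}} &= \frac{352}{16} \times \frac{288}{16} = 22 \times 18 = 396 \ \text{macroblocks/frame} \\
R_{\mathrm{MB}} &= 396 \times 30 = 11\,880 \ \text{macroblocks/s} \\
\frac{f_{\mathrm{clk}}}{R_{\mathrm{MB}}} &= \frac{10 \times 10^{6}}{11\,880} \approx 842 \ \text{cycles/macroblock}
\end{aligned}
$$

With 4:2:0 sampling each macroblock carries 256 luma and 128 chroma samples, so the budget is roughly 2.2 cycles per reconstructed sample, which suggests how tightly bitstream parsing and image reconstruction have to be overlapped to reach real time at such a low clock.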
