Implementation and Optimization of an H.264 Encoder Based on the TMS320DM642

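【Author】 Huang Yongjian (黄勇坚)

【Advisor】 Wang Hongjun (王洪君)

【Author information】 Shandong University, Signal and Information Processing, 2006, Master's thesis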

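【Abstract (from the Chinese)】 As a new-generation video coding standard for multimedia applications, H.264/AVC adopts many advanced techniques that differ from those of earlier standards. It substantially improves coding efficiency and performance while strengthening error resilience and network adaptability, and it has broad application prospects in broadcast television, video storage and playback, videoconferencing, and related fields. These gains, however, come at the cost of a marked increase in computational complexity. Developing a video encoder capable of real-time encoding in an embedded environment with limited hardware resources is therefore a very challenging task.

The TMS320DM642 is a fixed-point DSP developed by Texas Instruments on a second-generation, high-performance very-long-instruction-word (VLIW) architecture. It has 8 independent functional units and 64 32-bit general-purpose registers. The 8 functional units are extended with an instruction set dedicated to video and image processing, which improves both video processing performance and the parallelism of the instruction architecture. At a clock frequency of 600 MHz, the DM642 reaches a peak processing speed of 4800 MIPS (million instructions per second). The DM642 uses a two-level on-chip memory architecture and offers a rich set of on-chip peripheral interfaces, such as a 10/100 Mbps Ethernet interface, three configurable video ports, and a 64-bit external memory interface. This combination of processing power and interfaces makes the DM642 well suited to video and image processing applications such as audio/video transmission over IP and wireless networks and security surveillance.

This thesis describes how to develop and optimize an H.264 "baseline" encoder on a TMS320DM642 hardware development platform. The encoder source is the encoding part of x264, one of the three major open-source implementations. Compared with the official JM series of test software, x264 drops some new features that contribute little to coding performance but are extremely expensive to compute, which makes it easier to port and optimize. An efficient implementation of a video coding algorithm on a DSP must fully exploit the parallelism and computing resources of the video processor in order to meet the system's real-time requirements. Starting from the original x264 encoder, we carried out three main pieces of work: first, we trimmed and modified the program and ported it to run on the DSP platform; second, we made full use of the DM642's EDMA controller and related facilities to optimize data transfer and memory usage; third, we used intrinsics, linear assembly, and similar techniques to improve the core H.264 algorithms and code and to increase the parallelism of code execution. The result is a proposed embedded real-time H.264 encoder scheme with relatively low complexity and relatively high coding efficiency.

At present, our H.264 encoder can encode 28 to 38 QCIF frames per second. The decoded video has high subjective and objective quality.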

【Abstract】 As a video coding standard for next-generation multimedia, H.264/AVC adopts a number of advanced technologies that differ from those of previous standards. In addition to improved coding efficiency and coding performance, the new standard enhances other capabilities, including error resilience and flexibility for effective use over a broad variety of network types. H.264/AVC provides a technical solution for a broad range of applications, including broadcast television, video storage and playback, and videoconferencing. This improved coding efficiency, however, comes at the cost of higher computational complexity. Developing an embedded real-time video encoder within limited on-chip memory space is a challenging task.

The TMS320DM642 device is a fixed-point digital signal processor (DSP) based on VelociTI.2™, the second-generation high-performance very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI). It has eight highly independent functional units and 64 32-bit general-purpose registers. The VelociTI.2™ extensions in the eight functional units of the DM64x include new instructions that accelerate video and imaging applications and extend the parallelism of the VelociTI.2™ architecture. At a clock rate of 600 MHz, the DM642 device can perform up to 4800 million instructions per second (MIPS). The DM642 uses a two-level internal memory architecture for program and data and has a powerful and diverse set of peripherals, including a 10/100 Mbps Ethernet MAC (EMAC), three configurable video ports, and a 64-bit external memory interface (EMIFA). Its data processing and interface capabilities make the DM642 well suited to video and imaging applications, for example audio/video transmission and security monitoring over IP (Internet Protocol) and wireless networks.

The main task of this thesis is to describe how to develop and optimize an H.264 "baseline" encoder on a hardware platform based on the TMS320DM642. The source program adopted is the encoder part of x264, one of the open-source H.264 codec implementations. Compared with the official JM software, x264 drops some new features that contribute little to coding performance but carry high computational complexity, so it is easier to port and optimize. The effective method to

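  • 【Online publication contributor】 Shandong University
  • 【Online publication year and issue】 2006, Issue 12
  • 【Classification code】 TN762
  • 【Times cited】 2
  • 【Downloads】 372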