信源解码系统设计与多视点视频编码方法研究

Design of Source Decoder and Research of Multi-view Video Coding

【Author】 Zhu Zheng (朱政)

【Advisor】 Zhang Ming (张明)

【Author information】 Zhejiang University, Communication and Information Systems, 2010, PhD

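【Abstract (translated from the Chinese)】 Over the past two decades a wave of digitization has swept the world, and the development and application of digital multimedia technology have profoundly influenced and changed it. The rapid spread of digital television is especially notable. As it replaces analog television, a mature and large industrial chain is taking shape, with a very large market. Meanwhile, new three-dimensional stereoscopic video has begun to show its appeal and to attract attention and enthusiasm. On the one hand, grounded in the industrial practice of digital television, this dissertation studies and designs key parts of the source decoding system, including transport stream demultiplexing and the video decoding system. On the other hand, looking toward future 3D and interactive video applications, it proposes a series of novel algorithms for performance modeling and optimization of prediction structures in multi-view video coding. The main work and results are as follows:

1. A high-performance dual transport stream demultiplexing design and architecture is proposed. Using an object-oriented channel configuration method together with per-cycle and per-section time-sharing mechanisms, it supports simultaneous parsing of two transport streams and multiple audio/video streams while maximizing the reuse of system resources.

2. A multi-standard video decoding architecture is proposed. The supported standards include MPEG-2, MPEG-4, H.264/AVC, AVS and RealVideo 8/9/10. Balancing real-time performance against flexibility, the design uses a clear hardware/software partition and a pipelined organization. To address the processing-speed bottlenecks of the video decoding system, its key modules, including the AVS arithmetic entropy decoder and the multi-standard motion compensation subsystem, are optimized, and high-speed implementation structures are proposed.

3. Using directed tree analysis as a tool, a new perspective and scheme are proposed for the modeling, analysis and performance optimization of hierarchical B prediction structures. A linear model based on reference intervals is also proposed for quantitatively evaluating the compression performance of B pictures and hierarchical B structures. The model has low complexity, and in specific applications some of its parameters can be omitted, which makes it simple and practical. Applying the compression performance model to the directed tree, a method is proposed for constructing the optimal tree by dynamic programming. The optimal tree can adjust the balance between compression performance and random access performance. The hierarchical B structure built from the optimal compression tree is simple to configure and, compared with existing structures, achieves better compression performance at every group-of-pictures length.

4. An optimal configuration method is proposed for inter-view prediction structures in multi-view video. First, the configuration of the prediction structure is converted into the arrangement of an optimal coding order. A simulated annealing algorithm is then introduced into the model to carry out a fast search. Experiments show that simulated annealing converges quickly to the global optimum in this problem. The method fully exploits inter-view correlation, achieves a gain of 0.1-0.8 dB over the reference prediction structure of multi-view video coding, and is applicable to arbitrary camera arrays.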
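
The dynamic-programming construction of a hierarchical B structure mentioned in item 3 can be illustrated with a minimal Python sketch. It is not the dissertation's model: the per-picture cost `a + b * (j - i)` (a linear function of the distance between the two reference pictures), the parameters `a` and `b`, and the function name `optimal_b_tree` are all assumptions made for illustration.

```python
from functools import lru_cache


def optimal_b_tree(gop_len, a=0.0, b=1.0):
    """Pick a coding order for B pictures 1..gop_len-1 between key pictures
    0 and gop_len, minimizing a hypothetical linear cost.

    Assumption (not from the source): a B picture predicted from pictures
    i and j costs a + b * (j - i), wherever it lies between them.
    """

    @lru_cache(maxsize=None)
    def best(i, j):
        # No picture lies strictly between i and j: nothing to code.
        if j - i < 2:
            return 0.0, None
        best_cost, best_k = float("inf"), None
        for k in range(i + 1, j):
            # Code picture k from references i and j, then solve both halves.
            cost = a + b * (j - i) + best(i, k)[0] + best(k, j)[0]
            if cost < best_cost:
                best_cost, best_k = cost, k
        return best_cost, best_k

    order = []

    def walk(i, j):
        # Read the coding order off the memoized split points (pre-order).
        k = best(i, j)[1]
        if k is None:
            return
        order.append(k)
        walk(i, k)
        walk(k, j)

    walk(0, gop_len)
    return best(0, gop_len)[0], order
```

Under this assumed cost, `optimal_b_tree(8)` selects the familiar dyadic order, coding picture 4 first and then the midpoints of each half. A different cost function would generally give a different tree.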

【Abstract】 Over the last two decades, digital media has significantly influenced and changed modern society. Digital television is a notable representative of its many applications, and a large-scale industrial chain has grown up around its popularization. During the transition from analog video to digital and high-definition formats, new types of media such as 3D video have also begun to attract attention.

This dissertation focuses on two issues. First, taking the digital television source decoder as the research subject, it presents the design and implementation of its key parts, including the transport demultiplexer and the video decoder. Second, considering future applications of 3D and interactive video, it proposes several efficient algorithms for performance modeling and optimization in multi-view video coding (MVC). The main work and contributions are as follows.

A high-performance design and implementation methodology for a twin transport demultiplexer is presented, which can demultiplex two transport streams (TS) and multiple audio/video elementary streams simultaneously. The key parts of the architecture are shared by the two TS inputs through per-cycle and per-section time slicing. The design supports considerably more functionality with few additional resources.

A novel video decoding architecture is proposed, which supports multiple standards including MPEG-2, MPEG-4, H.264/AVC, AVS and RealVideo 8/9/10. An elaborate software and hardware cooperation scheme is adopted to trade off flexibility against real-time performance. High-speed architectures are designed for the most computationally expensive parts of the video decoder, including the AVS context-based binary arithmetic decoder and the motion compensation subsystem for multiple standards.

In addition, this dissertation explores MVC. The proposed directed tree decomposition offers a new perspective for analyzing and arranging hierarchical B prediction structures. A linear model based on reference intervals is also proposed for the quantitative evaluation of B pictures and hierarchical B prediction structures. When only the relative efficiency of prediction structures is needed, the model can be further simplified by omitting parameters. Combined with this linear model for compression efficiency, the directed tree makes it straightforward to evaluate the performance of any hierarchical B prediction structure. The optimal tree is constructed with a dynamic programming method and can be used to optimize the tradeoff between compression efficiency and random access ability. Experimental results show that the tree optimized for compression efficiency outperforms the existing hierarchical B prediction structure.

Furthermore, this dissertation proposes a novel method to optimize the inter-view prediction structures for MVC. The construction of a prediction structure is converted into the arrangement of a coding order. A simulated annealing (SA) algorithm is then employed to minimize the total cost and so obtain the best coding order. Experimental results show that the annealing process converges rapidly to satisfactory states in this problem. The SA algorithm exploits the inter-view correlation as fully as possible, and the generated prediction structures achieve 0.1-0.8 dB higher PSNR than the reference prediction structure of MVC. The proposed method is applicable to arbitrary camera arrangements.
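
The idea of searching over coding orders with simulated annealing can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the cost model (the first view is intra-coded, and each later view is predicted from its cheapest already-coded view), the `intra` and `pair` inputs, the swap move, the geometric cooling schedule and all function names are hypothetical.

```python
import math
import random


def total_cost(order, intra, pair):
    """Hypothetical cost of a coding order: the first view pays its intra
    cost, every later view pays the cheapest cost of being predicted from
    a view coded before it."""
    cost = intra[order[0]]
    for pos in range(1, len(order)):
        view = order[pos]
        cost += min(pair[ref][view] for ref in order[:pos])
    return cost


def anneal_order(intra, pair, t0=10.0, cooling=0.95, steps=2000, seed=0):
    """Search for a low-cost coding order by simulated annealing.
    Needs at least two views, since each move swaps two positions."""
    rng = random.Random(seed)
    order = list(range(len(intra)))
    cur = total_cost(order, intra, pair)
    best_order, best = order[:], cur
    t = t0
    for _ in range(steps):
        # Propose a neighbor by swapping two positions in the order.
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new = total_cost(order, intra, pair)
        # Always accept improvements; accept worse moves with
        # probability exp(-(new - cur) / t).
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best:
                best_order, best = order[:], cur
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor
    return best_order, best
```

Here `pair[r][v]` stands for an assumed, pre-measured cost of coding view `v` with view `r` as its reference. The function returns the best order seen during the run, which is never worse than the initial order `0, 1, 2, ...`.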

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2010, Issue 12