

Researches on the Rate Distortion Properties of the Distributed Video Coding

【Author】 Wang Peng (王鹏)

【Advisor】 Yu Songyu (余松煜)

【Author information】 Shanghai Jiao Tong University, Communication and Information Systems, 2008, PhD

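【Abstract, translated from the Chinese】 With the rapid development of network, wireless, and computer hardware technologies, multimedia technologies such as digital TV, mobile TV, IPTV, and video conferencing are ever more widely used in people's work and daily life. Conventional video coding standards such as MPEG-1/2/4 and H.261/3/4 generally adopt a hybrid structure based on motion compensation and block transforms, so the encoder is typically 5-10 times as complex as the decoder. This is appropriate and necessary in encode-once, decode-many applications such as video broadcasting and video on demand. Recently, however, many multimedia applications with new characteristics have emerged, such as wireless video sensor surveillance networks, wireless PC cameras, mobile camera phones, disposable video cameras, and camcorders. They are severely constrained in storage capacity, computing power, and power resources; some are battery powered and others are disposable. These emerging applications therefore need simple encoders to save resources. Distributed video coding (DVC) moves the time-consuming motion estimation/compensation from the encoder to the decoder, which yields a simple encoder and makes low-complexity video encoding technically feasible. This thesis studies several key problems: a comparison of distributed coding techniques, refinement of the virtual channel model, an efficient S frame generation algorithm, and rate allocation among different regions in DVC. The contributions are as follows.

The syndrome approach and the parity check approach are two commonly used techniques in Slepian-Wolf coding. We compare them in two respects: error correction capability when linear block codes are used, and decoding performance when LDPC codes are used. We conclude that, in terms of error correction performance, the two approaches can be converted into each other equivalently if both use linear block codes. In addition, if the check matrix of the parity check approach has the nested property, that approach can also achieve optimal Slepian-Wolf coding.

We give the achievable rate-distortion region of multivariate Wyner-Ziv coding in the quadratic Gaussian case and show that, in this setting, no coding performance is lost even if the encoder does not refer to the side information. The result extends readily to the case where only the difference between the source and the side information is multivariate Gaussian. Furthermore, allocating distortion among the variables by reverse water-filling attains the boundary of the achievable region. This lays the theoretical foundation for rate allocation in DVC.

Taking into account smoothness constraints on the motion vector field and on neighboring pixels, the thesis proposes a new, efficient S frame generation method with three steps. First, block-based motion-compensated interpolation generates an initial motion vector field. Second, the direction angles of the motion vectors are uniformly quantized, and a smoothing filter is applied to the quantized direction values of the motion vectors of adjacent blocks. Finally, under the smoothness constraint on neighboring pixels, pixel-wise smoothing is performed in the pixel domain to further remove block misalignment artifacts. In both rate-distortion performance and subjective quality, experimental results show that the algorithm clearly outperforms the motion-based extrapolation method MC-E. In scenes such as approximately translational motion, dialogue, and video surveillance, the proposed algorithm is comparable to the IPME prediction algorithm and can even achieve better rate-distortion performance.

Using an edge detection algorithm, the S frame is divided into regions with different characteristics, which refines the virtual channel into a multivariate Gaussian Wyner-Ziv coding model. Different coding strategies, such as quantization and puncturing, are used in different regions. Experiments show that region partitioning and the refined theoretical model bring a rate-distortion gain of 1.0-2.0 dB. Meanwhile, the complexity of the distributed video encoder decreases slightly.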
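The syndrome approach to Slepian-Wolf coding named in the abstract can be illustrated with a toy example. The sketch below is not the thesis's construction: it assumes a (7,4) Hamming code and side information that differs from the source in at most one bit, and the names `sw_encode` and `sw_decode` are hypothetical. The encoder sends only the 3-bit syndrome of 7 source bits; the decoder combines it with the side information to recover the source.

```python
# Parity-check matrix of the (7,4) Hamming code (an assumed toy code).
# Column j (1-based) is the binary representation of j, so a single-bit
# difference at position j produces the syndrome value j.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

def syndrome(bits):
    """Return H * bits over GF(2) as a list of 3 bits."""
    return [sum(h * x for h, x in zip(row, bits)) % 2 for row in H]

def sw_encode(x):
    """Encoder: compress 7 source bits to their 3-bit syndrome."""
    return syndrome(x)

def sw_decode(s, y):
    """Decoder: recover x from its syndrome s and side information y,
    assuming x and y differ in at most one position."""
    # H*x XOR H*y equals H*(x XOR y), the syndrome of the difference pattern.
    diff = [a ^ b for a, b in zip(s, syndrome(y))]
    pos = sum(bit << b for b, bit in enumerate(diff))  # 0 means x == y
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1  # flip the single differing bit
    return x_hat

x = [1, 0, 1, 1, 0, 0, 1]
y = list(x)
y[4] ^= 1  # side information: the source with one bit flipped
assert sw_decode(sw_encode(x), y) == x
```

Here 7 source bits are conveyed with 3 transmitted bits because the decoder already holds a correlated version of the source. A practical DVC system would use a much longer code such as an LDPC code, as the abstract indicates.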

【Abstract】 With the rapid development of network, wireless, and computer hardware technologies, multimedia technologies such as DTV, mobile TV, IPTV, and teleconferencing have become part of people's work and family life. Conventional video standards, such as MPEG-x and H.26x, adopt an MC/DCT-based hybrid structure. As a result, the encoder is 5-10 times as complex as the decoder, which is appropriate and necessary in applications that encode once and decode many times, such as video broadcasting and VOD. However, new kinds of multimedia applications are emerging, such as wireless VSNs, wireless PC cameras, mobile camera phones, disposable video cameras, and camcorders. Some are battery powered and some are disposable, so they are all tightly constrained in power, memory, and computing capacity. A simpler encoder is therefore needed in these scenarios to save resources. In DVC, the time-consuming ME/MC procedure is shifted from the encoder to the decoder, which yields a simpler encoder. The thesis investigates several key problems in DVC: the comparative analysis of two Slepian-Wolf coding techniques, the refinement of the virtual channel, an efficient S frame generation method, and rate allocation among different regions of the D frame. The novelties of the thesis are as follows.

The syndrome approach and the parity check approach have emerged as two practical Slepian-Wolf coding techniques. In the thesis, the two approaches are compared from two viewpoints: their error correction capabilities when linear block codes are used, and their decoding performance when LDPC codes are used. Moreover, we prove that if the parity check matrix has the nested property, the parity check approach achieves optimality.

We extend the Wyner-Ziv problem to the coding of a multivariate Gaussian source with multiple Gaussian side information at the decoder. The achievable region is obtained, and it extends easily to the case where the difference between the source and the side information is multivariate Gaussian, regardless of the distributions of the source and the side information themselves. This introduces the rate allocation problem into DVC, which can be solved by a reverse water-filling method. The multivariate Gaussian Wyner-Ziv coding prototype is an important theoretical foundation of the thesis.

A novel and efficient Side-Information Frame Generator (SIFG) is proposed, which considers smoothness constraints on both the motion vector field and spatially adjacent pixels. First, two adjacent decoded Intra frames at the decoder are used to perform block-based MC-I, so as to obtain the motion vector field of the current S frame. Second, the direction angle of each motion vector is uniformly quantized, and a smoothing filter is then applied to the quantized direction levels of the motion vectors of adjacent blocks. Finally, a pixel-wise smoothing operation further mitigates block artifacts. Simulation results show that the proposed techniques offer potential rate-distortion performance advantages over the MC-E method and yield S frames of fine visual quality. In particular, the proposed SIFG method is comparable to the IPME algorithm, and even performs better in scenarios such as approximately linear motion, dialogue, and video monitoring.

Using an edge detection algorithm, the S frame is divided into regions with different characteristics, and the virtual channel is thereby refined into the multivariate Gaussian Wyner-Ziv coding model. Different quantizing and puncturing strategies are applied in different regions accordingly. Simulation results show that a coding gain of around 1.5-2 dB is obtained from the refinement of the correlation model. Meanwhile, the encoder remains simple.
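The reverse water-filling allocation mentioned in the abstract can be sketched as follows. This is a minimal illustration of the textbook procedure for independent Gaussian components under mean squared error, not the thesis's implementation. It assumes each entry of `variances` is the variance of one component of the source-minus-side-information residual (for example, one region of the S frame); the function name and the sample numbers are hypothetical.

```python
import math

def reverse_water_filling(variances, total_distortion):
    """Split total_distortion over independent Gaussian components.

    Returns (theta, distortions, rates_in_bits). Component i receives
    D_i = min(theta, var_i) and R_i = 0.5 * log2(var_i / D_i), where the
    water level theta is chosen so that the D_i sum to total_distortion.
    """
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("total distortion must lie in (0, sum of variances]")
    order = sorted(variances)
    n = len(order)
    used = 0.0  # distortion taken by components lying below the water level
    for k, v in enumerate(order):
        # Share the remaining distortion equally among the n - k largest.
        theta = (total_distortion - used) / (n - k)
        if theta <= v:
            break
        used += v  # this component is fully "flooded": D_i = var_i, R_i = 0
    distortions = [min(theta, v) for v in variances]
    rates = [0.5 * math.log2(v / d) for v, d in zip(variances, distortions)]
    return theta, distortions, rates

# Hypothetical residual variances for three regions and a distortion budget.
theta, distortions, rates = reverse_water_filling([4.0, 1.0, 0.25], 1.5)
print(theta, distortions, rates)
```

Components whose variance falls below the water level receive zero rate, which matches the intuition that regions where the side information is already accurate need few or no transmitted bits.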


