
3D视频的深度图优化与深度编码方法研究

Research on Depth Map Optimization and Coding for 3D Video

【Author】 邓慧萍

【Supervisor】 喻莉

【Author Information】 Huazhong University of Science and Technology, Communication and Information Systems, 2013, Ph.D.

【Abstract (Chinese)】 The essential difference between 3D video and 2D video is that 3D video adds depth information on top of 2D video, producing a stereoscopic viewing experience and a more realistic representation of natural scenes. Depth is a key geometric quantity in 3D scene capture: it describes the distance from objects in the scene to the imaging plane. Compared with texture images, depth maps not only save bitrate but also allow virtual views at different viewpoints to be rendered conveniently and flexibly with depth-image-based rendering (DIBR). Moreover, with the rapid development of depth camera technology in recent years, affordable depth cameras provide a direct and fast way to acquire depth maps. Depth-based 3D video is therefore an effective and practical approach, both in theory and in application. This dissertation centers on the depth map in 3D video and studies depth map optimization and efficient depth coding. The first part focuses on acquiring high-quality depth maps, a prerequisite for the subsequent work; the remaining three parts exploit the characteristics of depth maps to develop new depth coding methods that improve coding efficiency while preserving synthesized-view quality.

Although depth cameras can acquire scene depth conveniently and quickly, current devices are limited by the technology: the resulting depth maps have low resolution and contain large holes, so they cannot be used directly in practical systems. This dissertation first analyzes the strengths and weaknesses of two classes of statistical modeling methods used in image processing problems such as image interpolation and denoising. Targeting the characteristics of depth images, it combines the advantages of parametric and nonparametric models and exploits both the similarity across image resolutions and the self-similarity within an image, proposing a hierarchical depth map optimization method based on a hybrid parametric model. The method fills depth holes level by level while preserving depth edges.

A depth map is not displayed; it is used to synthesize a new view. Quantization errors introduced by depth coding therefore distort the synthesized view, and depth coding distortion should be measured by the distortion of the synthesized view. From this perspective, this dissertation develops a depth coding method that minimizes synthesized-view distortion. By deriving the relationship between depth distortion and geometry error, and between geometry error and synthesized-view distortion, it builds a model linking depth distortion to synthesized-view distortion, and applies the model to depth coding and joint rate allocation. Experimental results show that, compared with existing methods, the proposed method allocates texture and depth bitrates sensibly and achieves higher view synthesis quality.

Depth distortion causes synthesized-view distortion that is concentrated along image edges. Traditional MSE-based distortion measures treat every pixel equally and do not reflect the true quality of the synthesized view. This dissertation introduces the structural similarity index (SSIM), which better matches human visual perception, into depth coding and studies view synthesis optimization (VSO) for depth coding in depth. It first builds a model relating depth coding distortion to SSIM-based synthesized-view distortion; applies the model to rate-distortion optimization of depth coding, relating the depth coding bitrate to synthesized-view distortion; and estimates an SSIM-based perceptual Lagrange multiplier to guide optimal mode selection in depth coding. Experiments show that the proposed SSIM-based view synthesis optimization outperforms mean-square-error (MSE)-based and JM-based view synthesis optimization in both rate-distortion performance and subjective quality.

Depth maps are smooth over most regions and discontinuous only at object edges, so they have stronger spatial correlation than typical natural images. Exploiting this property, this dissertation proposes a spatial down/up-sampling depth coding method. Down-sampling greatly reduces the input data at the encoder and lowers the coding bitrate, but it loses depth edge detail and degrades synthesized-view quality. Using the statistical invariance between a high-resolution image and its corresponding low-resolution image, the dissertation designs a covariance-based depth up-sampling model; using the edge similarity between a depth map and its corresponding texture image, it designs an adaptive weight model so that the up-sampling coefficients adapt to preserve depth edges in all directions. This work is an exploration of depth-based video coding and offers new ideas and solutions for the development of depth information and the application of 3D video.
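The depth-distortion-to-geometry-error relationship mentioned above can be illustrated with the standard DIBR warping model, in which an 8-bit depth value maps affinely to a horizontal disparity. The camera parameters below (focal length `f`, baseline `l`, depth range `z_near`/`z_far`) are illustrative assumptions, not values from the thesis:

```python
# Sketch of the linear relation between a depth-map coding error and the
# geometry (warping-position) error it induces in DIBR view synthesis.
# Assumed setup: horizontal camera shift, focal length f in pixels,
# baseline l, and an 8-bit depth map quantized between z_near and z_far.

def disparity(d, f, l, z_near, z_far):
    """Horizontal warping displacement (pixels) for 8-bit depth value d."""
    return f * l * ((d / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)

def geometry_error(delta_d, f, l, z_near, z_far):
    """Warping-position error caused by a depth coding error delta_d.
    Because disparity is affine in d, the error is linear in delta_d
    and independent of the original depth value."""
    return f * l * (1.0 / z_near - 1.0 / z_far) * delta_d / 255.0

# Example: a coding error of 4 depth levels shifts the warped pixel by the
# same amount regardless of the true depth value d.
f, l, z_near, z_far = 1000.0, 0.05, 1.0, 10.0
exact = disparity(132, f, l, z_near, z_far) - disparity(128, f, l, z_near, z_far)
linear = geometry_error(4, f, l, z_near, z_far)
print(abs(exact - linear) < 1e-9)  # True: the linear model matches exactly
```

This linearity is what makes it tractable to chain a depth-distortion model into a synthesized-view distortion model for rate allocation.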

【Abstract】 3D video is an emerging medium for rendering dynamic real-world scenes. Compared with traditional 2D video, 3D video is the natural extension in the spatial-temporal domain, as it provides a depth impression of the observed scenery. Besides the 3D sensation, 3D video also allows interactive selection of the viewpoint and view direction within the captured range. An attractive 3D video representation is the multi-view video plus depth (MVD) format. With the help of depth maps, many interesting applications such as glasses-free 3D video, free-viewpoint television (FTV), and gesture/motion-based human-computer interaction become possible. However, MVD produces a vast amount of data to be stored or transmitted, so efficient compression techniques for MVD are vital for achieving a high-quality 3D visual experience within constrained bandwidth. Consequently, efficient depth coding is one of the key issues in 3D video systems.

Depth maps are used to synthesize virtual views at the receiver side, so accurate depth maps should be estimated efficiently to ensure seamless view synthesis. Although depth cameras provide depth data conveniently, depth maps from structured-light cameras contain holes owing to inherent limitations of the technology. In this dissertation, we propose a hybrid multi-scale hole filling method that combines the modeling strengths of parametric and nonparametric filters. We progressively recover the missing areas in scale space from coarse to fine so that the sharp edges and structure information at the finest scale can eventually be recovered. Across scales, we present a novel linear autoregressive depth up-sampling algorithm that considers the edge similarity between depth maps and their corresponding texture images as well as the structural similarity among depth maps. Within each scale, we propose a weighted kernel filter for hole filling based on a weighted cost function determined by a joint multilateral kernel.
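A minimal single-scale sketch of such a texture-guided weighted kernel follows. The Gaussian weight form, the parameters, and the use of depth value 0 as the hole marker are illustrative assumptions; the thesis's actual multi-scale cost function is not reproduced here:

```python
import numpy as np

# Simplified hole filling with a joint multilateral kernel: each missing
# depth pixel is estimated as a weighted average of valid neighbours, with
# weights combining spatial distance and colour similarity in the
# co-registered texture image.

def fill_holes(depth, texture, radius=3, sigma_s=2.0, sigma_c=10.0):
    h, w = depth.shape
    out = depth.astype(np.float64).copy()
    holes = depth == 0  # assumption: 0 marks a missing depth sample
    for y, x in zip(*np.nonzero(holes)):
        num = den = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not holes[ny, nx]:
                    ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    dc = float(texture[ny, nx]) - float(texture[y, x])
                    wc = np.exp(-dc * dc / (2 * sigma_c ** 2))
                    num += ws * wc * depth[ny, nx]
                    den += ws * wc
        if den > 0:
            out[y, x] = num / den
    return out

# Toy example: a hole inside a flat region is filled with the local depth.
depth = np.full((5, 5), 100.0); depth[2, 2] = 0.0
texture = np.full((5, 5), 50.0)
filled = fill_holes(depth, texture)
print(round(filled[2, 2]))  # 100
```

The texture term suppresses contributions from across object boundaries, which is why the filled depth stays sharp at edges instead of being blurred.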
This method removes artifacts, smoothing depth maps in homogeneous regions while improving accuracy near object boundaries.

A key observation is that the depth map is encoded but not displayed; it is only used to synthesize intermediate views. Distortion in the depth map indirectly affects the synthesized view quality, so depth map coding aims to reduce the depth bitrate as much as possible while ensuring the quality of the synthesized view. In this dissertation, we propose a depth map coding method based on a new distortion measure obtained by deriving the relationship between distortion in the coded depth map and distortion in the synthesized view. We first analyze the relationship between depth map distortion and geometry error by mathematical derivation, and then build a model describing the relationship between geometry error and synthesized-view distortion. From these two relationships, the synthesized-view distortion due to depth map coding is estimated.

More specifically, depth-coding-induced distortion in the synthesized view lies mostly along object boundaries, where it is conspicuous to human eyes. However, mean squared error (MSE) and similar quality metrics correlate poorly with human perception. We therefore adopt structural similarity as the quality metric in depth map coding and develop a structural-similarity-based synthesized view distortion (SS-SVD) model to capture the effect of depth map distortion on the final quality of the synthesized views. The model is applied to rate-distortion optimization, which relates the depth coding bitrate to the synthesized-view distortion, for depth coding mode selection. Experimental results show that the proposed SS-SVD method achieves both better rate-distortion performance and better perceptual quality of synthesized views than the JM reference software.

Depth maps generally have more spatial redundancy than natural images.
This property can be exploited by compressing a down-sampled depth map at the encoder. In this dissertation, we present an efficient down/up-sampling method for depth map compression. A novel edge-preserving depth up-sampling method is proposed that uses both texture and depth information: a weight model is built from the edge similarity between depth maps and their corresponding texture images and from the structural similarity among depth maps. Based on this weight model, the optimal minimum mean square error (MMSE) up-sampling coefficients are estimated from the local covariance of the down-sampled depth map. The up-sampling filter is combined with HEVC to increase coding efficiency. Objective results and subjective evaluation show that the proposed method achieves better synthesized-view quality than existing methods.
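The SSIM-driven mode selection described above can be sketched as a Lagrangian cost comparison, J = (1 − SSIM) + λ·R, over candidate coding modes. The single-window SSIM (rather than the usual sliding-window version) and the λ value below are simplifying assumptions for illustration:

```python
import numpy as np

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM computed over the whole block."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def best_mode(reference, candidates, lam=0.005):
    """candidates: list of (synthesized_view, bits).
    Returns the index minimizing J = (1 - SSIM) + lam * R."""
    costs = [(1.0 - ssim(reference, view)) + lam * bits
             for view, bits in candidates]
    return int(np.argmin(costs))

# Toy comparison: a cheap mode that distorts the synthesized view versus a
# costlier mode that reproduces it exactly.
ref = (np.arange(64, dtype=np.float64) * 4).reshape(8, 8)
noise = 50.0 * (-1.0) ** np.arange(64).reshape(8, 8)
cheap_bad = (ref + noise, 10)     # low rate, structurally distorted
costly_good = (ref.copy(), 20)    # higher rate, exact reconstruction
print(best_mode(ref, [cheap_bad, costly_good]))  # 1: the SSIM term dominates
```

In an encoder, the reference would itself be a rendered view, and λ would be the perceptual Lagrange multiplier estimated from the rate-distortion model rather than a fixed constant.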
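The covariance-based MMSE up-sampling idea can be illustrated in one dimension: interpolation weights are fitted by least squares on the low-resolution signal and reused at the high resolution, in the spirit of edge-directed interpolation. The 1-D setting and the window size are simplifications; the thesis works on 2-D depth maps with an additional texture-guided weight model:

```python
import numpy as np

# 1-D sketch of covariance-based up-sampling: the local second-order
# statistics of the low-resolution signal determine the MMSE interpolation
# weights for the missing high-resolution samples (geometric duality).

def upsample_1d(x, window=8):
    x = np.asarray(x, dtype=np.float64)
    out = np.empty(2 * len(x) - 1)
    out[::2] = x  # known low-resolution samples
    for i in range(len(x) - 1):
        lo, hi = max(1, i - window), min(len(x) - 1, i + window)
        # Least-squares fit: predict each low-res sample from its two
        # neighbours; the same weights interpolate the midpoint.
        a_mat = np.stack([x[lo - 1:hi - 1], x[lo + 1:hi + 1]], axis=1)
        rhs = x[lo:hi]
        coef, *_ = np.linalg.lstsq(a_mat, rhs, rcond=None)
        out[2 * i + 1] = coef[0] * x[i] + coef[1] * x[i + 1]
    return out

# On a linearly varying region (a planar depth surface) the fitted weights
# reproduce the ramp exactly: 0, 1, 2, ..., 10.
y = upsample_1d([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
print(y)
```

In the 2-D case the same least-squares construction runs over diagonal neighbours in a local window, which is what lets the filter align with depth edges in arbitrary directions.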
