Research on Key Techniques for Multi-view Video Coding

【Author】 王凤随 (Wang Fengsui)

【Supervisor】 都思丹 (Du Sidan)

【Author Information】 Nanjing University, Circuits and Systems, 2013, Ph.D.

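【Chinese Abstract】 With the development of computer graphics and computer vision, multi-view video (MVV) has attracted increasingly wide attention. Compared with traditional single-view video, multi-view video carries rich three-dimensional depth information and can offer users an unmatched sense of depth and interactivity. However, multi-view video is a set of video signals captured by several fixed-position cameras shooting the same scene simultaneously from different angles, so its data volume multiplies as the number of cameras grows. This huge data volume places higher demands on storage and transmission, and multi-view video coding (MVC) is the means of compressing multi-view video data efficiently. With the development of new display technologies and the rapid growth of network transmission capacity, MVC is receiving increasing attention from researchers and institutions in China and abroad. MVC retains the traditional hybrid video coding framework while extending and innovating on it; its complex prediction structure sharply increases computational complexity, and the resulting heavy computational load seriously hinders the practical application and adoption of MVC. Research on low-complexity fast algorithms for MVC is therefore essential. Besides the temporal and spatial correlations within a single view, multi-view video also exhibits inter-view correlation between different views at the same time instant. Effectively exploiting these intra-view and inter-view correlations to remove redundancy is thus the key to speeding up multi-view video coding. Building on an in-depth analysis of the key techniques of MVC, this dissertation optimizes a series of its time-consuming modules.

First, based on an analysis of each MVC mode and addressing the heavy computation of MVC mode decision, an efficient fast mode-decision algorithm with Direct-mode early termination is proposed. Based on the observation that the Direct mode is the most likely to be the optimal mode, the algorithm first computes the rate-distortion cost (RD cost) of the Direct mode for the current macroblock and compares it with an adaptive threshold to provide an opportunity for early termination. If the RD cost of the current macroblock is smaller than the adaptive threshold, the Direct mode is selected directly as the optimal mode and the remaining modes need not be checked; otherwise, an exhaustive mode search is performed to select the optimal mode. The design of the adaptive threshold is the key to the algorithm; it is determined jointly from the spatial, temporal, and inter-view correlations between the current macroblock and its neighboring macroblocks. Experimental results show that, compared with the exhaustive mode decision in the MVC reference software, the proposed fast algorithm reduces computational complexity by about 72.38% and the total bit rate by 1.06% on average, with a PSNR drop of only 0.05 dB.

Second, based on an analysis of how the block sizes used in MVC variable-block-size inter prediction are distributed, a fast inter prediction algorithm for multi-view video coding based on mode complexity is proposed. In the proposed algorithm, macroblocks are divided into three mode classes according to a defined mode complexity; each class checks only its corresponding partition modes, and the remaining, unnecessary partition modes are terminated early, which greatly reduces the computational load. Experimental results show that, compared with full mode decision, the proposed algorithm reduces computational complexity by 62.75% while keeping coding efficiency essentially unchanged.

Third, to address the low efficiency of inter-view prediction in MVC, a fast inter-view prediction algorithm with early termination of disparity estimation is proposed. The method builds on the correlation of prediction directions among the inter partition modes: the prediction result of the inter 16×16 mode in the view direction determines whether the other modes perform disparity estimation. Experimental results show that the proposed algorithm skips unnecessary disparity estimation and thereby effectively reduces the computational complexity of inter-view prediction in MVC.

Finally, building on the above algorithms, this dissertation proposes a fusion algorithm that combines Direct-mode early termination, variable-block-size inter prediction, and early disparity estimation skipping. Experimental results show that the fusion algorithm reduces the computational complexity of MVC by 78.79% on average and the bit rate by 0.07%, while the PSNR drops by only 0.04 dB.

In summary, this dissertation analyzes the key techniques of multi-view video coding and studies optimizations of the corresponding modules. The proposed fast algorithms substantially reduce the computational complexity of multi-view video coding and provide a useful reference for its applications.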
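The abstract states the Direct-mode decision rule but not the threshold formula. The Python sketch below illustrates only the early-termination logic, under stated assumptions: the neighbour set, the hypothetical `NeighborMB` structure, the averaging rule and the `scale` factor in `adaptive_threshold` are illustrative stand-ins for the dissertation's spatial/temporal/inter-view threshold design, and `evaluate_other_modes` is a hypothetical callback standing in for the encoder's RD evaluation of the remaining modes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class NeighborMB:
    """An already-coded neighbouring macroblock (hypothetical structure).

    Assumed neighbours: left and upper macroblocks (spatial), the co-located
    macroblock in the previous frame (temporal) and the corresponding
    macroblock in the reference view (inter-view).
    """
    rd_cost: float
    best_mode: str  # e.g. "DIRECT", "16x16", "8x8"


def adaptive_threshold(neighbors: Sequence[Optional[NeighborMB]],
                       scale: float = 1.0) -> Optional[float]:
    """Hypothetical threshold: `scale` times the mean RD cost of the
    neighbours whose best mode was DIRECT. Returns None when no such
    neighbour exists, in which case no early termination is attempted."""
    costs = [n.rd_cost for n in neighbors
             if n is not None and n.best_mode == "DIRECT"]
    if not costs:
        return None
    return scale * sum(costs) / len(costs)


def decide_mode(direct_cost: float,
                neighbors: Sequence[Optional[NeighborMB]],
                evaluate_other_modes: Callable[[], Dict[str, float]],
                scale: float = 1.0) -> Tuple[str, float]:
    """Direct-mode early termination with an exhaustive fallback."""
    threshold = adaptive_threshold(neighbors, scale)
    if threshold is not None and direct_cost < threshold:
        # Early exit: the remaining modes are never evaluated.
        return "DIRECT", direct_cost
    # Fallback: exhaustive decision over Direct plus all other modes.
    costs = {"DIRECT": direct_cost, **evaluate_other_modes()}
    best = min(costs, key=costs.get)
    return best, costs[best]


neighbors = [NeighborMB(120.0, "DIRECT"),  # left
             NeighborMB(80.0, "DIRECT"),   # upper
             NeighborMB(300.0, "16x16"),   # co-located, previous frame
             None]                         # inter-view neighbour unavailable
print(decide_mode(90.0, neighbors, lambda: {"16x16": 70.0}))  # -> ('DIRECT', 90.0)
print(decide_mode(150.0, neighbors, lambda: {"16x16": 70.0, "8x8": 95.0}))  # -> ('16x16', 70.0)
```

Here the threshold is 100.0. In the first call the early exit fires and the cheaper 16×16 cost is never examined, which shows how such a threshold trades a little coding efficiency for speed; in the second call the Direct cost is too high and the exhaustive fallback runs.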

【Abstract】 With the development of computer graphics and computer vision technology, multi-view video (MVV) is attracting more and more attention. Compared with traditional single-view video, multi-view video contains rich three-dimensional depth information and can give viewers an unparalleled stereoscopic and interactive experience. However, multi-view video is captured simultaneously by a set of cameras viewing the same scene from different viewpoints, so the amount of video data grows in proportion to the number of cameras. Such a huge amount of data places heavy demands on storage and transmission, and multi-view video coding (MVC) provides efficient compression of multi-view video data. With advances in display and network transmission technologies, MVC is attracting more and more attention. MVC follows, and extends, the classic block-based hybrid video coding framework. Its intricate prediction structure sharply increases computational complexity, which hinders the practical application and adoption of MVC. It is therefore essential to study low-complexity fast algorithms for MVC. Besides the spatial and temporal correlations within a single view, MVV also exhibits inter-view correlation between different views at the same time instant. Hence, the key to speeding up MVC encoding is to exploit these intra-view and inter-view correlations effectively to remove redundancy. Based on an analysis of the key techniques of MVC, this dissertation develops a series of optimizations for its time-consuming modules. First, based on an analysis of each MVC mode, an efficient early Direct-mode decision algorithm is proposed to overcome the heavy computation of mode decision.
Based on the observation that the Direct mode is very likely to be the optimal mode, the proposed method first computes the rate-distortion (RD) cost of the Direct mode for the current macroblock and compares it with an adaptive threshold to provide an early-termination opportunity. If this RD cost is smaller than the adaptive threshold, the Direct mode is selected as the optimal mode and the remaining modes are not checked; otherwise, exhaustive mode decision checks all modes to select the optimal one. The key to the proposed algorithm is the design of the adaptive threshold, which is determined jointly from the spatial, temporal, and inter-view correlations between the current macroblock and its neighboring macroblocks. Experimental results show that, compared with exhaustive mode decision in the MVC reference software, the proposed method reduces the computational load by 72.38% and the total bit rate by 1.06%, with only a negligible PSNR loss (about 0.05 dB on average). Second, after analyzing the characteristics of each variable block size in MVC inter prediction, a fast inter prediction algorithm based on mode complexity is proposed. In this algorithm, macroblocks are divided into three mode classes according to a defined mode complexity. Each class checks only its specified mode size(s), and the other, unnecessary mode sizes are terminated early, which greatly reduces the computational load. Experimental results show that, compared with full mode decision in the MVC reference software, the proposed method reduces the computational complexity by 62.75% with a negligible loss of coding efficiency. Third, a fast inter-view prediction algorithm based on early disparity estimation skipping is presented, aimed at improving the efficiency of inter-view prediction in MVC.
This method exploits the correlation of prediction directions among the inter mode sizes: whether the 16×16 mode selects inter-view prediction as its optimal prediction decides whether disparity estimation is performed for the other mode sizes. Experimental results show that the proposed method omits unnecessary disparity estimation and effectively reduces the computational complexity of inter-view prediction in MVC. Finally, a fusion algorithm is proposed based on the above algorithms, combining Direct-mode early termination, variable-block-size inter prediction, and early disparity estimation skipping. Experimental results show that, compared with exhaustive mode decision in the MVC reference software, the fusion algorithm reduces the computational complexity of MVC by 78.79% and the total bit rate by 0.07% on average, with only a negligible PSNR loss (about 0.04 dB on average). In summary, this dissertation analyzes the key techniques of multi-view video coding and studies optimizations of the corresponding modules. The proposed fast algorithms significantly reduce the computational complexity of MVC and provide an important reference for its practical applications.
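The abstract does not define the mode complexity, the class boundaries, or exactly how the three techniques are chained in the fusion algorithm. The Python sketch below shows one possible way the mode-complexity classification and early disparity estimation (DE) skipping could interact inside a single macroblock decision, assuming it runs only when the Direct-mode early-termination test does not fire. Everything in it is hypothetical: the weights in `MODE_WEIGHT`, the three-way split in `CANDIDATES`, the thresholds `t_low` and `t_high`, and the `search` callback that stands in for an encoder's motion and disparity estimation.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical complexity weight of the mode chosen by a neighbouring macroblock.
MODE_WEIGHT: Dict[str, float] = {"DIRECT": 0.0, "16x16": 1.0,
                                 "16x8": 2.0, "8x16": 2.0, "8x8": 4.0}

# Hypothetical three-class split: the partitions each class is allowed to check.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "simple": ("16x16",),
    "medium": ("16x16", "16x8", "8x16"),
    "complex": ("16x16", "16x8", "8x16", "8x8"),
}

# search(mode, direction) -> best RD cost for that partition, where direction is
# "temporal" (motion estimation) or "inter_view" (disparity estimation).
Search = Callable[[str, str], float]


def mode_complexity(neighbor_modes: Sequence[str]) -> float:
    """Mean weight of the neighbours' best modes (hypothetical definition).
    Unknown modes (e.g. intra) and an empty neighbourhood count as complex."""
    if not neighbor_modes:
        return MODE_WEIGHT["8x8"]
    total = sum(MODE_WEIGHT.get(m, MODE_WEIGHT["8x8"]) for m in neighbor_modes)
    return total / len(neighbor_modes)


def classify(complexity: float, t_low: float = 1.0, t_high: float = 2.5) -> str:
    """Map a complexity value to one of the three mode classes."""
    if complexity <= t_low:
        return "simple"
    if complexity <= t_high:
        return "medium"
    return "complex"


def inter_mode_decision(neighbor_modes: Sequence[str],
                        search: Search) -> Tuple[str, str, float, int]:
    """Return (best mode, best direction, RD cost, disparity searches run)."""
    best_mode, best_dir, best_cost = "", "", float("inf")
    de_calls = 0
    run_de = True  # 16x16 is checked first and always tries both directions
    for mode in CANDIDATES[classify(mode_complexity(neighbor_modes))]:
        directions = ("temporal", "inter_view") if run_de else ("temporal",)
        costs = {d: search(mode, d) for d in directions}
        if "inter_view" in costs:
            de_calls += 1
        direction = min(costs, key=costs.get)
        if costs[direction] < best_cost:
            best_mode, best_dir, best_cost = mode, direction, costs[direction]
        if mode == "16x16":
            # Early DE skipping: smaller partitions run disparity estimation
            # only if 16x16 chose inter-view prediction.
            run_de = direction == "inter_view"
    return best_mode, best_dir, best_cost, de_calls


# Made-up RD costs for one macroblock, for illustration only.
table = {("16x16", "temporal"): 100.0, ("16x16", "inter_view"): 120.0,
         ("16x8", "temporal"): 90.0, ("16x8", "inter_view"): 80.0,
         ("8x16", "temporal"): 95.0, ("8x16", "inter_view"): 85.0,
         ("8x8", "temporal"): 110.0, ("8x8", "inter_view"): 70.0}


def search(mode: str, direction: str) -> float:
    return table[(mode, direction)]


print(inter_mode_decision(["8x8", "16x8"], search))  # -> ('16x8', 'temporal', 90.0, 1)
```

In this example the neighbours make the macroblock "complex", so all four partitions are checked, but 16×16 prefers temporal prediction and disparity estimation is skipped for the smaller partitions. That includes 8×8, whose made-up inter-view cost of 70.0 would have won, which illustrates the speed-versus-efficiency trade-off such heuristics make.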

  • 【Online Publication Submitter】 Nanjing University
  • 【Online Publication Issue】 2014, Issue 05