基于纹理及JND建模的视频编解码研究

Research on Texture and JND Modeling Based Video Coding

【Author】 Chen Hao (陈皓)

【Advisor】 Hu Ruimin (胡瑞敏)

【Author information】 Wuhan University, Communication and Information Systems, 2011, PhD

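【摘要 (Chinese abstract, in English translation)】 Today's mainstream general-purpose video coding standards use a hybrid coding framework built on prediction and transform. Their basic principles rest on Shannon's information theory, and the compression approach remains at the level of digital signal processing: it works mainly by removing data redundancy, and gains in coding efficiency come mainly from fine-tuning technical details at the cost of sharply increased computational complexity. As video coding has developed, techniques based on Shannon's information theory have run into a serious bottleneck, and there is less and less room to improve compression efficiency by raising computational complexity. How to improve compression efficiency further, and efficiently, has become an urgent problem in video coding. Perceptual video coding starts from image content and uses theories of human vision to guide compression. It can greatly reduce the bit rate while keeping the subjective perceptual quality essentially unchanged, which makes it important for solving this problem.

This dissertation first surveys and analyzes two representative perceptual coding techniques, texture synthesis and JND modeling. With these at the core, it reviews three areas: temporal prediction in video coding, inter prediction based on "implicit motion estimation", and coding based on JND models. It concludes that, because human understanding of the mechanisms of visual perception is still limited, work that seeks new coding approaches entirely from the human visual system (HVS) is progressing slowly and has many shortcomings. However, seamlessly introducing relatively mature visual synthesis techniques, based on local visual features, into the traditional video coding framework avoids to some extent the difficulties faced by purely vision-based coding. It also breaks with the traditional approach in which existing standards improve compression efficiency by raising computational complexity.

Based on this analysis, the dissertation studies texture modeling and JND modeling in perceptual coding. The work was supported by the National Natural Science Foundation of China Young Scientists Fund project "Research on Predictive Coding Based on Texture Modeling" (No. 61003184), the National Natural Science Foundation of China General Program project "Research on Video Coding Based on Inverse Texture Synthesis" (No. 60970160), and the Microsoft Research Asia innovation program project "JND Model Based on the Contourlet Transform and Image Structural Information" (No. FY09-RES-OPP-013). The dissertation proposes texture synthesis and JND modeling coding schemes that are compatible with the traditional hybrid video coding framework, and develops improved coding algorithms based on visual characteristics that raise coding efficiency, which is of considerable theoretical value. The work also has important application value for meeting the demand for higher compression efficiency in high-definition video applications and for better error resilience in broadband mobile environments. The main results are as follows.

(1) Video encoding and decoding algorithms based on the dynamic texture model
The traditional method for solving the dynamic texture model uses the mean of several preceding frames as the baseline of the synthesized image, so the synthesized virtual frame represents the overall motion trend of the images over a period of time. Images synthesized this way look good subjectively, but their correlation with the current frame to be predicted is low, which lowers inter prediction efficiency. Some researchers have proposed an improved solution method, but it omits the noise term to keep encoder and decoder data matched. The model is then left without its noise driving term and, in theory, cannot be driven. To address this, the dissertation proposes an improved solution method that introduces a pseudo-random function as the driving term and updates iteratively frame by frame, so that the synthesized virtual frame has a smaller synthesis error. On this basis, a virtual frame algorithm is proposed at the encoder, which mitigates the low prediction efficiency of existing multiple-reference-frame prediction under non-linear motion and changes in background illumination. At the decoder, a frame-level error concealment method is proposed, which mitigates the poor recovery of traditional error concealment algorithms on scenes with complex motion when a whole frame is lost.

(2) Inter prediction algorithm based on the STALL model
The original STALL texture model takes the pixel as its basic processing unit and forms a pixel-by-pixel synthesis framework, whereas existing video coding standards use a block-based framework (H.264, for example, takes the 4×4 block as its smallest processing unit). When the STALL model is used for lossy compression in a standard whose smallest processing unit is a block, the spatial neighbors cannot be obtained in real time, so only temporal neighbor information can be used for modeling, which lowers prediction accuracy. To address this, the dissertation proposes an adaptive selection method for temporal and spatial neighbors with the 4×4 block as the processing unit, builds an improved STALL model suited to lossy video compression, and proposes a new inter prediction mode that improves inter prediction accuracy.

(3) Adaptive residual filtering algorithm based on a color JND model
Traditional color JND models are usually built in color spaces such as RGB and YCbCr. Because these are not uniform color-difference systems, they lack the independence and uniformity needed for color image analysis and processing, so the accuracy of the computed chroma JND thresholds leaves room for improvement. To address this, the dissertation proposes a color JND modeling method based on the uniform color-difference space CIELAB, so that the JND model characterizes human perception more accurately on the chroma components and achieves a better perceptual peak signal-to-noise ratio. The model is then applied to the adaptive residual coefficient filtering module in video coding, further improving compression efficiency at essentially the same subjective quality.

In summary, building on texture and JND modeling theory, this dissertation establishes an efficient video coding framework, overcomes the limitation of traditional video coding techniques that raise compression efficiency purely by increasing computational complexity, and proposes a set of enhanced coding tools compatible with the traditional hybrid video coding framework, which is of considerable theoretical significance and application value. Finally, the dissertation summarizes the innovations of the work and outlines future research in two areas, multi-view video coding and video quality assessment, where it hopes to build on the results obtained to explore texture-synthesis-based multi-view video coding and video quality assessment based on visual characteristics.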
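For orientation, the dynamic texture model referred to in result (1) is usually written in the literature as a linear dynamical system. The form below is that standard formulation, given here as an assumption: the symbols ($x_t$, $y_t$, $A$, $B$, $C$, $\bar{y}$, $v_t$, $w_t$) are not taken from the dissertation, and its improved solution method may differ in detail.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + B\,v_t, & v_t &\sim \mathcal{N}(0, I) \\
y_t     &= C\,x_t + \bar{y} + w_t
\end{aligned}
```

Here $y_t$ is the frame at time $t$, $x_t$ is a low-dimensional hidden state, $\bar{y}$ is the mean of the preceding frames, and $B\,v_t$ is the noise driving term. Under this reading, omitting $B\,v_t$ leaves the deterministic recursion $x_{t+1} = A\,x_t$. One plausible interpretation of the pseudo-random driving term is that $v_t$ is drawn from a generator whose state is identical at the encoder and the decoder, so both sides can synthesize the same virtual frame.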

【Abstract】 Current mainstream video coding standards are based on a hybrid prediction/transform coding framework founded on Shannon's information theory. The prevailing compression approach remains at the level of digital signal processing and works mainly by removing data redundancy. Gains in compression efficiency come mainly from fine-tuning technical details at the cost of greater computational complexity. Video coding based on Shannon's information theory now faces a serious bottleneck: raising computational complexity yields less and less improvement in compression efficiency. How to improve compression efficiency further is therefore an important problem. Perceptual video coding, which is based on the theory of the human visual system, can greatly reduce the bit rate while maintaining essentially the same subjective quality, and is of great significance for solving this problem.

This dissertation first analyzes two representative perceptual video coding techniques, texture synthesis and JND modeling, and reviews the state of research on temporal prediction, implicit motion prediction, and JND-based video coding. It concludes that, because the mechanisms of human visual perception are not yet well understood, video coding techniques derived entirely from the human visual system are still progressing slowly and have many deficiencies to be resolved. However, some researchers have introduced perceptual coding tools based on local visual characteristics into the traditional video coding framework. This approach can break with the traditional idea of improving coding efficiency by increasing computational complexity.

Based on this analysis, the dissertation studies video coding based on texture and JND modeling. The work was supported by the National Natural Science Foundation of China project "A research of video coding based on texture model" (No. 61003184), the National Natural Science Foundation of China project "A research of video coding based on inverse texture synthesis" (No. 60970160), and the Microsoft Research Asia project-based funding "Improved JND model based on contourlet transform and image structural information" (No. FY09-RES-OPP-013). Using texture and JND models, the dissertation establishes a video coding framework that is compatible with the traditional hybrid video coding framework. By using perceptual coding techniques as a tool set, it overcomes the bottleneck in traditional video coding techniques and improves coding efficiency, which is of considerable theoretical value. The work is also significant for high-definition video applications, which demand higher compression efficiency, and for wireless broadband mobile applications, which demand better error resilience.

The major contributions of this dissertation are as follows.

(1) Video encoding and decoding algorithm based on the dynamic texture model
The traditional method for solving the dynamic texture model uses the average of the previous frames as the reference value of the synthesized image, so the synthesized frame reflects the overall trend of the images over a period of time. This method is not well suited to inter prediction in video coding. Some researchers have proposed an improved method that omits the noise term of the original dynamic texture model so that encoder and decoder data match. However, in signal processing and system theory, an image sequence can be regarded as the output of a bivariate stochastic process driven by the noise term. If the noise term is omitted, the dynamic texture model cannot, in theory, be driven. To solve this problem, the dissertation gives an improved solution for the dynamic texture model. By using pseudo-random numbers to describe the noise term, the method gives the synthesized frame a smaller synthesis error. Based on the improved method, the dissertation presents a new dynamic texture extrapolation algorithm for the H.264 encoding and decoding system. At the encoder, the synthesized frames can serve as virtual reference frames for inter prediction, which may improve prediction on sequences with non-linear motion and global illumination changes between frames. At the decoder, the synthesized frames can be used for error concealment when a whole frame is lost, which achieves a significant improvement over the traditional motion vector extrapolation method.

(2) Inter prediction algorithm based on the STALL model
The original Spatio-Temporal Adaptive Localized Learning (STALL) model is implemented in a pixel-wise fashion. For each pixel, the scheme identifies its spatial neighbors as well as its temporal neighbors within a causal window. In lossy compression, however, this is a problem: H.264, for example, adopts a 4×4 block transform structure, the reconstructed pixels are not identical to the original pixels, and the spatial neighbors inside a 4×4 block cannot be accessed before that block has been encoded. To solve this problem, the dissertation designs an improved STALL model with an adaptive selection strategy for spatial and temporal neighbors, and adds an LSP inter prediction mode to the H.264 standard for lossy compression, which improves the accuracy of inter prediction.

(3) Adaptive residue filtering algorithm based on a color JND model
Previous color JND models are usually based on the RGB or YCbCr color spaces. Because these are not uniform color systems, the perceived color change produced by a fixed small change in the color coordinates is non-uniform. This is a problem when computing the chrominance JND component, and the precision of the chroma JND threshold leaves room for improvement. To solve this problem, the dissertation introduces a new color JND model based on the CIELAB color space, in which the chroma JND component is computed with good precision. An adaptive residue filtering algorithm is then proposed, which increases the compression efficiency of the H.264 standard at the same subjective perceptual quality.

In conclusion, this dissertation studies video coding based on texture synthesis and JND modeling, establishes a framework compatible with hybrid video coding, and proposes a series of video coding algorithms that improve compression efficiency. The work is significant for the research and development of video coding and communication systems. Finally, the dissertation summarizes its research achievements and looks ahead to future research on multi-view video coding and video quality assessment. Building on the results obtained, the author expects to explore texture-synthesis-based multi-view video coding and perceptual video quality assessment further.
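Contribution (3) rests on the observation that equal coordinate steps in RGB-like spaces do not correspond to equal perceived color changes, whereas CIELAB is designed to be approximately uniform. The sketch below illustrates that point only and is not the dissertation's JND model. It assumes 8-bit sRGB input, a D65 white point, and the plain Euclidean CIE76 color difference; the function names are hypothetical.

```python
import math

# Assumption: D65 reference white (2-degree observer), Y normalized to 1.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L*, a*, b*)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (lab_f(v / n) for v, n in zip((x, y, z), WHITE_D65))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    lab1, lab2 = srgb_to_lab(rgb1), srgb_to_lab(rgb2)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


if __name__ == "__main__":
    gray = (128, 128, 128)
    # The same +20 code-value step, applied to different channels.
    print("green step:", delta_e76(gray, (128, 148, 128)))
    print("blue step: ", delta_e76(gray, (128, 128, 148)))
```

The two printed distances come out clearly different even though both steps have the same size in RGB code values, which is the non-uniformity the abstract describes. A JND model built in CIELAB can therefore apply thresholds that track perceived chroma change more closely than one built directly on RGB or YCbCr coordinates.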

  • 【Online publication contributor】 Wuhan University
  • 【Online publication year and issue】 2012, Issue 04