Research on H.264 Region-of-Interest Coding Based on Visual Perception (基于视觉感知的H.264感兴趣区域编码研究)

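【Author】 Zheng Yayu (郑雅羽)

【Advisor】 Chen Yaowu (陈耀武)

【Author information】 Zhejiang University, Electronic Information Technology and Instrumentation, 2008, PhD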

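【Abstract (translated from the Chinese)】 Video coding is one of the key technologies for transmitting and storing video information efficiently, and an indispensable part of modern information technology. H.264/AVC (hereafter H.264) is the latest video coding standard developed jointly by ITU and ISO/IEC. Throughout the development of video coding, the core design problem has been how to obtain the best rate-distortion performance under constraints on complexity and delay. Earlier work improved rate-distortion performance mainly by reducing spatial, temporal, and statistical redundancy, whereas region-based video coding that exploits visual processing is now one of the active research directions in the field. Visual neuroscience has shown that the Human Visual System (HVS) perceives a video scene selectively: different regions or objects carry different visual importance. Conventional video coding algorithms, however, do not take this diversity of HVS perception into account when compressing video. An in-depth study of how visual perception principles can improve the coding quality and computational efficiency of H.264 encoding therefore has both theoretical significance and practical value. Against this background, this thesis studies H.264 region-of-interest coding based on visual perception.

Chapter 1, the introduction, first explains the significance of the topic, then surveys and summarizes the state of research in China and abroad, and finally outlines the main research content and the structure of the thesis.

Chapter 2 addresses the excessive computational complexity of global motion estimation and proposes a fast global motion estimation algorithm based on the cancellation and difference of motion vector pairs. The algorithm has two steps. It first estimates the translational motion parameters from the symmetric cancellation that exists between motion vector pairs in different quadrants, and then estimates the transform motion parameters from the differences of the motion vector pairs, combined with a confidence-judgment strategy. Fast and effective estimation of the global motion parameters lays the foundation for the work of the following three chapters.

Chapter 3 proposes a moving-region detection algorithm in the H.264 coded domain. It takes H.264 side information, such as motion vectors and the sum of absolute differences (SAD) of pixel values, as input features and detects moving regions in three steps. First, global motion estimation and compensation followed by two-step spatial and temporal motion vector filtering quickly detects regions with salient motion. Second, a χ² distribution is established for the SAD at the zero motion vector, and a change-detection method based on an F hypothesis test quickly detects moving regions that contain small motion. Finally, the results of these two steps are combined into the final moving-region map. Fast and effective detection of moving regions lays the foundation for the motion perception sub-model of the next chapter.

Chapter 4 proposes a novel visual perception model that fuses temporal and spatial features to compute a visual perception map of the video scene, effectively simulating how the HVS perceives the scene. The model consists of three parts: a motion perception sub-model, a texture perception sub-model, and a spatial position perception sub-model. First, HVS motion perception is modeled from visual features including motion speed, motion direction, motion coherence, and biological motion, which simulates HVS perception of moving regions. Next, HVS texture perception is modeled from the visual sensitivity and visual masking mechanisms of the HVS, which simulates HVS perception of texture complexity. Then, HVS spatial position perception is modeled from the fovea and eye-movement control mechanisms, which adapts the spatial position weights to the global motion type.

Chapter 5 proposes an H.264 region-of-interest coding algorithm based on visual perception, built on information sharing between the visual perception model and the H.264 region-of-interest encoder. The visual perception map is first computed with the proposed model, and bit resources and computation resources are then allocated according to the map, which improves H.264 coding quality and computational efficiency. In the bit allocation algorithm, an adaptive frequency coefficient suppression technique is first developed from the fact that the HVS is insensitive to distortion in high-frequency signals; the distribution of bit resources in video coding is then analyzed both theoretically and experimentally; finally, coding quality is improved by using the visual perception map together with an effective overall coding control strategy. In the computation allocation algorithm, the intrinsic relationship between the optimal H.264 coding mode and the content features of the video scene is analyzed experimentally, and on that basis an efficient fast H.264 mode analysis algorithm is proposed according to the visual perception map and the global motion type, which improves computational efficiency.

Chapter 6 summarizes the results and contributions of the thesis and proposes directions and tasks for further research.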
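The two-step idea described for chapter 2 can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes a four-parameter zoom/rotation/translation model, block coordinates measured from the frame center, and a plain median in place of the confidence-judgment strategy, whose details the abstract does not give. The function name `estimate_global_motion` and the input layout are hypothetical.

```python
from statistics import median


def estimate_global_motion(mv_field):
    """Estimate (tx, ty, a, b) for the assumed model
    mv(x, y) = (a*x - b*y + tx, b*x + a*y + ty).

    mv_field maps block coordinates (x, y), measured from the frame
    center, to a motion vector (mvx, mvy).
    """
    tx_s, ty_s, a_s, b_s = [], [], [], []
    for (x, y), (mvx, mvy) in mv_field.items():
        # Visit each mirrored pair once and skip the center block.
        if (x, y) <= (-x, -y):
            continue
        mirror = mv_field.get((-x, -y))
        if mirror is None:
            continue
        mx, my = mirror
        # Step 1: averaging a mirrored pair cancels the position-dependent
        # terms and leaves the translation.
        tx_s.append((mvx + mx) / 2.0)
        ty_s.append((mvy + my) / 2.0)
        # Step 2: differencing the pair cancels the translation and
        # leaves twice the zoom/rotation contribution.
        dx, dy = mvx - mx, mvy - my
        r2 = 2.0 * (x * x + y * y)
        a_s.append((dx * x + dy * y) / r2)
        b_s.append((dy * x - dx * y) / r2)
    if not tx_s:
        raise ValueError("no mirrored motion vector pairs found")
    # Assumption: a median stands in for the confidence judgment.
    return median(tx_s), median(ty_s), median(a_s), median(b_s)


# Synthetic field generated from the assumed model.
field = {(x, y): (0.1 * x - 0.05 * y + 3.0, 0.05 * x + 0.1 * y - 1.0)
         for x in range(-2, 3) for y in range(-2, 3)}
print(estimate_global_motion(field))
```

Because each estimate comes from sums and differences of existing motion vectors, the cost is linear in the number of blocks, which is consistent with the abstract's emphasis on low complexity.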

【Abstract】 Video coding, one of the key technologies for the efficient transmission and storage of video information, plays an important part in modern information technology. H.264/AVC (H.264 for short) is the newest video coding standard jointly developed by ITU and ISO/IEC. Throughout the history of video coding, achieving optimal rate-distortion performance under constraints on complexity and allowed delay has remained the core design problem. In the past, rate-distortion performance was improved mainly by reducing spatial, temporal, and statistical redundancy, whereas region-based video coding that uses visual processing has now become a major research direction. The human visual system (HVS) perceives a video scene selectively, and different regions or objects in the scene have different levels of visual importance. Conventional video encoding algorithms, however, ignore this diversity of perception. An in-depth study of how the principles of HVS visual perception can improve the compression and computational efficiency of H.264 encoding is therefore of both theoretical and practical value.

Chapter 1 presents the significance of this research together with a brief summary of the current state of research.

Chapter 2 proposes a fast global motion estimation (GME) method based on the symmetric cancellation and difference of motion vectors, in order to reduce the computational complexity of GME. The method consists of two stages. First, the translational parameters are obtained from the symmetric cancellation of motion vectors. Then the transform parameters are estimated from the differences of motion vectors together with a confidence-judgment strategy. This effective and efficient estimation of the global motion parameters lays a foundation for the research that follows.

Chapter 3 presents a novel moving-region detection method in the H.264 compressed domain that takes encoder side information, including motion vectors (MV) and the sum of absolute differences (SAD), as its input features. The method consists of three processing steps. In the first step, global motion estimation and compensation together with a spatio-temporal MV filter detect the moving regions with salient motion. In the second step, a χ² distribution is constructed for the SAD at the zero MV, and a change detection algorithm derived from the F hypothesis test detects the moving regions, including those with both salient and non-salient motion. Finally, the results of the first two steps are combined to compute the final moving-region map.

Chapter 4 proposes a novel visual perception model, composed of motion perception, texture perception, and spatial position perception sub-models, that fuses spatio-temporal visual features. First, to simulate how the HVS perceives moving regions, motion perception is modeled by fusing motion features including velocity, direction, coherence, and biological motion. Then, to simulate how the HVS perceives texture complexity, texture perception is modeled on the visual sensitivity and visual masking mechanisms of the HVS. Finally, spatial position perception is modeled on the fovea and eye-movement mechanisms of the HVS, so that the spatial position sub-model adaptively adjusts the perceptual importance of different positions in the video scene according to the global motion type.

Chapter 5 puts forward a novel H.264 region-of-interest coding method based on visual perception that allocates both bit and computation resources. The visual perception map (VPM) is computed with the proposed visual perception model. To allocate bit resources effectively through the VPM, an adaptive frequency coefficient suppression technique is first derived from the principle that the HVS is less sensitive to distortion in high-frequency signals. Second, the distribution characteristics of the bit resources are analyzed theoretically and experimentally. Finally, bit resource allocation is optimized according to a novel encoding strategy. To allocate computation resources effectively based on the VPM and the global motion type of the video scene, the relation between the optimal encoding mode and the content features of the video scene is analyzed experimentally, from which a fast and effective H.264 mode analysis algorithm is derived.

The final chapter summarizes the achievements of the research and the prospects for future work.
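The VPM-driven bit allocation described for chapter 5 can be illustrated with a minimal Python sketch. The abstract does not say how the map steers the encoder, so the mechanism here (a per-macroblock quantization parameter offset), the linear mapping, and the names `macroblock_qps` and `max_delta` are all assumptions for illustration only. The one fixed fact used is that H.264 quantization parameters for 8-bit video lie in the range 0 to 51.

```python
QP_MIN, QP_MAX = 0, 51  # valid H.264 QP range for 8-bit video


def macroblock_qps(vpm, base_qp, max_delta=4):
    """Map per-macroblock importance to per-macroblock QP values.

    vpm is a 2D list of perceptual importance values in [0, 1], one per
    macroblock. Importance 1 lowers the QP by max_delta (finer
    quantization, more bits); importance 0 raises it by max_delta.
    This linear mapping is a hypothetical choice, not the thesis's.
    """
    qps = []
    for row in vpm:
        qp_row = []
        for w in row:
            w = min(max(w, 0.0), 1.0)  # guard against out-of-range input
            delta = round(max_delta * (1.0 - 2.0 * w))
            qp_row.append(min(max(base_qp + delta, QP_MIN), QP_MAX))
        qps.append(qp_row)
    return qps


# One row of three macroblocks: important, neutral, unimportant.
print(macroblock_qps([[1.0, 0.5, 0.0]], base_qp=30))  # [[26, 30, 34]]
```

A symmetric offset around `base_qp` keeps the average quantization roughly unchanged, so bits move from low-importance to high-importance macroblocks instead of the total rate simply growing.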

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2009, Issue 11