Rate-Distortion Model Based Rate Control Technique Research for H.264

【Author】 Cui Ziguan (崔子冠)

【Advisor】 Zhu Xiuchang (朱秀昌)

【Author Information】 Nanjing University of Posts and Telecommunications, Signal and Information Processing, 2012, PhD

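【Abstract, translated from the Chinese】 With the development of video coding and network technology, video communication plays an increasingly important role in daily life and work. H.264, the current mainstream video coding standard, adopts many new techniques to achieve higher compression efficiency and better network adaptability than earlier standards, which makes it suitable for all types of video communication. Because video sources and transmission channels are so diverse, rate control is an indispensable step in any practical video coding and transmission system: under constraints such as the target bit rate and the buffer, it regulates the output bitstream by adjusting coding parameters (mainly the quantization parameter) so that the stream suits the characteristics of the channel and the perceptual quality of the video is optimized. The adoption of many new coding techniques, rate-distortion optimization in particular, makes rate control for H.264 more difficult, so research on H.264 rate-distortion optimization and rate control in different environments has significant theoretical and practical value. Based on the characteristics of H.264 video coding, this thesis studies several key problems in current H.264 rate control. The main contributions are as follows:
(1) To address the poor performance of rate control for H.264 intra coding, a novel image-complexity-adaptive I-frame rate control algorithm is proposed. The Sobel operator is first used to detect the gradients of the I-frame luma pixels, and an edge direction histogram is built for each 4×4 block to obtain the most probable intra prediction mode and the corresponding reconstructed block, which finally yields a residual image close to that of actual coding. The mean absolute value of this residual represents the I-frame coding complexity. An empirical I-frame rate-quantization model is then proposed, suitable target bits are allocated to the I-frame by jointly considering the buffer status and the coding characteristics of the current sequence, and an appropriate I-frame quantization parameter is finally obtained for each group of pictures.
(2) To address the shortcomings of JVT-G012, the classic H.264 rate control proposal, a low-complexity improved P-frame macroblock-layer rate control scheme is proposed based on the spatio-temporal correlation among coding units. First, to reduce the computational complexity and inaccuracy of MAD prediction at the macroblock layer, the motion vector of the current macroblock is estimated from motion similarity, and the MAD is computed directly from the residual between the current macroblock and the reference block pointed to by the estimated motion vector. Second, because the prediction of macroblock header bits in H.264 strongly affects the rate-distortion model and the computation of the quantization parameter, header bits are predicted more accurately from the spatio-temporal correlation among macroblocks. Finally, target bits are allocated according to the coding complexity of each macroblock, and the historical data points used to update the parameters of the macroblock-layer quadratic rate-distortion model are also selected by spatio-temporal correlation rather than simply taken from the most recently coded data points.
(3) To address the decoder-side quality fluctuation caused by the passive frame skipping that often occurs in low-bit-rate H.264 applications, a rate control strategy combined with adaptive frame skipping is proposed, which retains frames carrying important information and actively skips less important ones. To obtain coded sequences that better match the human visual system, the skip criterion is based on a subjective quality measure (structural similarity) between the original frame and the frame reconstructed by motion-vector-copy frame interpolation, combined with the current buffer status. At the encoder, the bits saved by skipped frames are allocated to the key frames that are coded, which enhances their spatial quality; at the decoder, skipped frames are recovered from neighboring coded frames by an improved motion-vector-copy frame interpolation scheme, which maintains a constant frame rate and yields smooth video quality.
(4) Traditional rate control algorithms mostly use objective distortion as the distortion measure and therefore cannot achieve optimal subjective quality. Subjective distortion based on structural similarity (SSIM) is applied to rate-distortion optimization and rate control in H.264 video coding, and an SSIM-based P-frame macroblock-layer rate control algorithm is proposed. An empirical linear SSIM distortion model is first derived from extensive experiments and theoretical analysis; combined with an improved quadratic rate-quantization model, the Lagrange multiplier method then gives a closed-form solution for the SSIM-optimal quantization step at the P-frame macroblock layer. Finally, the complete SSIM-based P-frame macroblock-layer rate control algorithm is presented.
(5) SSIM-based subjective distortion is used to guide rate-distortion-optimized intra and inter mode decision in H.264 video coding. The work has two parts. The first part uses the proposed I-frame rate-quantization model and I-frame linear SSIM distortion model to apply SSIM distortion to intra macroblock mode decision in I-frame coding, and further proposes a frame-layer content-adaptive Lagrange multiplier to better balance rate and SSIM distortion. The second part builds on the SSIM-based P-frame macroblock-layer rate control and applies SSIM distortion to rate-distortion-optimized inter macroblock mode decision in P-frame coding, and further proposes a macroblock-layer adaptive analytical Lagrange multiplier to better balance rate and SSIM distortion. Experiments show that, compared with rate control algorithms based on objective quality, the SSIM-based rate control and rate-distortion-optimized mode decision algorithms encode image structural information better, so that subjective image quality improves markedly and matches human perception more closely; their computational complexity is low, so they can be used in practical coding environments.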
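
To illustrate the quadratic rate-quantization model that item (2) builds on, the sketch below solves the form commonly associated with JVT-G012, texture_bits = x1*MAD/Qstep + x2*MAD/Qstep^2, for the quantization step and maps the result to a QP. The function names, the sample coefficients, and the approximation Qstep ≈ 0.625 * 2^(QP/6) are assumptions made for illustration; the improved MAD estimation, header-bit prediction, and parameter update described above are not reproduced here.

```python
import math


def solve_qstep(target_bits, header_bits, mad, x1, x2):
    """Solve texture_bits = x1*mad/q + x2*mad/q**2 for the step size q.

    Returns None when no texture bits are left, so the caller can fall
    back to its largest allowed QP. All names here are hypothetical.
    """
    if mad <= 0:
        raise ValueError("mad must be positive")
    texture_bits = target_bits - header_bits
    if texture_bits <= 0:
        return None

    # Multiplying the model by q**2 gives T*q**2 - x1*mad*q - x2*mad = 0.
    disc = (x1 * mad) ** 2 + 4.0 * texture_bits * x2 * mad
    q = None
    if x2 != 0.0 and disc >= 0.0:
        q = (x1 * mad + math.sqrt(disc)) / (2.0 * texture_bits)
    if q is None or q <= 0.0:
        # Degenerate quadratic term: fall back to the linear model.
        q = x1 * mad / texture_bits
    return q if q > 0.0 else None


def qstep_to_qp(qstep):
    """Map a step size to an H.264 QP in [0, 51].

    Uses the approximation qstep = 0.625 * 2**(qp / 6); the standard's
    step-size table differs slightly at some QPs.
    """
    if qstep <= 0:
        raise ValueError("qstep must be positive")
    qp = round(6.0 * math.log2(qstep / 0.625))
    return max(0, min(51, qp))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the thesis.
    q = solve_qstep(target_bits=120, header_bits=20, mad=5.0, x1=40.0, x2=100.0)
    print(q, qstep_to_qp(q))
```

The solver takes the larger root of the quadratic because the smaller one is negative whenever x2 is positive; a real encoder would additionally clamp the QP change between neighboring basic units.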

【Abstract】 With the development of video coding and network technology, video communication plays an increasingly important role in people's daily life and work. As the most popular video coding standard, H.264 adopts many new techniques, achieves higher compression efficiency and network adaptability than prior coding standards, and suits all kinds of video communication. Because video sources and transmission channels are diverse, rate control (RC) is a key and indispensable step in any practical video coding and transmission system. Rate control regulates the output bitstream to match the characteristics of the transmission channel and optimizes perceptual video quality by adjusting coding parameters under constraints such as target bit rate and buffer fullness. The adoption of many new coding techniques, especially rate-distortion optimization (RDO), makes rate control for H.264 more difficult. Research on rate-distortion optimization techniques and rate control schemes for H.264 in various applications therefore has significant theoretical and practical value. Based on the technical features of H.264 video coding, this work focuses on rate control techniques for H.264. The main research work is as follows:
(1) To address the poor performance of intra coding RC in H.264, a novel image-complexity-adaptive I-frame RC algorithm is proposed. This work first detects the gradients of the luma pixels in an I frame with the Sobel operator and builds an edge direction histogram for each 4×4 block, thereby obtaining the most probable intra prediction mode and the corresponding reconstructed block, and finally a residual picture that is close to the actual coding residual. The mean absolute difference (MAD) of the residual is used to represent I-frame coding complexity. An empirical rate-quantization model is then proposed, and the QP of the I frame is determined for each GOP from the allocated target bits by simultaneously considering buffer status and sequence characteristics.
(2) To address the shortcomings of the classic H.264 RC proposal JVT-G012, an improved macroblock (MB) layer RC scheme is proposed based on the spatial-temporal correlation among basic units. First, to reduce the computational cost and inaccuracy of linear MAD prediction at the MB layer, MAD is computed directly from the difference between the current MB and the reference block pointed to by a motion vector (MV) estimated from motion similarity. Then, MB header bits are predicted from spatial-temporal correlation, because MB header bit prediction strongly affects the rate-distortion model and QP computation. Finally, the MB target bits are allocated according to MB complexity, and the parameters of the quadratic R-D model are updated using coded MBs with high spatial-temporal correlation rather than the most recently coded data points.
(3) To address the frequent passive frame skipping and the resulting decoder-side quality fluctuation in low-bit-rate H.264 applications, an RC scheme combined with adaptive frame skipping is proposed to encode important frames and skip less important ones. To obtain subjectively friendly video sequences, the frame skip rule is based on a subjective metric (structural similarity) between the original frame and the reconstructed frame, together with buffer status. The bits saved from skipped frames are allocated to key frames to enhance their coding quality, and the skipped frames are recovered from key frames at the decoder side to obtain a constant frame rate and smooth video quality.
(4) Conventional RC schemes mostly take an objective metric as the distortion measure and therefore cannot achieve optimal subjective quality. This work applies structural similarity (SSIM) based subjective distortion to rate-distortion optimization and RC in H.264 video coding, and proposes an SSIM-optimal MB-layer RC algorithm. First, an empirical linear SSIM distortion model is put forward. Then an improved quadratic rate-quantization model is combined with it to obtain the closed-form solution of the SSIM-optimal MB-layer quantization step by the Lagrange multiplier method.
(5) Subjective distortion (SSIM) is used to direct RDO-based intra and inter MB mode decision in H.264 video coding. The research work includes two parts. The first part uses SSIM distortion to direct I-frame intra MB mode decision based on the proposed I-frame R-Q model and linear SSIM distortion model, and further proposes a frame-layer adaptive Lagrange multiplier (λ) to better balance rate and SSIM distortion. The second part uses SSIM distortion to guide P-frame inter MB mode decision on the basis of the P-frame SSIM-optimal MB-layer RC, and further proposes an MB-layer analytic λ to trade off rate and SSIM distortion. Experiments show that, compared with RC based on objective quality, the SSIM-optimal RC and RDO mode decision encode image structural information better and achieve higher subjective quality, and their low complexity makes them usable in practical video coding applications.
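
As a rough illustration of the structural similarity measure used as the distortion metric in items (3) to (5), the sketch below computes SSIM over a single block of luma samples with the usual constants C1 = (0.01*L)^2 and C2 = (0.03*L)^2, and derives a distortion value as 1 - SSIM. Treating one block as a single window, using population statistics, and defining distortion as 1 - SSIM are simplifying assumptions for this example; the abstract does not specify the window, the weighting, or the exact distortion definition used in the thesis, and the function names are hypothetical.

```python
def ssim_block(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length blocks of luma samples.

    x and y are flat sequences (for example the 256 samples of a 16x16
    macroblock). Population (divide-by-n) statistics are used.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("blocks must be non-empty and of equal length")

    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    numerator = (2.0 * mean_x * mean_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_distortion(x, y, dynamic_range=255.0):
    """Distortion derived from SSIM, assumed here to be 1 - SSIM."""
    return 1.0 - ssim_block(x, y, dynamic_range)


if __name__ == "__main__":
    # Illustrative 2x2 blocks only, not data from the thesis.
    original = [52, 55, 61, 66]
    reconstructed = [50, 56, 60, 70]
    print(ssim_block(original, reconstructed))
    print(ssim_distortion(original, reconstructed))
```

A distortion of this kind could stand in for the squared-error term in a Lagrangian cost of the form D + λR during mode decision; how λ is adapted at the frame and MB layers is specific to the thesis and is not sketched here.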
