MPEG-2到H.264/AVC视频转码及相关技术研究

Researches on MPEG-2 to H.264/AVC Video Transcoding and Relative Techniques

【Author】 Zheng Yan (郑艳)

【Advisor】 Zheng Shibao (郑世宝)

【Author information】 Shanghai Jiao Tong University, Signal and Information Processing, 2008, PhD

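【Summary】 MPEG-2 and H.264/AVC are the two most representative video coding standards in the development of video codec technology. The large body of existing MPEG-2 video must be reproduced on H.264 decoders, and the two standards will coexist now and for some time to come, which has made MPEG-2 to H.264 transcoding a hot topic in video information processing in recent years. Broadcast television operators want to deliver as many programs as possible over limited bandwidth, urban video surveillance needs to exchange video data produced by encoders of different standards, and streaming media places high demands on real-time delivery of video. These practical applications impose diverse technical requirements on MPEG-2/H.264 transcoding from different angles. When the JM model (the H.264 reference software) makes coding mode decisions, it traverses all possible coding modes to obtain the best RD performance; in other words, it pursues the highest compression efficiency regardless of computational cost. The key problem of MPEG-2/H.264 transcoding is therefore how to use the bitstream information extracted during MPEG-2 decoding to simplify H.264 encoding, thereby reducing the complexity of H.264 re-encoding. Especially under practical requirements such as streaming media and real-time surveillance, real-time performance is poor if re-encoding still uses the mode decision scheme of the JM model. Because MPEG-2 has already performed mode decision and motion estimation once at encoding time, the motion information extracted during full or partial MPEG-2 decoding can serve the mode decision of re-encoding and reduce algorithmic complexity. This is a basic principle of MPEG-2/H.264 transcoding. The main research contents of this thesis are:

(1) MPEG-2/H.264 intra transcoding. The thesis proposes a (prediction) mode decision algorithm for intra coding based on the edge information reflected in the MPEG-2 DCT coefficients. The algorithm uses the DC coefficients of the decoded MPEG-2 DCT coefficient matrices to decide the coding mode for H.264 re-encoding, and uses the AC coefficients of the MPEG-2 DCT coefficient matrices to decide the intra prediction mode in a single pass, avoiding the high complexity of the H.264 JM reference software. The thesis first analyzes the theoretical basis of the reference algorithm and derives an improved scheme, then redoes the prediction mode decision on that basis, obtaining better transcoding RD performance without adding any algorithmic complexity.

(2) MPEG-2/H.264 inter transcoding. Existing MPEG-2/H.264 inter transcoding algorithms are mostly based on analysis of MPEG-2 motion vectors and coded block patterns, and pay little attention to the relationship between the MPEG-2 prediction residual and the H.264 re-encoding mode. Starting from the DCT coefficients of the MPEG-2 residual, the thesis analyzes the relationship between the MPEG-2 residual and the H.264 prediction mode. On that basis it examines the relationship between the MPEG-2 motion vector/predicted motion vector and the motion search center/search range of the H.264 prediction mode, and proposes a new fast inter transcoding algorithm. Experiments show that, compared with the cascaded inter transcoding algorithm, the proposed algorithm reduces transcoding complexity by 2/3 at the cost of an average bitrate increase of 5.7%, with almost no drop in PSNR. Most importantly, the proposed algorithm requires no preset thresholds and no parameters obtained a posteriori, which makes it well suited to real-time transcoding.

(3) Frame-rate transcoding. Because video transcoded from MPEG-2 to H.264 usually also needs frame-rate conversion in practice, the thesis studies frame-rate transcoding as well. It proposes a frame-rate transcoding algorithm based on segment-constrained dynamic frame dropping. The segment constraint preserves the logical coherence of the transcoded stream. Dynamic frame dropping satisfies viewers' visual need for the information carried by pictures with pronounced local motion. The frame activities of the current frame and the previous frame are used to predict the frame activity of the next frame, which determines whether the next frame is kept for re-encoding; deciding in advance whether to keep a frame benefits real-time transcoding. For the multi-frame dropping that may occur, a new overlapping-block selection scheme is proposed to keep the error in motion vector reconstruction as small as possible. The higher the frame drop rate, the more pronounced the advantage of this scheme. The thesis also proposes an implementation architecture for this frame-rate transcoding algorithm, offering an approach for hardware implementation.

(4) H.264 entropy decoding. From the perspective of the whole transcoding system, H.264 decoding is an important technical stage, so the thesis also studies the implementation of H.264 entropy decoding in depth. Based on an analysis of the principles of H.264 entropy coding and the characteristics of the objects it operates on, the thesis proposes a parallel entropy decoder architecture with two controllers. Its key feature is the use of a master controller and a slave controller, which allows partially parallel decoding to achieve the highest possible decoding efficiency. Experiments show that, compared with an ordinary codeword-based entropy decoder, the proposed syntax-element-based decoder saves 27.9%, 18.2% and 48.8% of the time when decoding I frames, P frames and B frames, respectively.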
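The intra mode decision described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract only states that DC coefficients drive the coding mode and AC coefficients drive the prediction mode. The function names, the thresholds `var_threshold` and `ratio`, and the restriction to three prediction directions are all assumptions made here for illustration.

```python
from statistics import pvariance


def choose_partition(dc_values, var_threshold=100.0):
    """Pick a coding mode from the DC coefficients of one macroblock.

    dc_values: DC coefficients of the macroblock's 8x8 luma blocks.
    var_threshold: hypothetical tuning constant, not taken from the thesis.
    A low DC variance suggests a smooth macroblock, so Intra16x16 is chosen.
    """
    return "intra16x16" if pvariance(dc_values) < var_threshold else "intra4x4"


def choose_direction(block, ratio=2.0):
    """Guess a prediction direction from one 8x8 DCT coefficient block.

    block[u][v]: u is the vertical frequency index, v the horizontal one.
    ratio: hypothetical dominance factor, not taken from the thesis.
    """
    # First-row AC terms carry horizontal frequencies, i.e. vertical edges.
    row_energy = sum(abs(c) for c in block[0][1:])
    # First-column AC terms carry vertical frequencies, i.e. horizontal edges.
    col_energy = sum(abs(row[0]) for row in block[1:])
    if row_energy > ratio * col_energy:
        return "vertical"    # vertical edges: predict from the pixels above
    if col_energy > ratio * row_energy:
        return "horizontal"  # horizontal edges: predict from the pixels to the left
    return "dc"              # no dominant direction


def empty_block():
    """Helper for the usage example: an all-zero 8x8 coefficient block."""
    return [[0] * 8 for _ in range(8)]
```

For example, a block whose only nonzero AC coefficient lies in the first row is classified as `"vertical"`, while an all-zero block falls back to `"dc"`. Both decisions need only coefficients that the MPEG-2 decoder has already parsed, which is the source of the complexity saving the summary describes.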

【Abstract】 MPEG-2 and H.264/AVC are two of the most important video coding standards. Streams encoded with MPEG-2 need to be decoded on H.264-based terminals, and the two standards will coexist for some time, which has made MPEG-2 to H.264 transcoding a hot topic in the video transcoding field. Different applications place different requirements on MPEG-2/H.264 transcoding. For example, TV broadcasting service providers want to transmit more programs over limited-bandwidth channels, and streaming media demands real-time delivery. The traditional H.264 encoding process seeks the best RD performance regardless of complexity, while MPEG-2 motion estimation results are good indications for H.264 motion estimation. The key problem of MPEG-2/H.264 transcoding is therefore how to use the information obtained during MPEG-2 decoding to simplify H.264 encoding. The main contributions of this thesis are as follows:

(1) An improved MPEG-2/H.264 intra transcoding algorithm is proposed. The reference algorithm on which it builds is one of the simplest MPEG-2/H.264 intra transcoding solutions. It determines the re-encoding mode by calculating the variance of the MPEG-2 DC coefficients and determines the H.264 prediction mode from the relationship between the horizontal and vertical MPEG-2 AC coefficients. It thus determines both the H.264 coding mode and the prediction mode in one pass, which avoids the high complexity of the traditional H.264 encoding process. However, the theoretical foundation of the reference algorithm has a small flaw, which causes incomplete coverage in prediction mode determination. The proposed algorithm re-analyzes and improves this foundation. Experimental results show that the improved solution obtains better RD performance without adding any complexity.

(2) A fast MPEG-2/H.264 inter transcoding algorithm is proposed. The thesis analyzes the relationship between the MPEG-2 residual DCT coefficients and the H.264 encoding modes, as well as the relationship between the MPEG-2 MV/PMV and the H.264 motion estimation center/range. The proposed algorithm reduces complexity by 2/3 while keeping the PSNR almost unchanged, at the cost of a 5.7% bitrate increase. Most importantly, the proposed algorithm needs no preset or posterior thresholds or parameters, which makes it well suited to real-time transcoding applications.

(3) Because MPEG-2/H.264 transcoding is often accompanied by frame skipping, frame-skipping transcoding is also studied. The thesis proposes a frame-skipping transcoding scheme based on a periodic constraint and a dynamic frame-skipping decision. The periodic constraint makes the transcoded stream more logical, and the dynamic selection of skipped frames re-encodes the frames with more activity, which is usually the information viewers care about most. Whether a frame is dropped is decided by its frame activity, which is predicted ahead of time from the activities of the two preceding frames; this is very useful for real-time transcoding. Multi-frame skipping may occur in the proposed scheme, so a new overlapping block location algorithm is also proposed for MV reconstruction during multi-frame skipping, to reduce the drift error that occurs in the traditional solution. The proposed algorithm performs better than the traditional one, and its advantage becomes more obvious as the number of skipped frames increases. A corresponding architecture for the frame-skipping transcoding scheme is also proposed for hardware design.

(4) H.264 decoding is one of the most important stages of the MPEG-2/H.264 transcoding system and plays an important role in real-time transcoding applications. The thesis proposes a parallel entropy decoding architecture with two controllers to meet this requirement. To improve the time and storage efficiency of the entropy decoder, the main controller, the sub controller, the storage of context parameters and the central decoder are optimized according to the characteristics of the decoding objects, i.e. the syntax elements (SEs). Experimental results show that the proposed architecture saves 27.9%, 18.2% and 48.8% of the decoding time compared with the reference decoder for I frames, P frames and B frames, respectively.
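The frame-skipping decision in contribution (3) can be sketched as follows. The abstract does not define frame activity, the predictor, or the exact form of the constraint, so everything concrete here is an assumption for illustration: activity is taken as the sum of absolute motion vector components, the prediction is a linear extrapolation from the two preceding frames, `threshold` is a hypothetical parameter, and the constraint is modeled by always keeping the first frame of every `segment_len` frames.

```python
def frame_activity(motion_vectors):
    """Assumed activity metric: sum of |mvx| + |mvy| over a frame's macroblocks."""
    return sum(abs(x) + abs(y) for x, y in motion_vectors)


def predict_activity(prev, cur):
    """Assumed predictor: linear extrapolation from two frames, clamped at zero."""
    return max(0, 2 * cur - prev)


def select_frames(activities, threshold, segment_len):
    """Return the indices of the frames kept for re-encoding.

    activities: per-frame activity of the incoming stream, assumed to be
        available for every frame (kept or skipped) from the MPEG-2 decoder.
    threshold, segment_len: hypothetical parameters, not from the thesis.
    """
    kept = []
    for i in range(len(activities)):
        # Constraint (assumed form): the first frame of each segment is kept,
        # which bounds the number of consecutive skipped frames. The first two
        # frames are also kept because no prediction is available for them.
        if i % segment_len == 0 or i < 2:
            kept.append(i)
            continue
        # Decide ahead of time, using only the two preceding frames.
        if predict_activity(activities[i - 2], activities[i - 1]) >= threshold:
            kept.append(i)
    return kept
```

With `activities = [0, 0, 0, 0, 10, 20, 20, 0]`, `threshold=15` and `segment_len=4`, the static frames 2 and 3 are skipped and the frames around the burst of motion are kept. Because the decision for frame `i` uses only frames `i-2` and `i-1`, it can be made before frame `i` is processed, which is the property the abstract highlights for real-time use.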
