
四维矩阵视频编码及音视频同步技术研究

Research on Four-Dimension Video Coding and Synchronization of Audio and Video

【Author】 齐丽凤

【Supervisor】 陈贺新

【Author Information】 Jilin University, Communication and Information Systems, 2007, Doctoral dissertation

【Abstract (Chinese)】 With the rapid development of computer networking and digital communication technology, multimedia applications have reached every area of daily life. The ever-growing data volume makes the storage and transmission of media data a serious problem. Moreover, a multimedia system combines several media types with temporal constraints among them; these temporal relations must be maintained during data processing so that users neither miss nor misunderstand the information the multimedia data conveys. Video coding and audio-video synchronization have therefore become focal points of multimedia technology. The H.264 standard, the latest coding standard jointly developed by ITU-T and ISO/IEC, inherits the strengths of H.263 and the MPEG-1/2/4 video standards and improves coding efficiency by adopting advanced techniques inside each of its main functional modules. Building on H.264, this thesis proposes a four-dimensional-matrix synchronized coding model, comprising four-dimensional matrix predictive coding, sub-matrix partitioning, the four-dimensional DCT, quantization, an audio-video synchronization control method, reordering of DCT coefficients, and entropy coding. For entropy coding, a four-dimensional matrix context-based variable-length coding method is proposed to remove the correlation among the pixels, among the color components, and across consecutive frames of color video, achieving high compression ratios under high signal-to-noise conditions. For synchronization control, an embedded audio-video synchronized coding and transmission algorithm is proposed: the compressed audio bit stream is embedded as a hidden signal into the DCT coefficients of the video images before video compression coding. Compared with the common timestamp-based synchronization model, the embedded algorithm uses no system clock and achieves synchronized coding and transmission of audio and video with little effect on video image quality while saving transmission resources.

【Abstract】 With the rapid development of multimedia, remote sensing, and image processing technology, the volume of video and audio data keeps growing. Many digital-video applications, such as video conferencing, video on demand, distance learning, and telemedicine, need to transmit large amounts of video and audio data, and storing these data requires enormous capacity. Without compression coding, their storage and transmission become very difficult, so video compression coding is one of the key issues in the related fields.

On the other hand, unlike traditional media, a multimedia system combines different media types among which temporal relations exist. To ensure that users neither miss nor misunderstand the media information, the temporal relation of every media object must be maintained; that is, the synchronization of the different media must be controlled. Video compression coding and audio-video synchronization have therefore become key technologies of multimedia applications. Audio-video synchronization mainly concerns synchronized sampling, compression, synchronized transmission, reception, and synchronized playback.

Video coding has developed rapidly from the first to the second generation of coding techniques. New video coding techniques and standards have been proposed in recent years, the best known being the H.26x and MPEG series. Most of these standards are based on inter-frame motion compensation and the two-dimensional discrete cosine transform (2D-DCT), and they encode and describe color video in YCbCr format, exploiting the human visual system (HVS) to save bits by reducing the resolution of the two chrominance components; each channel is then compressed independently. In fact, the color components (R, G, B) may be strongly correlated, and even after the color transform, luminance and chrominance may still be correlated.
As is well known, the three planes of a color image are a unified reflection of the same physical scene: they share the same texture, edges, and gray-level gradients, and each plane carries almost all the information of the image except color. Human visual characteristics show that the relation among the separate components is nonlinear. Clearly, if the Y, U, and V data are compressed separately, the inherent correlation between the color components cannot be exploited efficiently, which limits both the compression ratio and the PSNR. A four-dimensional (4D) matrix is therefore adopted to represent color video, giving it a unified mathematical model, and the 4D-matrix DCT (4DM-DCT) is used to remove the correlation between neighboring pixels within the same image and across adjacent frames.

Based on the H.264 video coding standard, a four-dimensional audio-video synchronized coding model is first established. It comprises four-dimensional sub-matrix motion-compensated predictive coding, the four-dimensional DCT, sub-matrix quantization, an audio-video synchronization control algorithm, reordering of DCT coefficients, and entropy coding.

For entropy coding, a four-dimensional matrix context-based variable-length coding method is presented. After a four-dimensional zigzag scan, the DCT coefficients are arranged in descending order. During reverse-order coding, an appropriate code table is chosen for the current coefficient based on the previously coded coefficient. The method has two parts: description coding and coefficient coding; the latter covers ±1 coefficients, nonzero coefficients other than ±1, and zero values among the nonzero coefficients.

The parameters that affect encoder and decoder performance are optimized through experiments. Finally, the proposed method is compared with other video coding methods, including 2D-DCT with motion compensation, vector quantization (VQ), and Huffman coding.
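The abstract does not reproduce the 4DM-DCT formulas. As a hedged illustration rather than the thesis's implementation, the sketch below shows how a separable four-dimensional DCT can be applied to a (height, width, color, time) block by applying orthonormal 1D DCT-II matrices along each axis in turn; all function names and the axis ordering are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (satisfies C @ C.T == I)."""
    k = np.arange(n)[:, None]      # frequency index
    m = np.arange(n)[None, :]      # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] *= 1.0 / np.sqrt(2.0)  # DC row scaling for orthonormality
    return C

def dct4(block):
    """Separable 4D DCT: apply the 1D DCT along every axis of a
    (height, width, color, time) block."""
    out = block.astype(float)
    for axis, n in enumerate(block.shape):
        C = dct_matrix(n)
        # contract C's column index with the chosen axis, then restore order
        out = np.moveaxis(np.tensordot(C, out, axes=(1, axis)), 0, axis)
    return out

def idct4(coeffs):
    """Inverse 4D DCT (transpose of the orthonormal forward matrices)."""
    out = coeffs.astype(float)
    for axis, n in enumerate(coeffs.shape):
        C = dct_matrix(n)
        out = np.moveaxis(np.tensordot(C.T, out, axes=(1, axis)), 0, axis)
    return out

# A tiny 4D "color video" block: 4x4 pixels, 3 color planes, 2 frames.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(4, 4, 3, 2))
coeffs = dct4(block)
restored = idct4(coeffs)
print(np.max(np.abs(restored - block)))  # numerically ~0
```

Because each 1D matrix is orthonormal, the composed 4D transform is invertible and energy-preserving, which is what allows quantization and zigzag reordering to operate on the coefficients without any redundancy left between the axes.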
The experimental results show that the PSNR and compression ratio (CR) of the proposed algorithm are much better than those of the traditional method, though lower than VQ. For relatively still video, the CR of context-based 4D-matrix video coding is lower than that of Huffman coding under the same parameters and the same PSNR, but for video with relatively complex motion the result is reversed. The experiments confirm that the proposed method achieves good compression for video coding based on the 4D matrix.

For audio-video synchronization control, an embedded synchronized coding scheme for audio and video is proposed. The compressed audio bit stream is treated as hidden data and embedded into the mid-frequency DCT coefficients of the video images; the hybrid signal is then encoded and transmitted. At the decoder, the audio bits are extracted from the DCT coefficients, and the audio and video are reconstructed separately for playback. Three embedding schemes are introduced:

1. Embedding in the relation of two DCT coefficients. Two mid-frequency quantized coefficients of a sub-block, denoted BQ(x1, y1, z1, t1) and BQ(x2, y2, z2, t2), are chosen, and their relation is adjusted to embed each audio bit: if the bit is 0, BQ(x1, y1, z1, t1) is modified so that its absolute value is greater than or equal to that of BQ(x2, y2, z2, t2); otherwise, BQ(x2, y2, z2, t2) is modified so that its absolute value is greater than that of BQ(x1, y1, z1, t1). Experiments show that audio-video synchronization is achieved well, but for some video sequences the embedding overhead exceeds 3% of the MPEG-2 bit rate.

2. Embedding in a fixed-position DCT coefficient. A fixed-position DCT coefficient is chosen; if the audio bit is 1, the coefficient is modified so that its absolute value after quantization is greater than 1, otherwise so that its absolute value after quantization is less than 1. Experiments show that the PSNR is higher than in scheme 1 and the embedding overhead is below 3%.
3. Embedding in the parity of a fixed-position DCT coefficient. Audio bits are embedded into the video images by adjusting the parity of a fixed-position DCT coefficient. Both mono and stereo audio data are embedded in this scheme. Experiments show that the PSNR of the reconstructed images with embedded mono audio differs by only about 0.2 dB from that with embedded stereo audio; this is the best of the three schemes.

In all three embedded synchronization coding schemes, the hidden audio bits can be extracted exactly, and synchronized audio-video transmission is achieved with very little degradation of reconstructed image quality while saving channel resources.
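As an illustration of the parity scheme, the sketch below embeds audio bits by forcing the parity of one fixed-position quantized coefficient in each 4D sub-block and recovers them at the decoder; the coefficient position, helper names, and the ±1 adjustment rule are assumptions for this sketch, not the thesis's notation.

```python
import numpy as np

def embed_bits(blocks, bits, pos=(1, 1, 0, 0)):
    """Force the parity of one fixed mid-frequency quantized DCT
    coefficient per 4D sub-block to match the audio bit
    (even parity -> 0, odd parity -> 1)."""
    blocks = [b.copy() for b in blocks]
    for b, bit in zip(blocks, bits):
        c = int(b[pos])
        if c % 2 != bit:
            # nudge by one step toward zero to keep the distortion small
            b[pos] = c - 1 if c > 0 else c + 1
    return blocks

def extract_bits(blocks, pos=(1, 1, 0, 0)):
    """Recover the hidden audio bits from the coefficient parity."""
    return [int(b[pos]) % 2 for b in blocks]

# Toy example: four 2x2x1x1 sub-blocks of quantized DCT coefficients.
rng = np.random.default_rng(1)
subblocks = [rng.integers(-8, 8, size=(2, 2, 1, 1)) for _ in range(4)]
audio_bits = [1, 0, 1, 1]
stego = embed_bits(subblocks, audio_bits, pos=(1, 1, 0, 0))
print(extract_bits(stego, pos=(1, 1, 0, 0)))  # [1, 0, 1, 1]
```

Because each coefficient moves by at most one quantization step, the distortion per sub-block is bounded, which is consistent with the small PSNR loss the abstract reports; unlike timestamp-based synchronization, the audio bits travel inside the video stream itself, so no system clock or separate channel is needed.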

  • 【Online Publication Contributor】 Jilin University
  • 【Online Publication Year/Issue】 2008, Issue 05