Research on Multi-dimensional Vector Matrix DCT Integer Transform Codec Based on Sampling Model
【Author】 Liu Lili;
【Supervisor】 Chen Hexin;
【Author information】 Jilin University, Communication and Information Systems, 2011, Ph.D.
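【Abstract (Chinese version)】 In recent years, computer and network technology have developed rapidly, and multimedia applications now reach every aspect of daily life. Providing multimedia services to users has become an inevitable trend in both communications and computing. Compared with other traditional data applications, the most prominent feature of multimedia applications is that they integrate several types of media streams, such as audio and video. Digitized audio and video streams carry an enormous amount of information, which makes storing and transmitting video a serious challenge. Although computer hardware is advancing quickly, it still cannot meet the demands of video processing. Even where computing speed and storage capacity are sufficient for digital video, processing the uncompressed raw data wastes hardware resources and network bandwidth on redundant information. Digital video must therefore be compressed to improve efficiency and save resources, and video compression coding has long been a research focus in the related fields.
Color video consists of a temporally continuous sequence of color images. These images are strongly correlated between rows and columns in the spatial domain, between color components in the color domain, and between successive time instants in the temporal domain. These correlations mean that color video contains a large amount of redundant information, which must be removed by mathematical methods to compress the signal. Multi-dimensional vector matrix theory exploits the correlations of color video in the temporal, spatial and color-component domains. Much like the rules of a Rubik's cube, it unifies the processing of multiple frames of a color video into the mathematical processing of a multi-dimensional spatial transform, removes the correlations in the video comprehensively, and thereby achieves high compression ratios at a high signal-to-noise ratio (SNR).
The research group led by Professor Chen Hexin has worked on multi-dimensional transform models in recent years. The group proposed multi-dimensional matrix theory, has continuously enriched and developed it, and has achieved notable results in both image and video compression. It successively proposed three-dimensional matrix transform compression coding and the three-dimensional matrix wide discrete cosine transform, and applied both to color image compression. Because the earlier multi-dimensional matrix theory defined several different types of multiplication rules, which hindered practical application, our group recently proposed multi-dimensional vector matrix theory. It solves the problem of multiplying matrices of different dimensions and orders and further enriches multi-dimensional matrix theory. Building on this theory, this thesis studies its application to video compression in depth.
Supported by the National Natural Science Foundation of China project "Research on multi-dimensional vector matrix orthogonal transformation codec in color video" and the National Natural Science Foundation of China international cooperation project "Ubiquitous Computing based on Synchronous Coding for Mixing Video/Audio", this thesis analyzes current image and video compression coding technologies and the application of multi-dimensional information processing to image and video processing. It studies systematically the multi-dimensional vector matrix integer transform, multi-dimensional scalar quantization and scanning, and multi-dimensional modeling in the multi-dimensional vector matrix codec, laying a foundation for building more effective codecs.
The discrete cosine transform (DCT) performs best among all sub-optimal transforms and greatly reduces the correlation of image elements in the transform domain, so it is widely used in image and video coding. However, the DCT transform matrix consists of floating-point numbers, which require heavy computation and system resources and are prone to data drift. Integer transforms solve the problems of data drift and low coding efficiency. An integer transform replaces the floating-point DCT matrix with an integer matrix, so the whole transform runs in integer arithmetic and coding reversibility is guaranteed. In addition, integer multiplications can be replaced by additions, subtractions and shifts, so the transform can be implemented entirely with these operations and the computational load drops substantially. Using the implementation of the 2-D integer transform in H.264 together with the characteristics of the 4-D matrix DCT, this thesis proposes an integer-to-integer reversible 4-D matrix DCT and proves the orthogonality and energy concentration of the 4-D vector matrix integer kernel operator. The integer kernel operator is then applied to color video compression, and concrete examples demonstrate that the method is feasible and performs well: at the same compression ratio, the SNR of the decoded video is higher than with the international standard H.264/AVC and comparable to that of the floating-point transform.
Earlier in this work we studied multi-dimensional vector matrix orthogonal transform coding in depth and obtained an orthogonal vector integer transform matrix that performs well and achieved good compression in video coding. However, the transformed coefficients had always been processed with vector quantization, which depends heavily on the codebook and hinders both wide use in video coding and compatibility with international standards. This thesis therefore analyzes the distribution of the DC and AC coefficients after the multi-dimensional vector matrix DCT integer transform (MD-VMICT) and, based on these statistics, proposes multi-dimensional quantization and scanning methods suited to the MD-VMICT codec. It proposes an exponential quantization scheme whose formula is modified to a base-2 exponential function so that division can be replaced by shifts, and a scan that modifies the zigzag order according to the coefficient statistics. The quantizer parameters are determined experimentally, and the method is compared with the reference literature, H.264/AVC and MPEG-4. The experimental results show that the method outperforms the reference literature and MPEG-4, performs comparably to the latest international standard with an advantage at low bit rates, and is highly adaptable and compatible with current standards.
Multi-dimensional vector matrix theory requires partitioning the video data into multi-dimensional vectors, and expressing these vectors effectively and finding the correlation between their components is the basis of the subsequent orthogonal integer transform. We previously built a three-dimensional model from the rows, columns and time of each color component, partitioned it into suitable blocks, and applied multi-dimensional matrix orthogonal transforms, obtaining good compression. Building on that model, this thesis studies four-dimensional modeling: sampling the data of one video frame according to a specific scanning pattern yields a four-dimensional hypercube whose axes are the spatial row, column and field and the temporal frame, which removes temporal and spatial correlation more effectively. Two four-dimensional models are proposed, the cumulate (stacking) model and the sampling model, which more effectively remove the correlation hidden among four-neighborhood pixels as well as along the row, column and time dimensions. After 4-D modeling, the quantization and scanning algorithms of the previous chapter are improved for the 4-D vector matrix video codec, and entropy coding then removes statistical redundancy further. The experimental results show that the proposed 4-D models remove inter-pixel redundancy more effectively and that the sampling model gives better subjective quality than the cumulate model. Digital edge blocking artifacts in detailed regions are noticeably fewer with the sampling model, which shows that it is better at removing neighboring-pixel correlation while preserving image sharpness. This thesis proposes the 3-D and 4-D vector matrix DCT integer transform codec models, extending the application of multi-dimensional vector matrix theory to color video compression coding.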
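The abstract states that the 4-D integer kernel operator is orthogonal and concentrates energy. The sketch below illustrates both properties under an assumption: that the operator applies the H.264/AVC 4x4 integer core matrix separably along each of the four axes (row, column, field, frame) of a 4x4x4x4 block. The thesis defines its operator through multi-dimensional vector matrix multiplication, so its scaling and ordering may differ; the function names and the smooth test block are hypothetical and for illustration only.

```python
from fractions import Fraction
from itertools import product

# H.264/AVC 4x4 forward core transform matrix (integer approximation of the DCT).
# Assumption: the 4-D operator applies this kernel separably along every axis.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
N = 4
NORMS = [sum(c * c for c in row) for row in CF]  # squared row norms: [4, 10, 4, 10]


def rows_orthogonal(m):
    """True if every pair of distinct rows has a zero dot product (m m^T is diagonal)."""
    return all(
        sum(a * b for a, b in zip(m[i], m[j])) == 0
        for i in range(len(m)) for j in range(len(m)) if i != j
    )


def transform_axis(block, axis):
    """Apply CF along one axis of a 4-D block stored as {(row, col, field, frame): value}."""
    out = {}
    for rest in product(range(N), repeat=3):
        # The 4 positions on the line through `rest` that runs parallel to `axis`.
        line_idx = [rest[:axis] + (k,) + rest[axis:] for k in range(N)]
        line = [block[i] for i in line_idx]
        for u in range(N):
            out[line_idx[u]] = sum(CF[u][k] * line[k] for k in range(N))
    return out


def forward_4d(block):
    """Separable 4-D integer transform: integer input gives integer output."""
    for axis in range(4):
        block = transform_axis(block, axis)
    return block


def normalized_energy(coeffs):
    """Per-coefficient energy after dividing out the row norms, so Parseval holds exactly."""
    return {
        k: Fraction(v * v, NORMS[k[0]] * NORMS[k[1]] * NORMS[k[2]] * NORMS[k[3]])
        for k, v in coeffs.items()
    }


# A smooth test block: the value grows slowly along every axis.
block = {k: 100 + sum(k) for k in product(range(N), repeat=4)}
coeffs = forward_4d(block)
energy = normalized_energy(coeffs)
total = sum(energy.values())
dc_share = energy[(0, 0, 0, 0)] / total
print(rows_orthogonal(CF), total == sum(v * v for v in block.values()), float(dc_share))
```

Because the rows of `CF` are orthogonal but have unequal norms (4 and 10), the integer coefficients must be rescaled before energies are compared; H.264 folds this scaling into quantization, and `normalized_energy` performs it exactly with fractions. For this smooth block almost all of the energy lands in the single DC coefficient.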
【Abstract】 Nowadays, with the development of computer and network technology, multimedia applications are deeply embedded in all aspects of our lives, and providing users with multimedia services has become an inevitable trend in the communications and computer fields. Compared with other traditional data applications, the most prominent feature of multimedia applications is the integration of audio, video and other types of media streams. Digital video signals generate extremely high data rates and cannot be transmitted without first being compressed. Although computer hardware is developing rapidly, it still cannot meet the demands of video processing. Moreover, even with enough storage space and fast transmission, leaving video uncompressed wastes bandwidth on redundant data. The critical challenge for data compression is to reduce the bit rate without affecting picture quality, so video compression has long been a research focus in digital video technology.
Color video consists of consecutive color images. The pixels of these images are correlated in several ways: strongly in the temporal domain, in addition to the correlations in the spatial domain and between the color components. These correlations indicate that a video sequence contains plenty of redundancy, which must be removed by mathematical methods to compress it. Multi-dimensional vector matrix theory, which models the video as a whole in a way similar to a Rubik's cube, can effectively reduce the redundancy between frames, within frames and between the color components of a video sequence. A multi-dimensional transform is carried out to achieve good compression with good image quality.
The research group led by Professor Hexin Chen has been studying multi-dimensional transform models in recent years. The group proposed multi-dimensional matrix theory and enriched it in image and video coding.
Great success has been achieved in image and video coding, for example with 3D-DCT and 3D-WDCT. However, the earlier multi-dimensional theory defined many different multiplication rules, which made it unsuitable for practical applications. Our group therefore recently proposed a new theory, multi-dimensional vector matrix theory, which enriches multi-dimensional matrix theory and resolves the problem of multiplying matrices of different orders. Based on this theory, in-depth research on video coding is carried out.
Supported by the National Natural Science Foundation of China project "Research on multi-dimensional vector matrix orthogonal transformation codec in color video" and the National Natural Science Foundation of China international cooperation project "Ubiquitous Computing based on Synchronous Coding for Mixing Video/Audio", this thesis analyzes the application requirements and key problems in image and video coding and discusses multi-dimensional signal processing technology. It studies the multi-dimensional vector matrix DCT integer transform, quantization and scanning methods, and multi-dimensional models in order to lay the foundation for an effective codec.
The discrete cosine transform (DCT) is widely used in image and video coding. Among all sub-optimal transforms it performs best, removing the correlation of image elements in the transform domain very efficiently, so it forms the foundation of compression. However, the floating-point DCT accumulates errors because the floating-point precision of a computer is finite, and in particular a mismatch can occur in the decoder. An integer-to-integer transform therefore reduces the accumulated errors and improves coding efficiency. Because integer multiplication can be replaced by additions and shifts, the integer transform also improves efficiency and reduces computational complexity.
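The point that integer multiplication can be replaced by additions and shifts is easy to see in the H.264/AVC 4x4 forward core transform, whose matrix contains only ±1 and ±2. The sketch below computes it with a butterfly that uses only `+`, `-` and `<<`, and checks that an exact fractional inverse recovers the input, i.e. that the integer transform itself loses nothing. The exact inverse is for illustration only: H.264 itself uses a scaled integer inverse with the normalization folded into quantization. The function names are hypothetical.

```python
from fractions import Fraction

# H.264/AVC 4x4 forward core transform matrix C_f.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
NORMS = [sum(c * c for c in row) for row in CF]  # rows are orthogonal; squared norms [4, 10, 4, 10]


def forward_1d(x):
    """C_f x for a length-4 integer vector, using only additions, subtractions and shifts."""
    s03, d03 = x[0] + x[3], x[0] - x[3]
    s12, d12 = x[1] + x[2], x[1] - x[2]
    return [
        s03 + s12,          # row  1,  1,  1,  1
        (d03 << 1) + d12,   # row  2,  1, -1, -2
        s03 - s12,          # row  1, -1, -1,  1
        d03 - (d12 << 1),   # row  1, -2,  2, -1
    ]


def forward_2d(block):
    """Y = C_f X C_f^T: transform every row, then every column."""
    rows = [forward_1d(r) for r in block]
    cols = [forward_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def inverse_2d(coeffs):
    """Exact inverse X = C_f^T D^-1 Y D^-1 C_f, where D = diag(NORMS) = C_f C_f^T."""
    scaled = [[Fraction(coeffs[i][j], NORMS[i] * NORMS[j]) for j in range(4)] for i in range(4)]
    return [
        [sum(CF[i][m] * scaled[i][j] * CF[j][n] for i in range(4) for j in range(4)) for n in range(4)]
        for m in range(4)
    ]


block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
coeffs = forward_2d(block)
print(coeffs[0][0], inverse_2d(coeffs) == block)
```

The butterfly needs 8 additions/subtractions and 2 shifts per 4-point vector, against 16 multiplications and 12 additions for a direct matrix-vector product, which is where the savings described above come from.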
This thesis extends the theory of multi-dimensional vector orthogonal transformation matrices and presents a new 4-D order-4 DCT integer transform operator based on the multi-dimensional vector matrix discrete cosine transform (MD-VMDCT). The orthogonality and energy concentration of the operator are verified, and the integer operator is compared with the floating-point 4D-VMDCT. Finally, video sequences are compressed with this approach. The experimental results show that the algorithm is correct and effective: under the same conditions it outperforms H.264/AVC, and its performance is only slightly lower than that of the floating-point 4D-VMDCT.
The proposed 4-D order-4 DCT integer transform operator performs well in video coding. However, the 4D-VMICT coefficients had been encoded with vector quantization, which depends heavily on the codebook; this hinders wide use in video coding and compatibility with international standards. After studying the statistical properties of the DC and AC coefficients, this thesis therefore proposes a technique for generating the quantization cube and an improved zigzag scanning method suited to the MD-VMICT codec. Quantization uses an exponential function, modified to a base-2 exponential function so that it can be implemented with shift operations. After the parameters are determined experimentally, the proposed quantization and scan order are tested on various standard test video sequences, and the experiments show wide adaptability. Comparisons with the literature and MPEG-4 show that the proposed method is superior to both.
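The mechanism behind a base-2 exponential quantizer and a low-frequency-first scan can be sketched as follows. The thesis derives its quantization cube and scan order from coefficient statistics that the abstract does not reproduce, so the exponent rule `shift_exponent` (base 2, rising by one for every two steps of frequency index sum) and the index-sum scan below are hypothetical stand-ins, not the author's parameters. What the sketch does show is that when every step size is a power of two, quantization becomes a rounded right shift and reconstruction a left shift.

```python
from itertools import product

N = 4  # 4x4x4x4 coefficient block


def shift_exponent(pos, base=2, slope=2):
    """Hypothetical quantization cube: the step size is 2**exponent, and the exponent
    grows with the frequency index sum so high frequencies are quantized more coarsely."""
    return base + sum(pos) // slope


def quantize(c, s):
    """Divide integer c by 2**s with round-to-nearest, using only shifts."""
    if s == 0:
        return c
    mag = (abs(c) + (1 << (s - 1))) >> s
    return -mag if c < 0 else mag


def dequantize(q, s):
    """Reconstruct by multiplying by 2**s as a left shift."""
    return q << s


def scan_order(n=N, dims=4):
    """Simplified 4-D zigzag: positions ordered by index sum (low frequency first),
    ties broken lexicographically. Not the thesis's statistics-based order."""
    return sorted(product(range(n), repeat=dims), key=lambda p: (sum(p), p))


# Illustrative coefficients: a large DC term, two AC terms, and zeros elsewhere.
coeffs = {p: 0 for p in product(range(N), repeat=4)}
coeffs[(0, 0, 0, 0)] = 27136
coeffs[(1, 0, 0, 0)] = -1300
coeffs[(0, 1, 1, 0)] = 45

order = scan_order()
levels = [quantize(coeffs[p], shift_exponent(p)) for p in order]
last_nonzero = max(i for i, q in enumerate(levels) if q != 0)
print(levels[:10], "last nonzero at scan index", last_nonzero)
```

Scanning low frequencies first pushes the nonzero levels to the front, so everything after `last_nonzero` can be signalled as one long run of zeros by the entropy coder.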
Compared with H.264/AVC, MD-VMICT shows potential advantages at low bit rates for high-activity sequences.
The theory of multi-dimensional vector matrices requires partitioning the video data, so expressing the multi-dimensional vector matrix efficiently and finding the correlation among its components is the basis for the subsequent orthogonal integer transform. In earlier work we modeled a three-dimensional vector matrix from the row, column and time components, applied the multi-dimensional vector matrix orthogonal transform, and obtained good compression. Building on the 3D model, this thesis studies 4D modeling in detail for the 4D vector matrix DCT integer transform (4D-VMICT) codec and describes the physical meaning of the 4D model with illustrations. Two models, the cumulate model and the sampling model, are presented; by exploiting the strong energy concentration of 4D-VMICT, they eliminate temporal, spatial and statistical redundancy among the pixels of a video sequence. An improved quantization and zigzag scanning method suited to the 4D-VMICT codec is then proposed based on the properties of the codec. After the parameters are determined experimentally, the method is tested on various standard test video sequences, and the experiments show its efficiency and wide adaptability. Comparisons with the literature and MPEG-4 show the method's superiority, and comparisons between 4D-VMICT and H.264/AVC show potential advantages at low bit rates for high-activity sequences. Finally, the different properties of the two models are summarized from the experimental results: the sampling model eliminates redundancy among pixels more effectively than the cumulate model and gives better subjective quality.
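The abstract describes the two 4-D models only in outline, so the sketch below rests on an assumption about how they are built. It assumes the sampling model takes the four 2x2 polyphase (sub-sampled) components of an 8x8 region as the four fields, so that the four pixels of each 2x2 neighborhood line up along the field axis, while the cumulate model stacks the four adjacent 4x4 quadrants of the same region as fields. On a smooth gradient, the spread of values along the field axis is then much smaller for the sampling model, which is consistent with the claim that it better exploits neighboring-pixel correlation. The function names and test frames are hypothetical.

```python
from itertools import product


def sampling_model(frames, y0, x0):
    """Hypothetical sampling model: field f = 2*dr + dc holds the (dr, dc) polyphase
    component of the 8x8 region at (y0, x0), so each 2x2 neighborhood spans the field axis."""
    return {
        (r, c, 2 * dr + dc, t): frames[t][y0 + 2 * r + dr][x0 + 2 * c + dc]
        for r, c, dr, dc, t in product(range(4), range(4), range(2), range(2), range(4))
    }


def cumulate_model(frames, y0, x0):
    """Hypothetical cumulate (stacking) model: field f = 2*dr + dc holds the (dr, dc)
    4x4 quadrant of the same 8x8 region."""
    return {
        (r, c, 2 * dr + dc, t): frames[t][y0 + 4 * dr + r][x0 + 4 * dc + c]
        for r, c, dr, dc, t in product(range(4), range(4), range(2), range(2), range(4))
    }


def field_spread(block):
    """Largest max-min range along the field axis; a smaller value means stronger correlation."""
    return max(
        max(block[(r, c, f, t)] for f in range(4)) - min(block[(r, c, f, t)] for f in range(4))
        for r, c, t in product(range(4), repeat=3)
    )


# Four 8x8 frames of a smooth gradient that also brightens slowly over time.
frames = [[[10 * y + x + t for x in range(8)] for y in range(8)] for t in range(4)]
sampled = sampling_model(frames, 0, 0)
stacked = cumulate_model(frames, 0, 0)
print(field_spread(sampled), field_spread(stacked))  # 11 44
```

Both models rearrange the same 256 pixels into a 4x4x4x4 hypercube; only the grouping differs. Under the polyphase assumption, the field axis of the sampling model sees pixels one step apart, whereas in the cumulate model it sees pixels four steps apart, so a subsequent transform along that axis has more correlation to remove in the sampling case.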
This thesis proposes the 3D-VMICT and 4D-VMICT codecs, which extend the application of multi-dimensional vector matrix theory to color image and video compression.
【Key words】 Multi-dimensional vector matrix; DCT; integer transform; quantization; scanning method; 4D vector model;