Motion Video Object Segmentation and Data Rate Distribution and Control Technology

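【Author】 Chen Jian

【Advisor】 Li Zaiming

【Author information】 University of Electronic Science and Technology of China, Communication and Information Systems, 2003, PhD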

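【Summary】 Society's demand for information has become the main driver of information technology, and video, the most important form of information, has seen great progress in its processing techniques. The sheer volume of video data makes storage and real-time transmission very difficult and has become the main bottleneck for applications of digital video, so efficient representation of video data and rate control for it need to be studied.

Efficient representation of digital video has been studied extensively, and two generations of coding techniques have appeared. First-generation video coding, represented by MPEG-1 and MPEG-2, removes intra-frame and inter-frame redundancy and encodes block by block. Its greatest drawback is that it ignores the content composition of the video scene. Applications in multimedia communication and integrated network services need to operate on information content and control it interactively, which led to second-generation compression coding, represented by MPEG-4. It first segments the video scene into regions, each corresponding to a semantically meaningful video object, and then encodes different video objects with different methods according to their features. Object-based coding of this kind not only greatly improves coding efficiency but also lets users manipulate video data by content. Second-generation coding requires segmenting video images into video objects, which in turn requires studying the motion, texture, shape, and information content of the objects in the images. Content-based description of video scenes and rate control are the key to and foundation of object-based coding and interactive operation, and have both theoretical significance and practical value. Because existing standards contain no concrete provisions for automatic video object generation or rate control, this is an active topic of frontier research.

Inter-frame motion in video consists of global motion, local motion, or both; foreground objects are called outliers in global motion estimation. If the local motion vectors at outliers take part in estimating the global motion vector, both the complexity and the accuracy of global motion estimation suffer, and this happens easily when outlier regions occupy a large part of the scene. Eliminating outliers is therefore very important for accurate global motion estimation. Existing outlier elimination is usually statistical; there are also methods that remove outliers using the ratio of temporal to spatial gradients from the optical flow equation, but their errors are large and their results poor. Exploiting the tendency of outliers to cluster into blocks, this thesis removes them with a pre-analysis method based on subsampling and block matching of edge-feature images. The method can remove large outlier regions, and the pre-analysis result can be used to choose a different global motion model for different images, which improves the accuracy of global motion vector estimation.

Methods for estimating global motion parameters fall into three classes: those based on pixel intensity in the spatial domain, those based on video features in the spatial domain, and those based on a transform domain. Of these, methods based on spatial-domain video features offer better generality, noise robustness, and estimation accuracy, together with simple feature description. This thesis proposes a global motion estimation method that uses multiple straight-line-segment features in the spatial domain. After the outlier regions have been removed from the image sequence, the global motion parameters are estimated by extracting and comparing multiple line-segment features in the reference image and the current image. The method estimates the translation and rotation parameters of the global motion with low algorithmic complexity and high accuracy.

Motion detection currently relies mostly on adjacent-frame differencing or on optical flow. The main drawback of the former is that the contour of a moving object is hard to determine accurately; the latter is computationally complex and highly sensitive to noise. Both perform poorly in scenes with complex backgrounds or several moving objects. This thesis therefore proposes an improved three-frame double-difference algorithm, which uses several difference images to separate the moving-object information of different frames and adaptively selects the binarization threshold from the gray-level statistics of the difference images, thereby detecting the regions changed by motion. The method is adaptive, general, and robust to noise, and it detects and segments moving-object regions effectively.

The difference image after global motion compensation consists of residual-noise regions and motion-change regions, so detecting motion-change regions amounts to separating the two. Starting from the bit structure of digital image data, the image is divided into several bit planes, each of which carries different visual information and noise. On this basis, the thesis proposes a technique that pre-classifies each bit plane and then merges the planes with an AND operation; it markedly filters out interference such as noise and texture and detects the changed regions of a moving image. Bit-plane classification can also be applied to video data compression, encryption, and similar tasks.

Because second-generation video coding introduced the concept of the video object, it raises the problem of rate control when several video objects are encoded at once. Building on a study of traditional rate control methods and on rate-distortion theory, this thesis establishes principles for allocating bit rate among video objects and proposes a corresponding rate control algorithm, so that limited bandwidth (the total bit rate) is allocated efficiently among video objects while source QoS (rate-distortion) is guaranteed.

Each of these research points was simulated on a PC with good results. The theory and techniques studied in this thesis have theoretical and practical reference value for object detection, recognition, and segmentation in video image sequences, and for content-based compression and coding rate control of such sequences.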
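The three-frame double-difference detection described above can be illustrated with a minimal sketch. It shows the classic form of the idea, not the improved algorithm of the thesis, whose details the abstract does not give. The function names, the pure-Python list-of-rows frame format, and the threshold rule (mean plus `k` standard deviations of each difference image) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized grayscale frames."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def adaptive_mask(diff, k=2.0):
    """Binarize a difference image.

    Assumed rule: the threshold is mean + k * std of the difference image,
    so it adapts to the gray-level statistics of each frame pair.
    """
    flat = [v for row in diff for v in row]
    threshold = mean(flat) + k * pstdev(flat)
    return [[v > threshold for v in row] for row in diff]


def three_frame_motion_mask(prev, curr, nxt, k=2.0):
    """Mark pixels that changed in BOTH consecutive frame pairs.

    ANDing the two binary difference masks keeps the object's position in
    the middle frame and suppresses the positions it left or has yet to reach.
    """
    m1 = adaptive_mask(abs_diff(curr, prev), k)
    m2 = adaptive_mask(abs_diff(nxt, curr), k)
    return [[p and q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
```

Calling `three_frame_motion_mask(prev, curr, nxt)` on three consecutive frames returns a Boolean mask for the middle frame. In this sketch the frames are assumed to be already compensated for global motion, as the abstract requires before differencing.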

【Abstract】 In modern society, the demand for information has become the main force driving the development of information technology, and video information and its processing technology have made great progress. Because the volume of video data is enormous, it is difficult to store and to transmit in real time, which hinders the application of digital video. Research on the efficient representation of video data and on encoding rate control is therefore urgently needed.

The first generation of video coding, MPEG-1 and MPEG-2, is based on blocks within a frame and prediction between frames. Although these standards greatly reduce redundancy, they do not use content segmentation into video objects (VOs). With the development of interactive multimedia applications in multimedia communication and integrated network services, the second generation of video coding emerged, represented by MPEG-4. The video scene is divided into regions, each corresponding to a semantically meaningful video object, and different encoding techniques can then be applied to different video objects according to their features. These methods can greatly improve efficiency and let the user operate on video data according to its content. Second-generation coding requires further analysis of the motion, texture, shape, and information content of the video objects in an image. Automatic VO generation and VO rate control are key to object-based coding and interactive operation, yet existing standards give no concrete specifications for them. How to generate VOs and how to control the rate of multiple VOs has therefore become an active research topic.

In global motion estimation, a moving-object region is called an outlier. Because the local motion vectors of outliers take part in forming the global motion vector, they affect the accuracy and complexity of global motion estimation, especially when the outlier region occupies a large part of the image. Eliminating outliers is therefore important for accurate global motion estimation. Outliers are usually eliminated by statistical methods. Some papers use a pre-analysis of the video image based on the ratio of temporal gradients to spatial gradients, but the results are poor. This thesis adopts a pre-analysis based on block matching of edge-feature images, exploiting the tendency of outliers to cluster into blocks in the image. In this way fairly large outlier regions can be eliminated, and different global motion models can be used for different images, which improves the accuracy of global motion vector estimation.

There are three kinds of techniques for estimating global motion: those based on the pixel level, those based on visual features in the spatial domain, and those based on visual features in the transform domain. In terms of noise robustness, adaptability, and estimation accuracy, the techniques based on spatial visual features are the best of the three. This thesis presents a global motion estimation technique that uses multiple straight-line features. It estimates the global displacement and rotation parameters with good accuracy and a relatively simple algorithm.

To extract moving objects from video sequences, adjacent-frame differencing and optical flow methods are widely used. The main drawback of frame differencing is that the outline of a moving object is hard to detect precisely. This thesis proposes an improved double-difference algorithm over three adjacent frames. The motion region can be extracted effectively, with better adaptability and strong noise robustness.

After global motion compensation, the difference image consists of a residual-noise region and a motion-change region. Based on the data structure of the image, the image is divided into bit planes. The vi
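The division of an image into bit planes mentioned above can be sketched briefly. This is only an illustration of the decomposition itself: the function names `bit_planes` and `merge_planes` are hypothetical, an 8-bit grayscale image stored as a list of rows is assumed, and the choice of which planes to keep is left to the caller because the classification rule used in the thesis is not stated here.

```python
def bit_planes(image, bits=8):
    """Split a grayscale image (list of rows of ints) into `bits` binary planes.

    Plane 0 holds the least significant bit of every pixel and
    plane bits-1 holds the most significant bit.
    """
    return [[[(v >> b) & 1 for v in row] for row in image] for b in range(bits)]


def merge_planes(planes, keep):
    """Rebuild an image from the plane indices listed in `keep`.

    Dropping low-order planes is a common way to discard fine noise;
    it is used here only as an assumed example of plane selection.
    """
    rows, cols = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[b][r][c] << b for b in keep) for c in range(cols)]
        for r in range(rows)
    ]
```

For example, `merge_planes(bit_planes(image), range(8))` reproduces the original image, while passing `range(4, 8)` keeps only the four most significant planes.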
