
Video Object Segmentation in the H.264 Compressed Domain (基于H.264压缩域的视频对象分割)

【Author】 Lu Yu (陆宇)

【Advisor】 Zhang Zhaoyang (张兆杨)

【Author information】 Shanghai University, Communication and Information Systems, 2009, Ph.D.

【Abstract】 MPEG and H.264 are today's mainstream video coding standards. The MPEG-4 standard was the first to propose content-based video coding. Most research on video object segmentation has focused on the pixel domain, and segmentation in the compressed domain has begun to attract attention only in recent years. Because most video sequences in practical applications are already stored in some compressed format, segmenting video objects directly in the compressed domain avoids fully decoding the compressed video. The amount of data to be processed is also much smaller than in the pixel domain, so both the computational load and the storage requirement drop substantially. In addition, the motion information extracted from the compressed video by entropy decoding can be used directly as the motion and texture features needed for segmentation. Compressed-domain segmentation is therefore fast, can meet real-time requirements that traditional pixel-domain methods struggle to satisfy, and is better suited to real-time applications. Research on compressed-domain video object segmentation thus has both theoretical and practical significance.

H.264 is the latest video coding standard. Its coding efficiency is double that of MPEG, and a growing number of applications are adopting H.264 in place of MPEG. This dissertation performs video object segmentation based on motion information extracted from the H.264 compressed domain, namely the macroblock coding modes and the motion field.

On the other hand, because the motion information extracted from the H.264 compressed domain does not fully reflect the true motion of objects, motion segmentation of H.264 compressed video is difficult in three respects:

(1) The information available for object segmentation in the H.264 compressed domain is limited. Only two kinds of motion information, the macroblock coding modes and the motion field, can be used, and extracting video objects from these alone is difficult.

(2) The motion information in the H.264 compressed domain is not accurate. The macroblock coding modes do not fully reflect the frame background, and the motion field does not represent the true motion of objects, which makes segmentation quality hard to improve.

(3) Because segmentation speed and segmentation quality must both be maintained in the compressed domain, many existing pixel-domain motion segmentation methods are difficult to apply.

By studying these problems, this dissertation resolves several of the technical difficulties involved: densifying and correcting the motion information, using macroblock coding modes to assist background detection, exploiting the spatio-temporal correlation of the motion field to improve segmentation efficiency and quality, combining region segmentation with region classification to improve segmentation effectiveness, and detecting moving background with an efficient motion estimation method so as to simplify the construction of the energy function.

The main work and contributions of this dissertation are as follows:

(1) To address the sparsity and noise of the raw motion field, an effective motion field preprocessing technique is proposed. Spatio-temporal filtering first removes the spurious motion vectors that are noise. Motion field accumulation by backward estimation and forward projection then yields a dense motion field.

(2) Because segmentation efficiency is hard to improve when only the motion field is used, a method that uses macroblock coding modes to assist motion segmentation is proposed. Motion estimation is carried out using the frame background information reflected by the H.264 macroblock coding modes, which narrows the range of motion estimation and speeds up object segmentation. Based on the resulting background information, video objects are extracted with a χ² hypothesis test.

(3) To separate object regions from background regions more accurately, an object segmentation method combining region growing and region classification is proposed. The motion field is described by three motion features: magnitude, divergence, and curl. Based on these features, a modified statistical region growing method partitions the motion field into regions, a classification method based on the fourth-order moment then classifies the regions, and projection filtering finally refines the segmentation result. By performing region growing first and region classification second, the method separates object regions from background regions effectively and thereby improves segmentation accuracy.

(4) To address the difficulty that methods based on Markov random fields (MRF) have in extracting objects from H.264 video with moving background, a video object segmentation method based on Graph Cuts is proposed. Background detection is performed first: static background is detected by span thresholding, and moving background is detected in combination with motion estimation. The detected background information simplifies the construction of the graph energy function and thus improves the computational efficiency of the Graph Cuts optimization. The method applies to video object segmentation with either static or moving background, and its segmentation quality is better than that of comparable methods.
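Contribution (2) extracts objects with a χ² hypothesis test against the estimated background. The abstract does not state the test statistic, so the Python sketch below is only one plausible reading, not the dissertation's method. It assumes that each block's motion vector differs from the background-predicted vector by independent zero-mean Gaussian noise of known variance in each component, so the normalized squared residual follows a χ² distribution with 2 degrees of freedom. The names `chi2_object_mask`, `bg_mvs`, `sigma2`, and `alpha` are hypothetical.

```python
import math


def chi2_object_mask(mvs, bg_mvs, sigma2, alpha=0.01):
    """Flag blocks whose motion deviates significantly from the background.

    mvs    : list of (dx, dy) motion vectors, one per block
    bg_mvs : list of (dx, dy) vectors predicted by the background motion
             estimate for the same blocks (assumed to be given)
    sigma2 : assumed noise variance of each vector component
    alpha  : significance level of the test
    """
    # For a chi-square variable with 2 degrees of freedom,
    # P(X > t) = exp(-t / 2), so the critical value is t = -2 ln(alpha).
    threshold = -2.0 * math.log(alpha)

    mask = []
    for (dx, dy), (bx, by) in zip(mvs, bg_mvs):
        # Normalized squared residual against the background prediction.
        stat = ((dx - bx) ** 2 + (dy - by) ** 2) / sigma2
        # Reject the "block belongs to the background" hypothesis
        # when the statistic exceeds the critical value.
        mask.append(stat > threshold)
    return mask
```

Blocks flagged `True` are object candidates. In practice the noise variance would have to be estimated, for example from blocks already labeled as background.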

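Contribution (3) in the abstract describes the motion field by magnitude, divergence, and curl. As a rough illustration of these three features only, the Python sketch below computes them per block with finite differences. The grid layout (one vector per block, stored as two 2-D lists `u` and `v`) and the function name `motion_features` are assumptions. The abstract does not say how the dissertation discretizes the derivatives.

```python
import math


def motion_features(u, v):
    """Return per-block magnitude, divergence, and curl of a motion field.

    u, v : 2-D lists (rows x cols) with the horizontal and vertical
           motion-vector components, one entry per block.
    """
    rows, cols = len(u), len(u[0])

    def ddx(f, r, c):
        # Central difference along columns, one-sided at the borders.
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        return (f[r][hi] - f[r][lo]) / (hi - lo) if hi > lo else 0.0

    def ddy(f, r, c):
        # Central difference along rows, one-sided at the borders.
        lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
        return (f[hi][c] - f[lo][c]) / (hi - lo) if hi > lo else 0.0

    cells = [(r, c) for r in range(rows) for c in range(cols)]
    mag = [[0.0] * cols for _ in range(rows)]
    div = [[0.0] * cols for _ in range(rows)]
    curl = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        mag[r][c] = math.hypot(u[r][c], v[r][c])      # |(u, v)|
        div[r][c] = ddx(u, r, c) + ddy(v, r, c)       # du/dx + dv/dy
        curl[r][c] = ddx(v, r, c) - ddy(u, r, c)      # dv/dx - du/dy
    return mag, div, curl
```

For a pure expansion field (u = x, v = y) this sketch gives divergence 2 and curl 0 everywhere, while a pure rotation (u = -y, v = x) gives divergence 0 and curl 2, which is why the two features help distinguish different kinds of motion.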
  • 【Online publication submitter】 Shanghai University
  • 【Online publication year and issue】 2010, Issue 05