面向移动视频监控的转码技术研究

Research on Video Transcoding for Mobile Video Surveillance

【Author】 莫林剑 (Mo Linjian)

【Advisors】 陈纯 (Chen Chun); 卜佳俊 (Bu Jiajun)

【Author information】 Zhejiang University, Computer Science and Technology, 2008, PhD

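【Abstract (translated from the Chinese original)】 Video surveillance plays an important role in national public security. The emergence of mobile video surveillance further widens its range of application: people can view surveillance video anytime and anywhere on mobile devices such as mobile phones and PDAs, free from the dedicated equipment that traditional video surveillance requires. However, because wireless networks have narrow and highly fluctuating bandwidth and mobile devices have limited computing power, mobile video surveillance suffers from poorer monitoring quality and longer delay than traditional video surveillance. These challenges place higher demands on its core technology, video transcoding. Studying key transcoding technologies in order to overcome the constraints of the mobile environment and improve video quality is of significant theoretical and practical value, both for mobile video surveillance and for other mobile video applications.

Video transcoding is what allows a mobile video surveillance system to adapt to the complex wireless mobile environment and to deliver suitable video streams to mobile devices with differing computing and display capabilities; it converts the format of the video stream according to real-time network conditions and user requirements. The research in this thesis is based on H.264, the coding standard with the highest compression efficiency, emphasizes real-time performance, and aims for optimal performance under multi-objective joint transcoding, which combines several transcoding processes. Guided by the characteristics and requirements of mobile video surveillance, the thesis studies several key aspects of video transcoding: syntax format transcoding, bit-rate transcoding, resolution transcoding, error-resilience information embedding transcoding, and multi-objective joint transcoding that combines multiple transcoding processes. The main contents are summarized as follows.

First, the thesis gives an overview of technologies related to video transcoding. It introduces some basic concepts of video compression coding, reviews the development history and basic framework of video coding, and focuses on the latest video coding standard, H.264. It then introduces transcoding, including its technical objectives, framework structures and functional classification. Finally, it briefly introduces other video technologies closely related to mobile video surveillance and fixes the usage of easily confused terms.

To support fast and transparent access across video formats, the thesis studies syntax format transcoding from earlier standards, represented by MPEG-4, to H.264. It analyzes the similarities and differences between the coding tools of H.264 and MPEG-4 and studies mode decision and motion estimation, the two most computationally complex parts. Three techniques for predicting and optimizing candidate coding modes are proposed, which reduce the number of candidate modes or even yield the motion compensation mode directly. A multi-reference-frame motion estimation technique that combines temporal and spatial correlation is also proposed; compared with algorithms that use only temporal correlation, it effectively improves the accuracy of motion vector prediction. Finally, an adaptive search range selection algorithm is proposed to further increase transcoding speed.

Because bandwidth in a mobile video surveillance system varies widely across networks and fluctuates strongly, bit-rate transcoding is realized through rate control, and low-complexity rate control algorithms are studied. Based on the characteristics of video surveillance, a frame-level rate control technique is proposed that allocates bits according to complexity and buffer fullness, which effectively keeps picture quality smooth. To address the heavy computation of traditional rate control methods based on rate-distortion models, a look-up-table-based macroblock-level rate control strategy is proposed, which effectively reduces computational complexity while maintaining video quality.

Because mobile terminals have limited and heterogeneous processing and display capabilities, the thesis studies fast algorithms for arbitrary-ratio resolution down-sampling transcoding in H.264. It analyzes the characteristics of H.264 in areas such as motion estimation and motion compensation block sizes, and their effect on resolution transcoding. On this basis it studies the choice of macroblock coding type, motion vector reconstruction and Inter macroblock mode decision. A motion vector reconstruction algorithm for H.264 is proposed that obtains, with reasonable accuracy, the predicted motion vector of each new motion compensation block after arbitrary-ratio down-sampling. A fast bottom-up merging mode decision technique is also proposed, which gains a very considerable increase in transcoding speed at the cost of a small loss in picture quality.

Because wireless transmission has a high error rate and narrow bandwidth, the thesis studies error-resilience information embedding transcoding based on protection of sensitive regions (regions of interest, ROI). In mobile video surveillance, embedding error-resilience information at the transcoding stage, rather than at the encoding source as is traditional, makes it possible to track the client's network conditions more closely and to control the redundancy of the error-resilience information. A low-complexity mechanism for automatically identifying sensitive macroblocks is proposed: from a set of coding parameters it predicts the impact on video quality of losing a macroblock and uses this to identify sensitive macroblocks. Protection strategies for sensitive macroblocks are then analyzed, and the performance of two methods, Intra macroblock refresh and motion vector protection, is compared.

Finally, the thesis studies multi-objective joint transcoding that combines bit-rate, frame-rate, resolution, error-resilience information embedding and syntax format conversion. Practical mobile video surveillance applications usually impose several transcoding objectives at once, and simply cascading multiple transcoders does not give the best performance. The thesis analyzes how the execution order of frame-rate down-sampling, resolution down-sampling and motion vector reconstruction affects performance. Taking bit-rate transcoding as the basis, it proposes three joint transcoding structures of differing functionality and complexity and compares their performance. The thesis concludes by summarizing the problems that may arise in practical mobile video surveillance applications and the corresponding transcoding solutions.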

【Abstract】 Video surveillance plays an important role in public security. The emergence of mobile video surveillance further expands its scope of application: people can use mobile phones and PDAs to watch surveillance scenes anytime and anywhere. However, compared with traditional video surveillance, mobile video surveillance suffers from poor picture quality and long latency, owing to narrow wireless network bandwidth and the limited computational power of mobile devices. As the key technology of mobile video surveillance, video transcoding faces many challenges, and research on its key technologies has important theoretical and practical value.

When an incoming coded video stream in a specific format does not suit the current network or user requirements, video transcoding can convert it to the required format. This thesis studies the key technologies of transcoding, mainly syntax transcoding, bit-rate transcoding, spatial resolution transcoding, error-resilient transcoding and multi-objective transcoding. The research is based on the latest video coding standard, H.264, in consideration of the narrow bandwidth of wireless networks. Moreover, practical applications usually impose several transcoding objectives at once, so we also try to achieve optimal performance in a joint multi-objective environment. To summarize, the research covers the following aspects.

First, we briefly review some basic concepts and the development history of video compression, especially the latest video coding standard, H.264. We then introduce video transcoding, including its technical objectives, framework structures and classification. Other closely related technologies are also introduced.

Regarding syntax transcoding, MPEG-4 to H.264 transcoding is studied. By analyzing the similarities and differences between H.264 and MPEG-4, we identify the research objectives of MPEG-4 to H.264 transcoding. We focus mainly on the mode decision and motion estimation modules, which have the highest computational complexity. Three candidate mode optimization techniques are proposed; they reduce the number of candidate modes or make the mode decision directly. We also propose a fast multi-reference-frame motion estimation algorithm that exploits both temporal and spatial correlation. Compared with the reference methods, which use only temporal correlation, the proposed algorithm has much lower computational complexity. In addition, an adaptive search range selection method is proposed to further improve transcoding speed.

Regarding bit-rate transcoding, we adopt rate control as the means of realization. We first analyze why bit-rate transcoding and rate control can be integrated, and then study low-complexity rate control algorithms. Based on the features of video surveillance, a frame-level rate control algorithm is proposed that provides smoother picture quality. Moreover, a look-up-table-based macroblock-level rate control algorithm that needs no rate-distortion model is proposed; it effectively reduces computational complexity while preserving picture quality.

Regarding spatial resolution transcoding, we analyze the characteristics of H.264 and focus mainly on the coding type decision, motion vector reconstruction and mode decision modules. A motion vector reconstruction method for arbitrary-ratio H.264 spatial resolution down-sampling is proposed, which yields the predicted motion vectors of the down-sampled blocks. To further speed up transcoding, a bottom-up merging mode decision algorithm is also proposed. It uses the proposed early-stop and motion vector reconstruction methods and achieves a very substantial increase in transcoding speed at the cost of a small loss in picture quality.

Regarding error-resilient transcoding, a protection scheme for regions of interest (ROI) is studied. A macroblock sensitivity model is proposed. Using the coding type, motion vector and other information, the model estimates the impact on video quality of losing a macroblock; the impact values are then sorted and the ROI macroblocks are selected, all at low computational complexity. Two ROI protection strategies, intra-refresh and motion vector protection, are then analyzed and their performance is compared.

Finally, this thesis studies multi-objective transcoding, which combines bit-rate, spatial resolution, temporal resolution, error-resilient and syntax transcoding. Simply cascading multiple transcoders does not reach optimal performance. We analyze the order in which motion vector reconstruction is performed in a joint transcoder that combines spatial resolution, temporal resolution and syntax transcoding. Based on bit-rate transcoding, three transcoder structures with different functions and computational complexities are proposed to suit practical applications with diverse settings.
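
The abstract does not give the details of the proposed motion vector reconstruction method. As a rough illustration of the general idea only, the sketch below uses area-weighted averaging, a commonly used baseline for down-sampling transcoders, and is not the thesis's algorithm. The names `Block`, `overlap_area` and `reconstruct_mv` are hypothetical, and the ratios are assumed to be original size divided by down-sampled size.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A motion-compensated block of the ORIGINAL frame (hypothetical type)."""
    x: float    # top-left corner, in original-frame pixels
    y: float
    w: float    # block size, in original-frame pixels
    h: float
    mvx: float  # motion vector, in original-frame pixels
    mvy: float


def overlap_area(ax, ay, aw, ah, bx, by, bw, bh):
    """Area of the intersection of two axis-aligned rectangles."""
    ox = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    oy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return ox * oy


def reconstruct_mv(blocks, tx, ty, tw, th, ratio_x, ratio_y):
    """Estimate the motion vector of a block in the down-sampled frame.

    (tx, ty, tw, th) describe the target block in down-sampled pixels.
    ratio_x and ratio_y are original size / down-sampled size and may be
    non-integer (an arbitrary ratio, e.g. 1.5).
    """
    # Map the target block back onto the original frame.
    ox, oy = tx * ratio_x, ty * ratio_y
    ow, oh = tw * ratio_x, th * ratio_y

    total = sum_x = sum_y = 0.0
    for b in blocks:
        area = overlap_area(ox, oy, ow, oh, b.x, b.y, b.w, b.h)
        total += area
        sum_x += area * b.mvx
        sum_y += area * b.mvy

    if total == 0.0:
        return (0.0, 0.0)  # no overlapping source block: fall back to zero motion

    # Area-weighted mean, then rescale to the down-sampled resolution.
    return (sum_x / total / ratio_x, sum_y / total / ratio_y)


# Four 16x16 source blocks; one 16x16 target block at ratio 1.5 covers 24x24 pixels.
source = [
    Block(0, 0, 16, 16, 3.0, 0.0),
    Block(16, 0, 16, 16, 6.0, 0.0),
    Block(0, 16, 16, 16, 3.0, 3.0),
    Block(16, 16, 16, 16, 6.0, 3.0),
]
mv = reconstruct_mv(source, 0, 0, 16, 16, 1.5, 1.5)
```

In a real transcoder the result would typically serve only as a prediction that a small refinement search then improves, and a median or a sensitivity-weighted combination could replace the plain area-weighted mean.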

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication issue】 2008, Issue 07