
Research on Sign Language Video Encoding under Energy Constraints

【Author】 Chen Xiaolei (陈晓雷)

【Advisor】 Zhang Aihua (张爱华)

【Author information】 Lanzhou University of Technology, Control Theory and Control Engineering, 2014, Ph.D.

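【Abstract (translated from the Chinese)】 Sign language is a visual language that conveys ideas through hand shapes and arm movements, supplemented by facial expressions, lip movements, and other body postures; it is the most natural way for deaf people to communicate. Unlike head-and-shoulder video, sign language video adds hand shapes and arm movements and contains hand-over-face occlusion, so it is more complex and harder to study. Compared with sign language video recognition and synthesis, the coding of sign language video has received little attention, and most existing work is based on rate-distortion (R-D) theory: given a bit-rate constraint, it studies the relationship between bit rate and distortion so as to minimize the distortion of the reconstructed video. However, with the rapid growth of wireless network bandwidth and the wide adoption of the new-generation video coding standard H.264, the bit-rate constraint has become steadily weaker, while the power constraint on wireless video terminals has become steadily stronger. How to minimize the distortion of encoded sign language video under the limited energy of a wireless video terminal, while reducing energy consumption and extending the battery replacement cycle, has therefore become an urgent problem.

This dissertation studies sign language video encoding under energy constraints in depth. Its goal is to use the visual selective attention mechanism of deaf viewers, power-rate-distortion theory, and region-of-interest energy allocation to achieve a dynamically balanced optimization of encoding power, bit rate, and distortion. The aim is to keep the total power consumption of the wireless video terminal as low as possible and extend the battery replacement cycle while preserving the subjective and objective quality of the encoded video, and thereby to provide new theory and methods for optimal parameter configuration and resource allocation in sign language video encoding under energy constraints. The main contributions are:

(1) The factors that affect the complexity of H.264 sign language video encoding are analyzed theoretically and measured experimentally. The H.264 encoder parameters are grouped by complexity into four levels, each with a different encoding complexity and encoding quality, and the level is selected adaptively according to the battery energy of the wireless video terminal and the motion complexity of the video. Experiments show that the method reduces the computational complexity of the encoder and saves computing resources on the terminal while keeping the encoding quality essentially unchanged.

(2) Taking into account both the time-varying battery energy of the wireless video terminal and the non-uniformity of deaf viewers' visual attention, a region-of-interest, energy-aware sign language video encoding method is established. At the frame level, the number of reference frames and the search range are determined from the currently available battery energy and the frame complexity. At the macroblock level, the prediction mode and quantization parameter of each macroblock are determined from the visual importance of the region it belongs to. The frame is then encoded with the parameters fixed jointly by the two levels. Experiments show that the method further reduces the computational complexity of the encoder and saves computing resources on the terminal while preserving the encoding quality of the regions of interest.

(3) The power-rate-distortion (P-R-D) characteristics of the three H.264 coding modes (intra, inter, and skip) are analyzed in detail. On this basis, an energy consumption model and a P-R-D model for encoding one frame of sign language video are established, and an algorithm is proposed that optimizes the number of macroblocks coded in intra, inter, and skip mode within a frame. Experiments show that the proposed P-R-D model agrees with the measured P-R-D performance.

(4) For hand detection in sign language video under hand-over-face occlusion, a detection method based on the force field transform is proposed. The method first computes the force field images of the occluded frame and of a face-only frame, then partitions the force field images into blocks and computes the histogram of each block. The histograms of blocks at the same spatial position are subtracted to obtain the per-block difference of gray-level histogram components, and these differences are finally compared with a gray-level threshold to obtain the hand position. Experiments show that the method can detect the hand under hand-over-face occlusion in real time.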
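The two-level parameter selection in item (2) can be illustrated with a short sketch. It is not the dissertation's algorithm: the four level definitions, the rule that maps battery energy and frame complexity to a level, the QP offset `delta`, and all names (`LEVELS`, `select_frame_level`, `select_mb_params`) are assumptions made for illustration. Only the overall structure (frame-level choice of reference frames and search range, macroblock-level choice of prediction modes and quantization parameter) comes from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameParams:
    ref_frames: int    # number of reference frames
    search_range: int  # motion search range in pixels


@dataclass(frozen=True)
class MbParams:
    modes: Tuple[str, ...]  # prediction/partition modes the encoder may try
    qp: int                 # quantization parameter


# Hypothetical complexity levels, cheapest first (values are assumptions).
LEVELS = (
    FrameParams(ref_frames=1, search_range=4),
    FrameParams(ref_frames=1, search_range=8),
    FrameParams(ref_frames=2, search_range=16),
    FrameParams(ref_frames=4, search_range=32),
)

ALL_MODES = ("16x16", "16x8", "8x16", "8x8")


def select_frame_level(battery: float, complexity: float) -> int:
    """Frame level: pick a level index from battery fraction and frame
    complexity, both normalized to [0, 1]."""
    if not (0.0 <= battery <= 1.0 and 0.0 <= complexity <= 1.0):
        raise ValueError("battery and complexity must lie in [0, 1]")
    # Assumed rule: the battery caps the highest allowed level, and the
    # frame complexity decides how much of that cap is actually used.
    cap = min(int(battery * len(LEVELS)), len(LEVELS) - 1)
    return int(cap * complexity + 0.5)


def select_mb_params(is_roi: bool, base_qp: int, delta: int = 4) -> MbParams:
    """Macroblock level: spend more effort and finer quantization on
    region-of-interest macroblocks (face, hands), less elsewhere."""
    if is_roi:
        modes, qp = ALL_MODES, base_qp - delta
    else:
        modes, qp = ("SKIP", "16x16"), base_qp + delta
    # H.264 quantization parameters range from 0 to 51 for 8-bit video.
    return MbParams(modes=modes, qp=max(0, min(51, qp)))
```

A frame would be encoded with `LEVELS[select_frame_level(battery, complexity)]` and, for each macroblock, `select_mb_params(is_roi, base_qp)`. In a real encoder the thresholds would have to be calibrated against measured energy consumption.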

【Abstract】 Sign language is a highly structured, linguistically complete natural language that expresses vocabulary and grammar visually and spatially through a complex combination of facial expressions (such as eyebrow movements, eye blinks, and mouth and lip shapes), hand gestures, body movements, and finger-spelling that change in space and time. Compared with head-and-shoulder video, sign language video is more complex and more challenging to study.

Research on sign language video encoding is currently limited and is mostly based on rate-distortion (R-D) theory, which aims to minimize the distortion of the decoded sign language video. R-D theory, however, mainly addresses the relationship between rate and distortion under a rate constraint. With the rapid development of wireless communication, the growth of wireless channel bandwidth, and the popularity of the Advanced Video Coding standard H.264, the constraint on rate is becoming weaker and weaker. At the same time, the limited processing capability of mobile devices and the power constraint that battery operation imposes on the microprocessor have become the major restrictions on the development of mobile sign language communication.

This dissertation conducts in-depth research on sign language video encoding under energy constraints. The work aims to achieve an optimal balance among encoding power, encoding rate, and encoding distortion by using the visual selective attention mechanism of deaf viewers, power-rate-distortion (P-R-D) theory, and a region-of-interest power allocation method. The research can be summarized as follows:

(1) The factors that affect the complexity of the sign language video encoder are analyzed first. Based on the results, a novel computation resource allocation algorithm is proposed. The algorithm allocates the computation resources of the encoder adaptively according to the available battery power and the video content. Experimental results show that the proposed algorithm substantially reduces the computation required while maintaining video coding quality.

(2) A scheme is proposed that allocates the computational resources of the sign language video encoder adaptively according to the available battery power and the visual system of deaf viewers. At the frame level, encoding levels that determine the number of reference frames and the search range are selected adaptively according to the battery power and the frame complexity. At the macroblock (MB) level, the candidate partition modes and the quantization parameter are then adjusted adaptively according to the relative priority of each MB. Experimental results show that the proposed algorithm achieves a higher peak signal-to-noise ratio (PSNR) in the face and hand regions, which improves the intelligibility of the sign language video, and further reduces the computational complexity of the encoder.

(3) An analytic P-R-D model is proposed to obtain optimized tradeoffs among power consumption, bit rate, and distortion for sign language video encoding. In particular, the numbers of MBs coded in each coding mode are controlled through an optimization process according to the distinct P-R-D performance of the modes. Both the analytic and the simulation results show the applicability of the scheme to mobile sign language video encoding.

(4) A novel algorithm is proposed to track the hand during hand-over-face occlusion in sign language video. The algorithm is based on the image force field transform. First, the frames with a hand occluding the face and those with only a face are transformed into force field images. The force field images are then partitioned into sub-images, and the histogram of each sub-image is calculated. For each sub-image, the histogram of the face-only frame is subtracted from that of the occluded frame to obtain a difference histogram. Finally, the difference histogram of each sub-image is compared with a threshold to obtain the position of the hand. Experimental results show that the proposed algorithm is capable of tracking the hand in real time.
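The pipeline in item (4) (force field image, block histograms, histogram difference, threshold) can be sketched in plain Python as follows. Several details are assumptions, since the abstract does not specify them: the force field is computed with one common definition in which every pixel attracts every other with a force proportional to its intensity and inversely proportional to the squared distance; the per-block difference is the sum of absolute bin differences; and the block size, bin count, threshold, and function names are illustrative. The brute-force transform is quadratic in the number of pixels, so it suits only small test images and is not the real-time implementation the abstract refers to.

```python
import math


def force_field_magnitude(img):
    """Return the force magnitude at each pixel of a 2-D list of intensities.

    Assumed definition: pixel i exerts on pixel j the force
    I_i * (r_i - r_j) / |r_i - r_j|**3, summed over all i != j.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fx = fy = 0.0
            for yy in range(h):
                for xx in range(w):
                    if yy == y and xx == x:
                        continue
                    dx, dy = xx - x, yy - y
                    d3 = (dx * dx + dy * dy) ** 1.5
                    fx += img[yy][xx] * dx / d3
                    fy += img[yy][xx] * dy / d3
            out[y][x] = math.hypot(fx, fy)
    return out


def block_histograms(field, block, bins, vmax):
    """Histogram of each block x block tile, keyed by (block_row, block_col)."""
    h, w = len(field), len(field[0])
    hists = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            hist = [0] * bins
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    k = int(field[y][x] / vmax * bins) if vmax > 0 else 0
                    hist[min(k, bins - 1)] += 1
            hists[(by // block, bx // block)] = hist
    return hists


def detect_hand_blocks(occluded, face_only, block=4, bins=8, threshold=4):
    """Return the blocks whose histogram difference exceeds the threshold."""
    f_occ = force_field_magnitude(occluded)
    f_face = force_field_magnitude(face_only)
    # Use one shared scale so the two sets of histograms are comparable.
    vmax = max(max(map(max, f_occ)), max(map(max, f_face)))
    h_occ = block_histograms(f_occ, block, bins, vmax)
    h_face = block_histograms(f_face, block, bins, vmax)
    hits = []
    for key in sorted(h_occ):
        diff = sum(abs(a - b) for a, b in zip(h_occ[key], h_face[key]))
        if diff > threshold:
            hits.append(key)
    return hits
```

Calling `detect_hand_blocks(occluded, face_only)` on two equally sized grayscale frames returns the (row, column) indices of the blocks flagged as containing the hand. Because the force at each pixel depends on the whole image, a bright hand region also perturbs distant blocks, so the threshold has to be tuned to keep those from being flagged.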
