基于人脸特征定位和建模理论的视频编码关键技术研究

Research on Key Technologies of Video Coding Based on Facial Feature Localization and Face Modeling Theory

【Author】 范小九

【Advisor】 彭强

【Author Information】 Southwest Jiaotong University, Computer Application Technology, 2012, Ph.D.

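【Abstract (translated from the Chinese)】 The human face is one of the key features that distinguish humans from other living beings, and it serves as a primary carrier of information in interpersonal communication and social activity, so thorough study of it has considerable theoretical and practical significance. With the rise of real-time multimedia services, applications such as video conferencing, video telephony, and news broadcasting are all directly or indirectly related to the human face, and as these applications spread, the importance of face research keeps growing. In the video coding and communication community, these applications are usually summarized under the term "conversational video sequences". This thesis takes conversational video sequences as its subject and combines face detection, feature localization, and model construction theory to study corresponding video compression methods and technical approaches.

In classical video compression theory, all frames and coding units are coded sequentially with equal importance. As research progressed, it became clear that, beyond compression ratio and peak signal-to-noise ratio (PSNR), the evaluation of a video coding algorithm should also consider the coding quality of the region of interest (ROI). In practice, users often judge the acceptability of a coded video directly by their subjective impression of how well the ROI is compressed. How to guarantee or improve the coding and decoding quality of the face ROI in conversational video sequences is therefore a frontier topic in urgent need of study. Fundamentally, limits on coding resources such as network bandwidth and computing power, together with the loss of useful information during transmission, are the main factors constraining coded picture quality, and their effect is especially pronounced in real-time conversational video coding for low-bandwidth, high-error-rate applications. This thesis therefore investigates two coding strategies that prioritize the face ROI and one decoder-side error concealment method, with the aim of achieving the best subjective and objective face-ROI quality under given channel conditions.

First, the thesis proposes a bit allocation and resource optimization scheme that preserves the face region and its features. The scheme involves three preprocessing steps. First, to extract the face ROI quickly, it exploits the rich motion of the face region in conversational video sequences to prune the large pyramid of candidate sub-images used by the conventional Adaboost face detection algorithm. Second, to ensure the accuracy of the extracted face ROI, it uses skin-color features as an auxiliary confirmation. Third, to obtain the macroblock (MB) positions of the face contour and other facial features, it optimizes how parameters such as the search range, convergence direction, and energy-equilibrium decision condition are chosen in the Snake algorithm and the Active Shape Model (ASM). After assigning each MB a bit allocation priority according to the structure of the face, the scheme designs a relatively accurate MB-level adaptive prediction model for the mean absolute difference (MAD) and an update algorithm for the quantization parameter (QP), which together accomplish the prioritized bit allocation. It also performs further resource optimization based on an in-depth analysis of MB coding modes and other coding conditions. Simulations show that the scheme extracts the face ROI quickly and detects the related features fairly accurately, optimizes the allocation of coding bits and other resources, and preserves the coding quality of the face ROI and its feature locations well. Compared with the conventional bit allocation algorithm in JM9.8 and with bit allocation algorithms from the related literature, the scheme improves the PSNR of the face ROI at the same coding bit rate. Combining bit allocation with optimized allocation of coding resources also narrows the mismatch between frame-level target bits and actual bits and reduces the total encoding time. Subjective tests further confirm that the scheme delivers reconstructed video of better visual quality.

Second, the thesis introduces the idea of global rate-distortion optimization (RDO) in video coding and its traditional solutions, and discusses why coding dependencies matter during encoding. Simplifying the coding dependencies of a conversational video sequence to the temporal dependency of the face ROI, it proposes a global RDO framework that combines comprehensive optimization of the face ROI with independent optimization of the non-face region. The framework fits the conventional one-pass coding structure well: the independent part still follows the traditional RDO rule, the comprehensive part must account for how face-ROI distortion propagates temporally into future frames, and the two parts are linked through a new Lagrange multiplier. To measure the total distortion caused by the face ROI in the comprehensive optimization, the framework proposes a method, based on forward motion search, for constructing a temporal propagation substitute chain for the face ROI. Using this chain, it gives a statistical model of the temporal propagation of face-ROI distortion, in which a characteristic function built on the Laplacian distribution of transform residuals estimates quantization distortion from motion-compensated prediction distortion and thereby lowers computational complexity. Simulations show that the chain construction is fast and reasonable, that the model estimates distortion propagation well, and that the framework offers an effective way to carry out global RDO for the face ROI in conversational video sequences. Compared with the RDO method in JM15.1, which assumes independence, and with RDO-Q, another dependency-aware RDO method from the related literature, the framework achieves a simultaneous gain in Bjontegaard Delta PSNR (BDPSNR), or a reduction in Bjontegaard Delta Bit Rate (BDBR), for both the whole sequence and the face ROI.

Finally, the thesis studies error concealment for conversational video sequences and proposes a spatial error concealment strategy assisted by a realistic face model. The strategy has three parts. First, because the efficiency of Active Appearance Model (AAM) localization depends closely on the initial fitting position (initial center and orientation) and the fitting instance (shape instance and appearance instance), a coarse localization method for key facial features is designed to compute the in-plane deflection angle and the profile-depth deflection angle, from which the initial center, orientation, and shape instance of the AAM are derived; the appearance instance is determined from texture similarity, yielding an improved AAM-based algorithm for extracting key facial feature points. Second, using the AAM feature points and the Candide-3 generic wireframe face model, algorithms for pose adjustment, shape matching, and texture mapping are designed, giving a fast method for building a realistic face model. Third, based on the pre-concealment result for the damaged frame and the available realistic face model, each damaged MB is classified so that the appropriate spatial concealment algorithm can be invoked adaptively. In particular, for texture blocks in the face ROI, the strategy searches the planar mapping of the face model for the best replacement block. Simulations show that the improved AAM algorithm is more accurate than the original AAM and that the face model construction is quick, convenient, and realistic, offering a fairly reasonable solution to the ill-posed problem of recovering face depth from a single 2D image. Compared with the spatial bilinear interpolation and adaptive directional interpolation algorithms based on JM17.0, the face-model-assisted spatial concealment method conceals damaged blocks satisfactorily under both interleaved and checkerboard packing, improves the subjective and objective quality of the face ROI, and to some extent solves the problem of recovery when the face ROI is lost, especially when some of its features are lost.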
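
The abstract reports PSNR measured over the face ROI rather than over the whole frame. As a rough illustration of that metric only, the sketch below computes PSNR over the pixels selected by a binary mask. The function name `roi_psnr`, the list-of-rows frame layout, and the 8-bit peak value are assumptions made for this example; they are not taken from the thesis or from the JM reference software.

```python
import math


def roi_psnr(reference, decoded, roi_mask, peak=255):
    """Return PSNR in dB over the pixels where roi_mask is truthy.

    reference, decoded, roi_mask: equally sized lists of rows (hypothetical layout).
    peak: maximum sample value, 255 for 8-bit video (assumed here).
    """
    squared_error = 0
    count = 0
    for ref_row, dec_row, mask_row in zip(reference, decoded, roi_mask):
        for ref, dec, inside in zip(ref_row, dec_row, mask_row):
            if inside:
                squared_error += (ref - dec) ** 2
                count += 1
    if count == 0:
        raise ValueError("ROI mask selects no pixels")
    if squared_error == 0:
        return math.inf  # identical ROI content: PSNR is unbounded
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)
```

Calling the function with a mask of all ones gives the ordinary frame-level PSNR, so the same routine can report both figures for a sequence.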

【Abstract】 As an essential feature distinguishing humans from other animals, the human face is the main information carrier in interpersonal communication and social activities, so studies of the human face are of great theoretical and practical significance. The importance of face research is growing sharply with the development of real-time multimedia services such as video conferencing, video telephony, and news broadcasting, all of which are directly or indirectly related to the human face. In the video coding and communication field, these applications are generally referred to as "conversational video sequences". This thesis studies video compression methods and techniques for conversational video sequences, integrating face detection, facial feature extraction, and face modeling.

In classic video coding theory, every part of a picture is compressed sequentially with equal importance. Originally, compression ratio and peak signal-to-noise ratio (PSNR) were the two basic indexes for evaluating a video coding algorithm. As research progressed, the special role of the region of interest (ROI) became widely recognized: users tend to assess the acceptability of a coded video by subjectively observing the quality of its ROIs. How to guarantee the quality of the human face ROI is therefore a frontier subject in conversational video coding. Limited network bandwidth and computational power, together with information loss in transmission, are the chief factors that restrict video quality at the receiving end, especially in conversational video coding with low bandwidth and high bit-error rates.
In this thesis, two coding strategies that prioritize the human face ROI and one error concealment approach were investigated in order to achieve the best coding quality in the human face ROI under a bit-rate-constrained channel.

Firstly, the thesis proposed a bit allocation and resource optimization scheme to protect the human face ROI and its features. The scheme involves three preprocessing steps. To extract the human face ROI efficiently, we used the motion of the face region to reject candidate sub-images from the image pyramid searched by the Adaboost face detection method. To guarantee the accuracy of the extracted human face ROI, it was verified with the aid of skin-color statistics. To refine the actual face contour and other facial feature locations, we optimized the selection of the search range, the convergence direction, and the energy equilibrium condition in the Snake algorithm and the Active Shape Model (ASM). After assigning a priority to each macroblock (MB) according to facial geometry, the scheme designed a relatively precise adaptive prediction model for the mean absolute difference (MAD) and an updating rule for the quantization parameter (QP) to complete the prioritized bit allocation. In addition, the scheme further optimized coding resources through a thorough analysis of MB modes and other coding options. Simulation results demonstrated that the scheme gives reasonable face ROI and facial feature locations for each frame and allocates bits and resources to each MB effectively, so the coding quality of the human face ROI and its features was well preserved. Comparison with the basic bit allocation algorithm in JM9.8 and other bit allocation methods showed that the PSNR of the human face ROI in our scheme was improved significantly. Meanwhile, owing to the optimization of coding resources, the gap between the target bits and actual bits of each frame and the total coding time were both reduced.
In addition, the subjective assessment further confirmed that our proposed scheme can provide much better video reconstruction quality.

Secondly, the thesis introduced the global rate distortion optimization (RDO) problem with its traditional solution and discussed the importance of coding dependencies in the encoding process. By taking the temporal dependency of the human face ROI as the only coding dependency in conversational video coding, we proposed a novel global RDO framework made up of comprehensive optimization of the human face ROI and individual optimization of the non-face region. This framework works well in the common one-pass coding structure: the comprehensive optimization accounts for the temporal propagation of face-ROI distortion into future frames, the individual optimization still follows the traditional RDO rule, and the two parts are linked through a new Lagrange multiplier. To obtain the total distortion caused by a given human face ROI in the comprehensive optimization, we constructed a human face ROI temporal propagation alternative chain based on forward motion search. With this chain, a source distortion temporal propagation model for the human face ROI was subsequently developed, in which a characteristic function based on the Laplace distribution of transformed residuals estimates quantization errors from motion compensation errors and thereby reduces the computational complexity. Simulation results demonstrated that the constructed human face ROI temporal propagation chain is efficient and reasonable, that the proposed source distortion temporal propagation model estimates error propagation well, and that the framework provides an effective way to perform RDO of the human face ROI in conversational video coding.
Compared with the independent RDO method in JM15.1 and another dependent RDO method (RDO-Q), the proposed framework achieves a clear BDPSNR (Bjontegaard Delta PSNR) gain and BDBR (Bjontegaard Delta Bit Rate) saving for the human face ROI and the entire sequence simultaneously.

Thirdly, the thesis studied error concealment for conversational video coding and proposed a spatial error concealment strategy aided by a realistic human face model, which comprises three basic parts. First, because the efficiency of the active appearance model (AAM) is closely associated with the initial fitting position (fitting center, fitting orientation) and the fitting instance (shape instance, appearance instance), we developed a coarse-grained facial feature point localization method to calculate the plane deflection angle and profile deflection angle, from which the fitting center, fitting orientation, and shape instance were determined. The appearance instance was then selected by utilizing texture similarity. By tuning the initial fitting parameters in this way, the AAM was improved and the final facial feature points became more precise. Second, we designed pose adjustment, shape matching, and texture mapping methods for constructing a realistic human face model by combining the obtained AAM facial feature points with the Candide-3 generic wire-frame face model. Third, the category of each damaged MB was determined from the pre-concealment result and the available realistic face model, so that different spatial error concealment methods could be selected adaptively. In particular, for texture MBs in the face ROI, we provided a solution that searches the plane mapping of the face model for the optimal replacement of the damaged MB. Simulation results demonstrated that the improved AAM algorithm is superior to the original AAM in facial feature point extraction and that the reconstructed human face is more realistic.
The human face model construction method provides a reasonable solution for recovering depth information from a single 2D image. Compared with two spatial error concealment methods implemented on JM17.1, namely bilinear interpolation and adaptive directional interpolation, the proposed method provides excellent error concealment results for damaged frames, especially for destroyed ROI areas, in both interleaved and dispersed packing styles. To some extent, the proposed spatial error concealment method solves the face and facial feature recovery problem in conversational video coding.
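
For context on the bilinear interpolation baseline named in the comparison, the following is a minimal textbook-style sketch of spatial concealment by distance-weighted averaging of the boundary pixels around a lost block. It is not the JM implementation and not the thesis's model-aided method; the function name `conceal_block_bilinear`, the list-of-rows frame layout, and the assumption that every in-frame neighbouring pixel was received correctly are all illustrative choices.

```python
def conceal_block_bilinear(frame, top, left, size=16):
    """Fill a lost size x size block from the pixels just outside its four edges.

    frame: list of rows of integer samples (hypothetical layout).
    Each boundary pixel is weighted by the distance to the opposite edge,
    so nearer boundaries contribute more. Neighbours are assumed intact.
    """
    height, width = len(frame), len(frame[0])
    bottom, right = top + size, left + size
    out = [row[:] for row in frame]  # leave the input frame unmodified
    for y in range(top, bottom):
        for x in range(left, right):
            candidates = []  # (boundary sample, weight)
            if top > 0:
                candidates.append((frame[top - 1][x], bottom - y))
            if bottom < height:
                candidates.append((frame[bottom][x], y - top + 1))
            if left > 0:
                candidates.append((frame[y][left - 1], right - x))
            if right < width:
                candidates.append((frame[y][right], x - left + 1))
            if not candidates:
                continue  # block covers the whole frame: nothing to interpolate from
            total = sum(weight for _, weight in candidates)
            out[y][x] = round(sum(v * weight for v, weight in candidates) / total)
    return out
```

Because this kind of interpolation only blends the surrounding samples, it cannot restore structured content such as eyes or a mouth inside a lost block, which is the gap the model-aided strategy in the abstract is aimed at.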
