
低比特率真实感人脸视频编码研究

Research on Realistic Face Video Coding at Low Bit Rate

【Author】 Yu Jun (於俊)

【Supervisor】 Wang Zengfu (汪增福)

【Author Information】 University of Science and Technology of China, Pattern Recognition and Intelligent Systems, 2010, PhD

【摘要】 The human–machine emotional interface (tracking and extraction of facial expression motion parameters, expression recognition, parameter transmission, and synthesis of highly realistic speech-synchronized facial animation) is a current research hotspot in computer vision and computer graphics, with numerous applications in human–computer interaction, video coding, entertainment, and virtual reality. Although the field has advanced considerably over the past thirty years, many problems remain open. Among them, a particularly challenging research topic is how to quickly obtain accurate facial motion and expression parameters from face video at the sender, and how to synthesize highly realistic speech-synchronized facial animation from those parameters at the receiver. The topic involves motion analysis, facial expression recognition, source and channel coding, kinematic and dynamic modeling and representation of the face, modeling of the co-articulation mechanism, and text-driven facial animation. Taking model-based face video encoding and decoding at very low bit rates as its subject, this thesis studies the related human–machine emotional interface problems in depth, focusing on the tracking and extraction of facial expression motion parameters, parameterized video coding, and the synthesis of highly realistic speech-synchronized facial animation. The innovations and main contributions of this thesis are as follows:

(1) An automatic face adaptation algorithm based on a single frame image is proposed. First, the first frame containing the target face is detected in the input video; in that frame, the face is localized with an improved support vector machine (SVM) and the facial feature points are localized with an Adaboost + Camshift + AAM (Active Appearance Model) pipeline. Using this face and feature-point information, a compact generic 3D face model is specialized at the encoder to obtain the facial definition parameters (FDP: Facial Definition Parameter) of the face in question; on this basis, a specialized fine-grained 3D face model is built for use at the decoder.

(2) A 3D facial expression motion tracking algorithm based on online model matching and updating is proposed. Specifically, an adaptive statistical observation model is used to build an online appearance model; an adaptive state-transition model combined with an improved particle filter performs both deterministic and stochastic search of the observed scene; and multiple measurements of the target are fused to reduce the influence of illumination and individual variation. The algorithm yields both the global rigid motion parameters that reflect the overall pose of the target face and the local non-rigid motion parameters that reflect changes in facial expression.

(3) Facial expression recognition is studied in depth. A static expression recognition algorithm is proposed first: after the facial expression motion parameters are extracted, expressions are classified using physiological knowledge related to expressions. To overcome the limitations of static recognition, an algorithm combining static and dynamic expression information is then proposed; within a framework of multi-expression Markov chain models and particle filtering, it uses a physiological expression model to recognize facial motion and expression simultaneously.

(4) A compression algorithm for MPEG-4 facial expression motion parameters (FAP: Facial Animation Parameter) is proposed. The algorithm groups FAPs using facial basis functions (FBF) and reduces the bit rate through intra-frame and inter-frame coding without introducing any coding delay.

(5) A 3D facial expression animation synthesis algorithm based on MPEG-4 is proposed. The algorithm combines a parametric model with a muscle model to generate facial animation and can produce quite realistic 3D facial expression animation driven by an FAP stream. In addition, the co-articulation mechanism is modeled, allowing facial viseme actions corresponding to English phonemes to be generated. Given the phoneme information parsed from text, additional expression information, and duration information, interpolating between visemes with non-uniform rational B-spline (NURBS) functions yields expressive facial animation synchronized with English speech.

(6) Building on the above, a video coding/decoding demonstration system integrating facial expression motion parameter tracking/extraction, expression recognition, parameter transmission, and realistic speech-synchronized facial animation synthesis is designed and implemented for the first time internationally. The demonstration system synthesizes realistic facial animation at the decoder from the decoded parameters.
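The stochastic search in the tracking framework of contribution (2) is built on a particle filter. A minimal sketch of one bootstrap particle-filter step over a face motion parameter vector is shown below; the observation likelihood `observe`, the fixed Gaussian transition noise, and all function names are illustrative assumptions, not the thesis's adaptive models.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observe, motion_std=0.05):
    """One bootstrap particle-filter step over motion parameters.

    particles : (n, d) array of state hypotheses (pose + expression)
    weights   : (n,) normalized importance weights
    observe   : state -> likelihood, a stand-in for the thesis's
                online appearance model
    """
    n = len(particles)
    # Resample in proportion to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # Stochastic search: diffuse with a (here: fixed) transition model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Reweight each hypothesis by its observation likelihood.
    w = np.array([observe(x) for x in particles])
    w = w / w.sum()
    # Report the weighted mean as the current state estimate.
    estimate = (particles * w[:, None]).sum(axis=0)
    return particles, w, estimate
```

With a reasonably peaked likelihood, repeated steps concentrate the particle cloud around the true parameters; the thesis additionally mixes in a deterministic search component and adapts both the transition and observation models online.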

【Abstract】 The human–machine emotional interface (facial expression parameter tracking and extraction, facial expression recognition, parameter transmission, and highly realistic speech-synchronized facial animation) is a hot research topic in computer vision and computer graphics, with many applications in human–computer interaction, video coding, entertainment, virtual reality, and beyond. Although great progress has been made in these areas over the past thirty years, many problems remain. How to obtain accurate facial motion and expression parameters quickly from face video at the transmitter, how to transmit these parameters efficiently using knowledge of the human face, how to synthesize highly realistic speech-synchronized facial animation from these parameters at the receiver, and how to achieve a high expression recognition rate are all open challenges. They involve motion analysis in computer vision, facial expression recognition, source and channel coding, kinematic and dynamic modeling and representation of the individualized face, the mechanism of co-articulation, and text-driven facial animation, among other problems. Targeting ultra-low-bitrate model-based face video coding and decoding, this thesis studies the related human–machine emotional interface problems, paying particular attention to facial expression parameter tracking and extraction, parameterized video coding, and highly realistic speech-synchronized facial animation. The innovations and major contributions of this thesis are as follows:

(1) A face adaptation algorithm based on a single image is proposed. First, the first frame containing a face is detected in the input video. In this frame, an improved SVM (Support Vector Machine) is used for face detection, and an Adaboost + Camshift + AAM (Active Appearance Model) pipeline is used for feature localization. The coder then obtains the FDP (Facial Definition Parameters) by adapting a simple generic triangular face model, and the decoder adapts a complex generic triangular model using these FDPs.

(2) A 3D facial expression motion tracking algorithm based on online model adaptation and updating is proposed. The algorithm constructs the online model with an adaptive statistical observation model, and applies stochastic and deterministic search to the observed scene simultaneously by combining an adaptive state-transition model with an improved particle filter. Multiple measurements are fused to reduce the influence of lighting and person dependence. As a result, both global rigid motion parameters and local non-rigid expression parameters can be obtained.

(3) Facial expression recognition is studied in depth. A static facial expression recognition algorithm is proposed first: facial expressions are recognized after the facial actions are retrieved with a particle filter, using physiological knowledge of expressions. To cope with the shortcomings of static recognition, an algorithm combining static and dynamic facial expression recognition is then proposed: facial actions and facial expressions are retrieved simultaneously in a stochastic framework based on multi-class expression Markov chains, particle filtering, and facial expression knowledge.

(4) An algorithm for compressing MPEG-4 facial animation parameters (FAP) is proposed. Facial action basis functions (FBF) are used to group the FAPs; the bit rate is then lowered by combining intra-frame and inter-frame coding schemes, without introducing any inter-frame delay.

(5) A 3D facial expression animation algorithm based on MPEG-4 is proposed. The algorithm generates facial animation by combining a parameterized model with a muscle model, and can produce highly realistic facial expression animation driven by an FAP stream. Furthermore, it can produce facial viseme actions that account for the co-articulation effect in speech. Given the phonemes obtained from text analysis, phoneme durations, additional expression information, and NURBS interpolation between visemes, speech-synchronized expressive facial animation is obtained.

(6) Based on the above research, a demonstration system integrating facial expression parameter tracking and extraction, facial expression recognition, parameter transmission, and highly realistic speech-synchronized facial animation is built for the first time internationally. The system can produce highly realistic facial animation from the decoded parameters at the decoder.
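The FAP compression of contribution (4) rests on coding the first frame intra and each later frame as a quantized difference, with no inter-frame delay. The FBF grouping itself is not detailed in the abstract, but the closed-loop delta-quantization idea can be sketched as follows; the function names and the uniform quantizer are illustrative assumptions, not the thesis's actual codec.

```python
import numpy as np

def encode_fap_stream(fap_frames, q_step=2.0):
    """Quantize a sequence of FAP vectors: the first frame is
    intra-coded, each later frame is inter-coded as the quantized
    difference from the previous *reconstructed* frame, so encoder
    and decoder stay in sync and quantization error does not drift.
    Each frame is emitted as soon as it arrives (no coding delay)."""
    codes, recon = [], None
    for frame in fap_frames:
        ref = np.zeros_like(frame) if recon is None else recon
        q = np.round((frame - ref) / q_step).astype(int)  # quantized residual
        codes.append(q)
        recon = ref + q * q_step  # decoder-side reconstruction
    return codes

def decode_fap_stream(codes, q_step=2.0):
    """Invert encode_fap_stream by accumulating dequantized residuals."""
    frames, recon = [], None
    for q in codes:
        ref = np.zeros_like(q, dtype=float) if recon is None else recon
        recon = ref + q * q_step
        frames.append(recon)
    return frames
```

Because the encoder predicts from its own reconstruction, the per-frame error stays bounded by half the quantization step regardless of sequence length; the small integer residuals are what an entropy coder would then compress.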
