
基于统计学习的人脸图像合成方法研究

Research on Statistics-Based Realistic Face Image Synthesis

【Author】 杜杨洲

【Supervisor】 林学訚

【Author Information】 Tsinghua University, Computer Science and Technology, 2004, Ph.D.

【Abstract (Chinese)】 Face image synthesis is an important technique in next-generation human-computer interaction and an active research direction that has received wide attention in both the computer graphics and computer vision communities. Its potential applications include narrow-band video transmission, computer-aided instruction, game production, and virtual reality.

Traditional approaches build the 3D structure of the head with fine mesh or surface models and drive facial animation with physiological or parametric models. Such methods still face many problems and trade-offs in the usability of 3D acquisition devices, the accuracy of the model representation, and the complexity and robustness of the algorithms. Example-based methods developed in recent years synthesize faces directly from sample images without any 3D reconstruction; by avoiding 3D modeling they avoid modeling error, and their results carry the same realism as the sample images.

This thesis focuses on example-based face synthesis and explores how, without extracting any 3D information, a set of sample images together with statistical learning theory can be used to synthesize large changes in head pose, simulate different lighting effects, and generate realistic expression images for an arbitrary person. Building on a thorough review of existing methods, the thesis proposes several novel ideas on different aspects of the face synthesis problem and presents encouraging experimental results. The work demonstrates the characteristics and advantages of statistical learning for face synthesis and opens a new direction for face modeling and animation. The main contributions are as follows.

First, a multi-pose face image synthesis method based on a factorization model is proposed. "Identity" and "head pose" are treated as two factors that influence a face image, and their interaction is learned from a multi-pose face database used as the training set. Given a test face image, the "translation" ability of the factorization model generates images of the test face under the poses in the training set, and images of the training faces under the pose of the test face. The kernel method is applied to extend the linear factorization model to the nonlinear case, which effectively solves the local-minimum problem of the original model in the "translation" task. For an arbitrary face outside the training set, a "lighting correction" pre-processing step and an "image warping" post-processing step make it possible to synthesize that face under different poses. The method can be applied to building multi-pose face databases and to multi-pose face recognition and verification.

Second, the idea of "shape-texture dependence mapping" is proposed to render dynamic facial features realistically. The basic idea is that the shape and texture of facial features in an image are correlated; once this relation from shape to texture is found and expressed mathematically as a mapping, realistic and natural dynamic texture can be generated automatically from shape changes. Taking the most complex part of the face, the mouth region, as an example, a concrete implementation of the dependence mapping is given; with it, the full variation of the mouth image can be reconstructed from only a few shape parameters. A dependence mapping trained on a facial expression video further shows that the details of expression changes can be recovered quite successfully from the displacements of facial feature points alone. The technique can be integrated into a talking-head system to generate realistic mouth animation, or applied to model-based video coding to further save transmission bandwidth.

Third, a method for generating realistic expression images controlled by emotional parameters is proposed. With a facial expression image database, a mapping from emotional state to expression image, here called the "emotional function", can be trained. Taking the neutral face of an arbitrary person as input, the emotional function generates the corresponding emotional image under the control of the emotional parameters. The emotional function in effect describes how facial expression varies with inner emotion, so emotional functions of different types can be trained on different data sets to express different styles of emotion. To make the synthesis algorithm applicable to arbitrary people, the parametric statistical model of facial expression images is trained on relative rather than absolute quantities: with each person's neutral face as the reference, subtracting the neutral face from the expressive face yields the relative shape, and dividing yields the relative texture. In this way, the expression variation is extracted in a person-independent manner. Experimental results show that, from a single neutral face image, the method can synthesize realistic expression images with controllable emotional state for an arbitrary person.
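The factorization-based "translation" described above can be illustrated with a toy model. The sketch below is not the thesis's implementation: it uses a simplified asymmetric bilinear model (a pose-specific linear basis per view, sharing one identity vector across views), synthetic random "images", and made-up dimensions, and it covers only the linear case without the kernel extension or the lighting-correction and warping steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a multi-pose face database: S poses, C identities,
# K pixels per image.  The images are generated from a ground-truth bilinear
# model so that the "translation" below is exact; real face images would
# only be approximated at the chosen rank r.
S, C, K, r = 4, 6, 50, 3
A_true = rng.normal(size=(S, K, r))        # pose-specific bases
B_true = rng.normal(size=(r, C))           # identity vectors
Y = np.vstack([A_true[s] @ B_true for s in range(S)])   # stacked (S*K, C)

# Fit the model Y ≈ [A_1; ...; A_S] B via truncated SVD.
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A = (U[:, :r] * sv[:r]).reshape(S, K, r)   # learned pose bases
B = Vt[:r]                                  # learned identity vectors

# "Translation": a new face seen only in pose 0 is rendered in pose 2 by
# first solving for its identity vector, then applying the other pose basis.
b_new = rng.normal(size=r)
x_pose0 = A_true[0] @ b_new                 # observed image of the new face
b_hat = np.linalg.lstsq(A[0], x_pose0, rcond=None)[0]
x_pose2 = A[2] @ b_hat                      # synthesized novel view

print(np.allclose(x_pose2, A_true[2] @ b_new, atol=1e-6))  # → True
```

The learned bases differ from the true ones by an invertible linear change of identity coordinates, which cancels in the render step; that is why the translated view matches exactly here.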

【Abstract】 Realistic face image synthesis is an important technique in the field of human-computer interaction and an active research topic in both the computer vision and computer graphics communities. Its potential applications include low bit-rate video transmission, computer-aided instruction, game design, virtual reality, and so on.

Traditional approaches use wire-frame or surface models to build the 3D head structure, and physiological or parametric models to produce facial animation. However, many problems remain in the acquisition of 3D data, the accuracy of the model representation, and the complexity and robustness of the algorithms. More recently, example-based approaches have been actively adopted for face synthesis. This new strategy uses example images directly, without any 3D reconstruction, and can often achieve more realistic results than traditional methods.

This thesis focuses on example-based face synthesis and discusses how to use training examples and statistical methods to synthesize face images over a wide range of views, under different lighting conditions, and with various emotional expressions. We first summarize the relevant literature, then propose our novel ideas and demonstrate them with encouraging experimental results. Our work shows the advantages of statistical learning theory in solving the face image synthesis problem and points out a new direction for face modeling and animation. The main contributions of this work are as follows.

First, a multi-view face synthesis method based on a factorization model is proposed. Here "human identity" and "head pose" are regarded as two influencing factors, and their interaction is trained on a face database. With the translation ability of the factorization model, a test face can be rendered in the views contained in the training set, and the training faces can be rendered in the new view of the test face. The original bilinear factorization model is also extended to the nonlinear case so that a globally optimal solution can be found for the translation task. With a pre-processing and a post-processing procedure, an arbitrary new face can thus be translated into other views. The proposed method can be applied to building multi-view face databases and to face recognition across a wide range of views.

Second, a method of dynamic facial texture generation based on shape-appearance dependence mapping is proposed. The shape and texture of facial features are highly correlated; based on this observation, dynamic facial texture can be generated from shape variation, a strategy we call dependence mapping. We implement dependence mapping on the mouth region and show that realistic mouth animation can be generated from a few shape parameters. We also test the mapping on a video clip of facial expressions; the experiment shows that expressive details are successfully recovered from the movement of facial feature points. The proposed technique can be integrated into a talking-head system to generate realistic animation, or applied to a model-based coding system to achieve lower bit-rates.

Third, a simple methodology for mimicking realistic faces by manipulating emotional status is proposed. A mapping from emotional status to facial expression, called the "emotional function", is trained on a face database; it can generate an expressive face for a new person from just his or her neutral image. The emotional function in fact describes how expression varies with inner emotion.
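A minimal sketch of the shape-to-texture dependence mapping, assuming the simplest possible form: a least-squares linear regression from shape parameters to texture pixels. The data, the dimensions, and the linear form are all stand-ins chosen for illustration; the thesis's actual mouth-image representation and mapping may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for mouth-region training pairs: N frames, each with a small
# shape vector (feature-point displacements) and a larger texture vector
# (pixel intensities).  Textures are generated as a fixed linear function
# of shape plus noise, mimicking the shape/texture correlation the method
# exploits; the dimensions here are made up.
N, d_shape, d_tex = 200, 4, 100
S = rng.normal(size=(N, d_shape))                    # shape parameters
W_true = rng.normal(size=(d_shape, d_tex))
T = S @ W_true + 0.01 * rng.normal(size=(N, d_tex))  # texture vectors

# Dependence mapping as least-squares regression from shape to texture,
# with a bias column appended to the shape parameters.
S1 = np.hstack([S, np.ones((N, 1))])
W = np.linalg.lstsq(S1, T, rcond=None)[0]

# Given only new shape parameters, synthesize the full texture.
s_new = rng.normal(size=d_shape)
t_pred = np.append(s_new, 1.0) @ W
t_ideal = s_new @ W_true
print(np.abs(t_pred - t_ideal).max() < 0.1)  # → True (matches up to noise)
```

The point of the sketch is the asymmetry the thesis exploits: only a handful of shape numbers are needed at synthesis time, yet the full texture is reconstructed through the learned mapping.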
Because of this, the function can be trained on different training sets to reflect different affective styles. When building the statistical model of face images, the model is trained on relative rather than absolute quantities. This training strategy extracts expressive details in a person-independent manner. As the experimental results show, the proposed method can synthesize realistic expression images with controllable emotional status for an arbitrary person from a single neutral face image.
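The relative-quantity encoding behind this person independence (per the thesis: shape offsets by subtraction and texture ratios by division, each relative to the person's neutral face) can be sketched as follows. The face vectors are random stand-ins and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def encode_expression(shape_exp, tex_exp, shape_neu, tex_neu):
    """Encode an expression relative to the same person's neutral face."""
    d_shape = shape_exp - shape_neu      # relative shape: offsets
    r_tex = tex_exp / tex_neu            # relative texture: ratios
    return d_shape, r_tex

def apply_expression(shape_neu, tex_neu, d_shape, r_tex):
    """Apply a person-independent expression code to a new neutral face."""
    return shape_neu + d_shape, tex_neu * r_tex

# Extract a smile from person A ...
shape_a_neu = rng.uniform(0.0, 1.0, size=8)
tex_a_neu = rng.uniform(0.5, 1.0, size=32)   # keep texture positive
shape_a_smile = shape_a_neu + 0.1            # toy uniform offset
tex_a_smile = tex_a_neu * 1.2                # toy uniform brightening
d, r = encode_expression(shape_a_smile, tex_a_smile, shape_a_neu, tex_a_neu)

# ... and transfer it to person B, using only B's neutral face.
shape_b_neu = rng.uniform(0.0, 1.0, size=8)
tex_b_neu = rng.uniform(0.5, 1.0, size=32)
shape_b, tex_b = apply_expression(shape_b_neu, tex_b_neu, d, r)
print(np.allclose(shape_b - shape_b_neu, 0.1))   # → True
print(np.allclose(tex_b / tex_b_neu, 1.2))       # → True
```

Because the code stores only offsets and ratios, it carries the expression change without carrying person A's identity, which is exactly what lets one neutral image of a new person suffice.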

  • 【Online Publication Contributor】 Tsinghua University
  • 【Online Publication Issue】 2005, No. 03