An MPEG-4 Compatible Facial Animation System with TTS Support and Its Application in Network Communication

【Author】 吕江波

【Advisor】 虞露

【Author Information】 Zhejiang University, Communication and Information Systems, 2003, Master's thesis

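【Abstract (translated from the Chinese)】 MPEG-4 is an object-based multimedia compression standard that allows the audio and visual objects in a scene, whether natural or synthetic, to be encoded independently. MPEG-4 defines the "face object" as a special visual object: a face model can be customized through facial definition parameters (FDP) and animated through facial animation parameters (FAP). MPEG-4 makes it possible to integrate facial animation with multimedia communication and to control a virtual face over low-bandwidth networks. TTS (Text to Speech) is an attractive synthetic speech coding technique introduced in MPEG-4, and combining it with facial animation has broad application prospects. MPEG-4 also defines an application program interface for TTS synthesizers. Through this interface, a TTS synthesizer can supply the face model with phonemes and their associated timing information, and the phonemes can be converted into the corresponding mouth shapes, so that facial animation and synthetic speech fit together well. Building on the existing work of our laboratory, and after a careful survey of the state of facial animation research, I chose "An MPEG-4 compatible facial animation system with TTS support and its application in network communication" as my research direction. Integrating facial animation with TTS synthetic speech within the scope of the MPEG-4 standard is new research work, and it also has good applications in areas such as virtual presenters and narrowband network communication. On the basis of this research I therefore also developed two prototype systems with application potential, "Grimace VTTS" and "Grimace Chat". The thesis discusses the following aspects in detail: 1. Standard: the MPEG-4 standard and the "face object" it defines are introduced and explained; 2. Technical elements: OpenGL, used for realistic image rendering, and the TTS engine in Microsoft Speech SDK 5.0 are studied and put into practice; 3. System architecture: the framework of the proposed facial speech animation system (Grimace VTTS) and the framework of the visual communication system for narrowband networks (Grimace Chat) are introduced and analyzed; 4. Algorithms: methods for simulating facial muscle motion, an optimized algorithm for texture mapping from photographs of real people, methods for building a library of pronunciation mouth shapes and a library of facial expressions, interpolation of transition frames, motion blending and coarticulation, superimposing expressions on speech animation, and synchronizing the animated mouth shapes with the synthesized speech; 5. Implementation and applications: the development techniques, system functions, usage, and application scenarios of the "Grimace VTTS" and "Grimace Chat" prototypes are described in detail; 6. Performance evaluation: subjective evaluation results for the facial animation system are presented, and an objective performance evaluation of the system is carried out for the first time, covering the animation rendering frame rate and run-time profiling of functions; 7. Run-time requirements and outlook: the software and hardware platform requirements of the current prototypes are described, followed by an outlook on the development of the Grimace system and some suggestions for reference.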
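As an illustration of two of the algorithm topics listed in the abstract (interpolation of transition frames, and superimposing an expression on speech animation), the following is a minimal sketch. It is not the thesis's own algorithm: it assumes simple linear interpolation and additive blending, and the parameter names (`jaw_open`, `lip_stretch`, `cheek_raise`) are hypothetical placeholders rather than actual MPEG-4 FAP identifiers.

```python
from typing import Dict, List

# Hypothetical pose representation: each key names an illustrative
# animation parameter, not an actual MPEG-4 FAP identifier.
Pose = Dict[str, float]


def interpolate(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly interpolate between two poses for 0 <= t <= 1."""
    keys = sorted(set(a) | set(b))
    return {k: (1.0 - t) * a.get(k, 0.0) + t * b.get(k, 0.0) for k in keys}


def transition_frames(a: Pose, b: Pose, n: int) -> List[Pose]:
    """Return n evenly spaced in-between frames, excluding both key poses."""
    return [interpolate(a, b, i / (n + 1)) for i in range(1, n + 1)]


def overlay(speech: Pose, expression: Pose, weight: float) -> Pose:
    """Add a weighted expression pose on top of a speech pose (assumed additive)."""
    keys = sorted(set(speech) | set(expression))
    return {k: speech.get(k, 0.0) + weight * expression.get(k, 0.0) for k in keys}


# Two key mouth shapes and one expression, all with made-up values.
closed = {"jaw_open": 0.0, "lip_stretch": 0.0}
open_a = {"jaw_open": 1.0, "lip_stretch": 0.2}
smile = {"lip_stretch": 0.5, "cheek_raise": 0.4}

frames = transition_frames(closed, open_a, 3)
blended = [overlay(frame, smile, 0.5) for frame in frames]
```

A real system would likely replace the linear ramp with a smoother curve and handle coarticulation between neighboring mouth shapes, which this sketch does not attempt.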

【Abstract】 MPEG-4 is an object-based multimedia compression standard that allows the audio-visual objects in a scene (natural or synthetic) to be encoded independently. The face object is a special visual object defined in MPEG-4. Facial definition parameters (FDP) and facial animation parameters (FAP) are the parameter sets used to calibrate and animate the face object. MPEG-4 enables the integration of face animation with multimedia communications and allows face animation over low-bit-rate communication channels. TTS (Text to Speech) is one of the promising synthetic audio tools provided by MPEG-4, and its integration with facial animation is expected to lead to many applications. MPEG-4 defines an application program interface for the TTS synthesizer. Through this interface, the synthesizer can provide phonemes and related timing information to the face model. The phonemes are converted into corresponding mouth shapes, enabling simple talking-head applications. Building on the previous work of our lab, I surveyed the current state of research on facial animation and chose "An MPEG-4 compatible facial animation system with TTS support and its application in network communication" as my research direction. Integrating facial animation with synthetic speech is a new field for our research work, and it also plays an important role in applications such as virtual newscasters and virtual communication over low bandwidth. I have therefore also developed two promising prototype systems, called "Grimace VTTS" and "Grimace Chat". This thesis focuses on the following aspects: 1. Standard: an overview of the MPEG-4 standard and the basic technology of the MPEG-4 face object is presented. 2. Technology support: OpenGL and the TTS engine of Microsoft Speech SDK 5.0 are introduced in detail, and some practice and examples are discussed. 3. Framework of the Grimace system: the framework of Grimace VTTS (a prototype aimed at virtual newscasters) and the framework of Grimace Chat (a prototype aimed at virtual communication) are proposed, and each module is described. 4. Algorithms in the Grimace system: the specific algorithms adopted and optimized for the Grimace system are presented and discussed. 5. Implementation and applications: the tools used to develop the Grimace system are introduced, and its functions and usage are described in detail. 6. Evaluation of the Grimace system: both subjective and objective evaluations of the Grimace system are presented. 7. Platform requirements and future work: the run-time platform requirements of the Grimace system are introduced, followed by future directions and my suggestions for this prototype system.
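The abstract notes that a TTS synthesizer can hand the face model phonemes with timing information, which are then converted into mouth shapes. A minimal sketch of that idea follows. Everything here is assumed for illustration: the `PhonemeEvent` structure, the phoneme-to-mouth-shape table, the shape labels, and the 25 frames-per-second rate are hypothetical, and are not taken from the thesis, the MPEG-4 TTS interface, or Microsoft Speech SDK 5.0.

```python
from typing import List, NamedTuple, Tuple


class PhonemeEvent(NamedTuple):
    """One timed phoneme, as a TTS engine might report it (assumed shape)."""
    phoneme: str
    start_ms: int
    duration_ms: int


# Hypothetical phoneme-to-mouth-shape table; labels are illustrative only.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open_wide", "o": "rounded", "sil": "neutral",
}


def viseme_track(events: List[PhonemeEvent]) -> List[Tuple[int, str]]:
    """Convert timed phonemes into (start_ms, viseme) keys, merging repeats."""
    track: List[Tuple[int, str]] = []
    for ev in sorted(events, key=lambda e: e.start_ms):
        viseme = PHONEME_TO_VISEME.get(ev.phoneme, "neutral")
        if not track or track[-1][1] != viseme:
            track.append((ev.start_ms, viseme))
    return track


def viseme_at(track: List[Tuple[int, str]], time_ms: int) -> str:
    """Return the viseme active at time_ms (neutral before the first key)."""
    current = "neutral"
    for start, viseme in track:
        if start > time_ms:
            break
        current = viseme
    return current


events = [
    PhonemeEvent("m", 0, 80),
    PhonemeEvent("a", 80, 150),
    PhonemeEvent("p", 230, 70),
    PhonemeEvent("b", 300, 60),
    PhonemeEvent("sil", 360, 100),
]
track = viseme_track(events)
frame_ms = 40  # 25 frames per second, an assumed rendering rate
schedule = [(t, viseme_at(track, t)) for t in range(0, 460, frame_ms)]
```

Looking up the mouth shape by timestamp rather than by frame count is one simple way to keep lips and audio aligned: if rendering slows down, the renderer can query `viseme_at` with the current audio playback time and skip frames instead of drifting.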

  • 【Online Publication Contributor】 Zhejiang University
  • 【Online Publication Issue】 2003, No. 02
  • 【Classification Code】 TP391.41
  • 【Citation Count】 1
  • 【Download Count】 203