Visual Speech Synthesis Technology and Its Application Studies in English Pronunciation Tutoring

【Author】 Xu Qin

【Advisor】 Zhang Jiping

【Author information】 East China Normal University, Communication and Information Systems, 2007, Master's thesis

【Abstract】 As globalization accelerates and China's exchanges with the rest of the world grow more frequent, English, as an international working language, is receiving increasing attention. However, years of emphasis on written instruction, together with the lack of a good environment for practicing spoken English, have left spoken-English teaching in China with little to show for its efforts. Although computer technology is developing rapidly in China, computer-assisted language learning (CALL) there is still at an early stage. In response, the author applies visual speech technology to pronunciation teaching for beginning learners of English. Drawing on the Phonics teaching method widely promoted in the United States, this thesis develops an English pronunciation tutoring system with synchronized lip movement and speech. The system is intended to help beginners in two ways. First, because speech is bimodal, visual speech lets users observe and imitate the facial movements of articulation more closely, which helps them understand and remember the sounds. Second, a user interface built on visual speech technology is friendlier and makes human-computer interaction more natural, which helps ease beginners' anxiety and raises their motivation to learn.

The main work of this thesis is as follows:
A. Standards: the "face object" defined in MPEG-4 is introduced, and the facial animation parameters (FAP) in that definition serve as the basis for the subsequent work.
B. Technology: the TTS engine in Microsoft Speech SDK 5.1, which this thesis adopts, is studied and put into practice.
C. System architecture: the framework of the proposed text-to-visual-speech (TTVS) synthesis system is introduced and analyzed.
D. Algorithms: the steps and algorithms used to implement the visual speech animation synthesis system are described in detail.
E. Application: the knowledge structure, module functions, and usage scenarios of the "EP Tutor" system are described in detail.
F. Future work: directions for the further development of the EP Tutor system are outlined.

  • 【Classification number】 TN919.8; TN912.33
  • 【Download count】 240