
Research and Implementation of a Children's Speech-Training System Based on ASR

Original title: 基于ASR的儿童语言教育系统的研究与实现

【Author】 Xu Kaiwei (许开维)

【Advisor】 Shen Jun (沈军)

【Author information】 Southeast University, Computer Application Technology, 2006, Master's degree

【Abstract】 With the spread and development of modern computer technology, computers are increasingly part of everyday life. Speech is the most direct and convenient way for people to communicate with a computer, so speech recognition and speech synthesis have become a hallmark of modern technological progress and an important area of computer research and development. Speech recognition draws on many disciplines, and advances in those fields have in turn driven its development. Although the technology has made progress, most speech recognition systems are still confined to laboratory trials and fall well short of practical requirements.

This thesis studies two commonly used speaker adaptation methods: maximum a posteriori (MAP) estimation and maximum likelihood linear regression (MLLR). On that basis it proposes a composite incremental adaptation method for speech recognition that combines the strengths of both. The method simplifies the MLLR module to a single global transformation matrix (one global regression class), which reduces the mismatch caused by the environment and by speakers' anatomical differences and gives the MAP module a more accurate initial model. The incremental MAP module then models the finer phoneme-level differences and ensures the asymptotic properties of the whole approach. The composite method was applied to improve the Microsoft speech recognition engine, and it performed well in the subsequent validation experiments. The results show that it effectively reduces the impact of both speaker and environment variation and is well suited to speech recognition systems.

Building on this theoretical work, the thesis combines results from modern educational technology with the needs of children's language education and uses the improved Microsoft speech recognition engine to develop a children's language education program. The software implements Chinese speech recognition; communication among VC++, Flash, and the Microsoft speech recognition engine; recognition of Chinese, pinyin, and English speech; animations that indicate whether a pronunciation is correct or incorrect; and TTS. It is visual and intuitive, and is practical as a children's language education tool. By applying research on speech recognition adaptation to children's language education, the work achieves good results and has both theoretical and practical value.
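
The two-stage adaptation described in the abstract (a single global MLLR-style transform followed by incremental MAP updates) can be illustrated with a minimal sketch. This is not the thesis's implementation, whose formulas are not given here. It assumes scalar features, unit variances, and known frame-to-unit alignments, so the global transform reduces to a least-squares fit of one scale and bias, and the MAP step is the standard conjugate-prior mean update. The names `estimate_global_transform`, `IncrementalMAP`, and `tau` are hypothetical.

```python
from collections import defaultdict


def estimate_global_transform(means, frames):
    """Fit one shared affine map mu -> a * mu + b for all units.

    means:  dict unit -> speaker-independent mean (float)
    frames: list of (unit, observation) pairs with known alignment
    Assumption: unit variances and hard alignments, so the MLLR-style
    estimate reduces to ordinary least squares.
    """
    n = len(frames)
    if n == 0:
        return 1.0, 0.0  # no data: identity transform
    sx = sum(means[u] for u, _ in frames)
    sy = sum(x for _, x in frames)
    sxx = sum(means[u] ** 2 for u, _ in frames)
    sxy = sum(means[u] * x for u, x in frames)
    denom = n * sxx - sx * sx
    if abs(denom) < 1e-12:
        # Only one distinct mean was observed: estimate a bias only.
        return 1.0, (sy - sx) / n
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


class IncrementalMAP:
    """Per-unit MAP mean update that can be applied batch after batch."""

    def __init__(self, prior_means, tau=10.0):
        # tau is the prior weight: how many frames the prior mean is "worth".
        self.means = dict(prior_means)
        self.weight = {u: tau for u in prior_means}

    def update(self, frames):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for u, x in frames:
            sums[u] += x
            counts[u] += 1
        for u, n in counts.items():
            w = self.weight[u]
            # Posterior mean: weighted average of the prior mean and the data.
            self.means[u] = (w * self.means[u] + sums[u]) / (w + n)
            # The posterior becomes the prior for the next batch.
            self.weight[u] = w + n
        return self.means


# Hypothetical speaker-independent means for three units.
si_means = {"a": 1.0, "i": 2.0, "u": 3.0}

# Stage 1: one global transform estimated from a little adaptation data.
a, b = estimate_global_transform(si_means, [("a", 3.0), ("i", 5.0), ("u", 7.0)])
adapted = {u: a * m + b for u, m in si_means.items()}

# Stage 2: incremental MAP refines individual units as more speech arrives.
model = IncrementalMAP(adapted, tau=2.0)
model.update([("a", 4.0), ("a", 4.0)])
model.update([("a", 4.0)])
```

The global transform moves every unit at once, including units that have not yet been observed, while the MAP stage only moves units with data and converges toward the speaker's own statistics as frames accumulate. That division of labor is the motivation the abstract gives for combining the two methods.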

【Key words】 ASR; Speech API; COM; children's language education (speech training); adaptive techniques
  • 【Online publication contributor】 Southeast University
  • 【Online publication issue】 2007, No. 04
  • 【Classification number】 TP311.52