

【Author】 Chen Jiang (陈江)

【Advisor】 Yang Jian (杨鉴)

【Author Information】 Yunnan University, Pattern Recognition and Intelligent Systems, 2010, Master's

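【Abstract (Chinese)】 Non-native accents and ethnic-minority-language accents are problems that any application of continuous Mandarin (Putonghua) speech recognition must face. Taking the Naxi accent as a case study, this thesis investigates how to exploit the pronunciation-variation regularities of a minority-language accent to turn a standard Putonghua recognizer into a recognizer for minority-accented Putonghua, at low cost and in an easily extensible way. The main work of the thesis is as follows. (1) Using the HTK platform, a standard Putonghua speech recognizer was trained on the 863 standard Putonghua speech database to serve as the baseline system. (2) MLLR and MAP were used to adapt the acoustic models to minority-accented speech data. (3) The adapted recognizer was run on minority-accented speech data, and confusion matrices of initials, finals and syllables were computed from the recognition results. (4) The variation regularities of initials, finals and syllables in minority-accented Putonghua were studied, and a data-driven method guided by expert knowledge was used to design a new multi-pronunciation dictionary generation strategy, which automatically builds the multi-pronunciation dictionary of a given accent (or speaker) from that accent's (or speaker's) syllable confusion matrix. (5) Experiments with and without a language model verified the effectiveness of speaker-dependent and accent-dependent pronunciation dictionaries. The results show that, with a language model and ignoring tones, the best recognition rate of the baseline system on Naxi-accented speech was 50.26%, rising to 80.56% after MLLR+MAP acoustic model adaptation. On top of the adapted acoustic models, introducing speaker-dependent and accent-dependent pronunciation dictionaries raised the best recognition rates to 85.15% and 82.59%, respectively.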
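For reference, the mean-adaptation forms of MLLR and MAP as they are commonly written (for example in the HTK Book) are sketched below. The abstract does not say which variants, regression classes or prior weights were used, so this is only a reminder of the standard formulas; in particular, chaining them by using the MLLR-adapted mean as the MAP prior is an assumption about how "MLLR+MAP" was combined. Here $\boldsymbol{\mu}_{jm}$ is the mean of mixture $m$ in state $j$, $\gamma_{jm}(t)$ its occupation probability at frame $t$, $\mathbf{o}_t$ the observation, and $\tau$ the prior weight.

```latex
% MLLR: shared affine transform of the Gaussian means, W = [b  A]
\hat{\boldsymbol{\mu}}_{jm} = \mathbf{A}\,\boldsymbol{\mu}_{jm} + \mathbf{b}
  = \mathbf{W}\boldsymbol{\xi}_{jm},
\qquad
\boldsymbol{\xi}_{jm} = \begin{bmatrix} 1 & \boldsymbol{\mu}_{jm}^{\top} \end{bmatrix}^{\top}

% MAP: interpolate the prior mean (assumed here to be the MLLR-adapted mean)
% with the adaptation-data mean, weighted by occupation counts
\tilde{\boldsymbol{\mu}}_{jm} =
  \frac{\tau\,\hat{\boldsymbol{\mu}}_{jm} + \sum_{t}\gamma_{jm}(t)\,\mathbf{o}_t}
       {\tau + \sum_{t}\gamma_{jm}(t)}
```

With little adaptation data for a mixture, $\sum_t \gamma_{jm}(t)$ is small and the estimate stays near the MLLR mean; with more data it moves toward the accented speakers' own mean.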

【Abstract】 This dissertation concentrates on Mandarin speech recognition for non-native speakers, a problem that is almost unavoidable in LVCSR (Large Vocabulary Continuous Speech Recognition). Taking the Putonghua spoken by speakers whose native language is Naxi as the target, we attempt to build accent-specific speech recognizers from an existing standard Putonghua speech recognizer, based on the Initial-Final structure of Chinese combined with the regularities of pronunciation variation in this minority accent. The contributions of this dissertation are as follows: (1) Baseline hidden Markov models (the baseline system) were trained on the Project 863 standard Mandarin corpus using the HTK platform. (2) Targeting speakers of the Yunnan minority language Naxi, non-native Mandarin speech recognition is investigated using the general speaker adaptation techniques MLLR and MAP. (3) First, the non-native speech data from the Naxi area of Yunnan were transcribed with the adapted baseline HMMs. Next, the transcriptions were aligned with the reference transcriptions by dynamic programming (DP). Finally, confusion matrices of base syllables, initials and finals were calculated. (4) The variation regularities of initials, finals and syllables in minority-accented Putonghua were studied using a data-driven method combined with expert knowledge, and a novel strategy for building multi-pronunciation lexicons, easily extended to other accents, was proposed to automatically construct the multi-pronunciation lexicon of a given accent (or speaker) from its syllable confusion matrix. (5) The effectiveness of speaker-dependent and accent-dependent pronunciation dictionaries was verified. Experimental results show that with the baseline system and a language model, the highest base-syllable correct rate was 50.26%. Using MLLR+MAP, the base-syllable correct rate rose to 80.56%.
After acoustic model adaptation, using the speaker-dependent and accent-dependent pronunciation dictionaries raised the recognition rates further, to 85.15% and 82.59%, respectively.
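Steps (3) and (4), aligning recognizer output with the reference by DP, counting syllable confusions, and turning the confusion matrix into a multi-pronunciation lexicon, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses unit-cost Levenshtein alignment, counts only syllable-level confusions (not initials or finals), keeps a variant whenever its share of a reference syllable's row reaches a hypothetical `threshold`, and does not model the expert-knowledge constraints the thesis mentions. The function names, the `<del>`/`<ins>` markers and the toy data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical markers for unmatched positions in an alignment.
DEL = "<del>"  # reference syllable with no recognized counterpart
INS = "<ins>"  # recognized syllable with no reference counterpart


def align(ref, hyp):
    """Align two syllable sequences with unit-cost Levenshtein DP.

    Returns a list of (reference, recognized) pairs in order.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from (n, m), preferring match/substitution, then deletion.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], DEL))
            i -= 1
        else:
            pairs.append((INS, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(utterances):
    """Accumulate counts[ref_syllable][recognized_syllable] over (ref, hyp) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r != INS:  # rows are indexed by reference syllable only
                counts[r][h] += 1
    return counts


def build_lexicon(counts, threshold=0.2):
    """Map each reference syllable to itself plus frequent substitutes.

    A substitute is kept when its share of the row, an estimate of
    P(recognized | reference), reaches the (hypothetical) threshold.
    """
    lexicon = {}
    for r, row in counts.items():
        total = sum(row.values())
        variants = [r]
        for h, c in sorted(row.items(), key=lambda kv: -kv[1]):
            if h not in (r, DEL) and c / total >= threshold:
                variants.append(h)
        lexicon[r] = variants
    return lexicon


# Invented toy data (toneless pinyin), not taken from the thesis.
TOY_DATA = [
    (["zhong", "guo", "ren"], ["zong", "guo", "ren"]),
    (["zhong", "wen"], ["zhong", "wen"]),
    (["shi", "jie"], ["si", "jie"]),
]

if __name__ == "__main__":
    lexicon = build_lexicon(confusion_counts(TOY_DATA), threshold=0.2)
    for syllable, variants in sorted(lexicon.items()):
        print(syllable, variants)
```

Feeding in the utterances of a single speaker yields a speaker-dependent lexicon, while pooling all Naxi-accented speakers yields an accent-dependent one, mirroring the two dictionary types compared in item (5). The threshold trades coverage of accent variants against added confusability in the lexicon.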

  • 【Online Publication Submitter】 Yunnan University
  • 【Online Publication Issue】 2011, Issue 05

