Research on Adaption Technique in Continuous Speech Keyword Spotting System

【Author】 Zhu Li

【Advisor】 Zhao Tiejun

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2006, Master's thesis

【Abstract】 Automatic speech recognition (ASR) is used more and more widely in everyday life. Current ASR falls broadly into continuous speech recognition and keyword spotting. Compared with continuous speech recognition, keyword spotting is better at making a system's dialogue feel natural, because it understands the user's meaning by catching the keywords that carry the important information in an utterance, without having to recognize every word of the sentence correctly. This also makes it a good solution for the irregular and disfluent speech that occurs in natural spoken dialogue.

When the training speech and the speech to be recognized differ substantially, the recognition rate of an ASR system drops sharply. Adaptation techniques use a small amount of speech from the test speaker to adjust the system parameters, which narrows the gap between the system model and the speaker and raises the recognition rate.

This thesis studies the application of speaker adaptation and speaker normalization techniques in a keyword spotting system. The main contents are:

1. A speaker-independent keyword spotting baseline system is built on the continuous hidden Markov model (CHMM) framework. The thesis discusses what building this system involves: speech preprocessing, feature extraction, construction and training of the acoustic models, keyword detection, and keyword verification. The baseline system is then evaluated, and the need to add an adaptation module to it is argued.

2. Speaker adaptation and speaker normalization techniques are investigated, and the idea of combining the two is proposed. Experiments show that applying speaker normalization during training makes the trained model more speaker-independent, and that adaptation performed on top of such a model reaches a higher recognition rate. Several combinations of speaker normalization methods and adaptation methods are compared and validated experimentally. The scheme selected combines speaker adaptive training (SAT), one of the speaker normalization methods, with constrained maximum likelihood linear regression (CMLLR).

3. Building on the keyword spotting baseline system, an interactive spoken query system for stock information is implemented. A speaker adaptation module is added to the system, and two adaptation schemes are implemented. Finally, the system is evaluated, which verifies the effectiveness of the adaptation and speaker normalization techniques discussed in the thesis.
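
The abstract does not describe how the CMLLR transform is estimated, so the following is only a toy illustration of the underlying idea, not the thesis's implementation. It assumes a single diagonal-covariance Gaussian in place of a full CHMM, and a per-dimension (diagonal) feature transform x' = a*x + b. Under those assumptions the maximum-likelihood transform, including the Jacobian term log|a|, has a closed form. The function names `estimate_affine_transform` and `log_likelihood` are hypothetical and introduced here for illustration only.

```python
import math


def estimate_affine_transform(frames, model_mean, model_var):
    """Estimate a per-dimension transform x' = a*x + b from adaptation frames.

    Toy assumption: the acoustic model is ONE diagonal Gaussian. Maximizing
    sum_t [log|a| + log N(a*x_t + b; mu, var)] per dimension then gives
    a = sqrt(var / data_var) and b = mu - a * data_mean.
    """
    n = len(frames)
    scales, offsets = [], []
    for d in range(len(model_mean)):
        values = [frame[d] for frame in frames]
        data_mean = sum(values) / n
        data_var = sum((v - data_mean) ** 2 for v in values) / n
        if data_var <= 0.0:
            raise ValueError("adaptation data has zero variance in dimension %d" % d)
        a = math.sqrt(model_var[d] / data_var)
        scales.append(a)
        offsets.append(model_mean[d] - a * data_mean)
    return scales, offsets


def log_likelihood(frames, model_mean, model_var, scales, offsets):
    """Log-likelihood of the transformed frames, including the Jacobian log|a|."""
    total = 0.0
    for frame in frames:
        for x, mu, var, a, b in zip(frame, model_mean, model_var, scales, offsets):
            y = a * x + b
            total += math.log(abs(a))
            total += -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return total


if __name__ == "__main__":
    # Made-up two-dimensional frames from a "new speaker" whose features are
    # shifted and scaled relative to the model.
    frames = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
    model_mean, model_var = [0.0, 0.0], [1.0, 4.0]
    scales, offsets = estimate_affine_transform(frames, model_mean, model_var)
    identity = ([1.0, 1.0], [0.0, 0.0])
    print("before:", log_likelihood(frames, model_mean, model_var, *identity))
    print("after: ", log_likelihood(frames, model_mean, model_var, scales, offsets))
```

Because the identity transform (a = 1, b = 0) is one of the candidates the estimate maximizes over, the adapted log-likelihood can never be lower than the unadapted one in this toy setting. A practical CMLLR implementation would instead accumulate statistics over HMM state occupancies and typically estimate a full transform matrix iteratively.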

  • 【Classification number】TP391.42
  • 【Citation count】3
  • 【Download count】308