
Research on Post-Processing Techniques for Speech Recognition

Post-Processing Technique for Speech Recognition

【Author】 Wu Bin

【Supervisor】 Guo Jun

【Author Information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2008, PhD

【Abstract】 Research on Mandarin large-vocabulary continuous speech recognition has been under way for more than ten years. Although notable progress has been made, the technology is still far from widespread application. Post-processing in speech recognition is the process of converting the syllable stream produced by the front end into a stream of Chinese characters. Research has shown that post-processing is of great importance to overall system performance. Human listening experiments indicate that people can clearly perceive only about 70% of the syllables in continuous speech; the remaining 30% are inferred from contextual knowledge. Post-processing for speech recognition has therefore attracted wide attention and increasingly in-depth study. This thesis investigates language model adaptation, decoding strategies, and error handling in the post-processing of Mandarin large-vocabulary continuous speech recognition. The main contributions and innovations are as follows:

1. Chinese confusion network algorithms. We first study the minimum Bayes risk decoding criterion and several minimum word error rate decoding methods based on it, such as methods based on N-best lists and on word lattices. Building on this, and taking the characteristics of the Chinese language into account, we propose an algorithm for constructing Chinese word confusion networks that, for long arcs in a Chinese word lattice, quickly and effectively inserts null arcs during forced alignment according to their pronunciation characteristics. Experiments show that decoding with the improved Chinese word confusion network effectively reduces the word error rate of Mandarin large-vocabulary continuous speech recognition compared with MAP (maximum a posteriori) decoding and previous error-rate-minimization algorithms. A Chinese word generally consists of one to four characters, so word durations differ considerably, which causes the constructed word confusion networks to contain a large number of null arcs. This thesis proposes a method of constructing Chinese character confusion networks to obtain the recognition hypothesis with minimum character error rate; this algorithm markedly reduces the number of null arcs in the constructed networks. Experimental results show that decoding with the character confusion network effectively lowers the character error rate of the recognition results.

2. Error detection and correction of decoding results. In Mandarin large-vocabulary continuous speech recognition, recognition errors arise in complex ways and for complex reasons. This thesis first analyzes common errors in recognition results and their causes. On this basis, we use transformation-based learning to learn error-correction rules from confusion networks; experiments show that applying these rules effectively reduces the word error rate. Considering the complexity of Chinese and the limited training corpus for rule learning, which cannot cover all error phenomena, we also use statistical methods for error detection and correction. Specifically, we propose a framework for error detection and correction based on SVMs (support vector machines): an SVM first classifies each character in the hypothesis string as correct or incorrect; for characters classified as errors, candidate character sequences are constructed from the Chinese character confusion network, rescored, and the highest-scoring string is taken as the corrected result. Experimental results show that this method effectively detects and corrects errors in the recognition results and lowers the character error rate.

3. Discriminative language models in speech recognition. Language model adaptation adjusts the probabilities of linguistic phenomena in the language model according to a changing application environment. This thesis applies three algorithms, Boosting, Perceptron, and minimum sample risk, to train N-gram language models for speech recognition, giving them adaptation ability for specific domains. Experimental results show that N-gram models trained with all three algorithms reduce the word error rate in the target domain, with the Perceptron-trained model adapting best. Therefore, for general-domain speech recognition, we take the input speech and the recognized Chinese word confusion networks as training samples, train a discriminative language model with the Perceptron algorithm, and rescore the word confusion networks with it. Experimental results show that this method effectively reduces the word error rate.
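The consensus decoding over a word confusion network described above can be illustrated with a minimal sketch: each confusion set holds competing arcs (including null arcs) with posterior probabilities, and the output keeps the highest-posterior arc of each set unless it is a null arc. This is a generic illustration of confusion network decoding, not the thesis's exact algorithm; the words and posteriors below are invented.

```python
# Minimal sketch of consensus decoding over a word confusion network,
# assuming the network is already built: a list of confusion sets, each
# a dict mapping a word (or None for a null arc) to its posterior.

NULL = None  # a null arc means the slot may be skipped in the output


def consensus_decode(confusion_network):
    """Pick the highest-posterior arc in each confusion set; emit the
    word unless the winning arc is a null arc."""
    hypothesis = []
    for confusion_set in confusion_network:
        best_arc = max(confusion_set, key=confusion_set.get)
        if best_arc is not NULL:
            hypothesis.append(best_arc)
    return hypothesis


# Toy example with invented Mandarin words and posteriors:
cn = [
    {"北京": 0.7, "背景": 0.3},
    {"邮电": 0.6, "有点": 0.3, NULL: 0.1},
    {NULL: 0.8, "的": 0.2},  # null arc wins: this slot is dropped
    {"大学": 0.9, "打削": 0.1},
]
print(consensus_decode(cn))  # ['北京', '邮电', '大学']
```

Note that minimizing the expected error slot by slot in this way is what distinguishes confusion network decoding from MAP decoding, which picks the single highest-probability path through the original lattice.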

【Abstract】 Mandarin large-vocabulary continuous speech recognition has been researched for more than ten years. Although considerable progress has been made, the technology is still far from widespread application. Post-processing in speech recognition converts Pinyin syllables into Chinese characters, and research shows that it is of great significance for improving system performance. Human listening experiments indicate that listeners can clearly hear only about 70% of the syllables in continuous speech and must infer the remaining 30% from contextual knowledge. Post-processing for speech recognition has therefore attracted great attention and increasingly in-depth study. This thesis investigates the post-processing of Mandarin large-vocabulary continuous speech recognition, including language model adaptation, decoding strategy, and error handling. The main contributions and innovations are as follows:

1. Chinese confusion network algorithms. We first study the minimum Bayes risk decoding rule and several minimum word error rate decoding methods. Taking the characteristics of the Chinese language into account, we propose an improved algorithm for constructing Chinese word confusion networks that quickly adds null arcs to a confusion set when a long arc carrying a Chinese character string is forcibly aligned during network construction. The improved algorithm was evaluated on the 2005 HTRDP (863) Evaluation task, where improved word accuracy was observed. Because a Chinese word generally consists of one to four characters, word durations vary considerably; we therefore propose a novel Chinese character confusion network algorithm that decreases the number of null arcs. Experimental results show that this algorithm effectively cuts the character error rate of the recognition results.

2. Error detection and correction of decoding results. Based on an analysis of decoding errors and their causes, we propose using transformation-based learning to learn error-correction rules from Chinese word confusion networks; experiments show significant improvements over the baseline recognition results. Considering the complexity of Chinese and the limited corpus for learning correction rules, we also use statistical methods to detect and correct decoding errors. Specifically, an SVM classifies the decoding results to detect errors, and the Chinese character confusion network is then used to correct them. Experimental results show that this method effectively detects and corrects decoding errors and reduces the character error rate.

3. Discriminative language models in speech recognition. We first study three discriminative methods of language model adaptation, namely the boosting algorithm, the perceptron algorithm, and the minimum sample risk algorithm, and present comparative experimental results on training discriminative language models for speech recognition. We then use the perceptron algorithm, which showed the best discriminative performance, to train a discriminative language model for general-domain Mandarin large-vocabulary continuous speech recognition and rescore the Chinese word confusion networks. Experimental results show that this method effectively reduces the word error rate.
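The perceptron training for a discriminative language model described in point 3 can be sketched as follows. This is a generic structured-perceptron reranker over candidate word sequences (such as paths read off a word confusion network or an N-best list), with simple bigram-count features; the feature set and training data below are illustrative assumptions, not the thesis's configuration.

```python
# Sketch of perceptron training for a discriminative language model used
# to rescore recognition hypotheses. All data here are invented.
from collections import defaultdict


def bigram_features(words):
    """Represent a hypothesis by its bigram counts (a simple feature set)."""
    feats = defaultdict(int)
    for a, b in zip(["<s>"] + words, words + ["</s>"]):
        feats[(a, b)] += 1
    return feats


def score(weights, feats):
    return sum(weights[f] * c for f, c in feats.items())


def perceptron_train(training_data, epochs=5):
    """training_data: list of (hypotheses, reference) pairs, where
    hypotheses is a list of candidate word sequences, e.g. read off a
    word confusion network or an N-best list."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for hypotheses, reference in training_data:
            best = max(hypotheses,
                       key=lambda h: score(weights, bigram_features(h)))
            if best != reference:
                # Standard structured-perceptron update:
                # promote the reference, demote the current best.
                for f, c in bigram_features(reference).items():
                    weights[f] += c
                for f, c in bigram_features(best).items():
                    weights[f] -= c
    return weights
```

After training, each candidate hypothesis in a new confusion network is rescored with `score(weights, bigram_features(h))`, typically interpolated with the baseline acoustic and language model scores, and the highest-scoring candidate is output.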
