
Discriminative Training of Acoustic Models and Its Application in LVCSR Systems

Discriminative Training for Large Vocabulary Continuous Speech Recognition

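【Author】 Liu Cong (刘聪)

【Advisors】 Dai Lirong (戴礼荣); Jiang Hui (江辉)

【Author information】 University of Science and Technology of China, Signal and Information Processing, 2010, Ph.D.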

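【Abstract (from the Chinese)】 Discriminative training of acoustic models has been one of the most active research topics in speech recognition in recent years, and it has become one of the most important model training techniques in today's mainstream speech recognition systems, especially large vocabulary continuous speech recognition (LVCSR) systems. This thesis studies discriminative training of acoustic models and its application in LVCSR systems in depth. It also touches on another important module of speech recognition systems, the confidence measure (CM).

First, this thesis proposes a novel optimization algorithm called constrained line search (CLS) for updating CDHMM parameters in discriminative training for speech recognition. CLS can be used for model updates under any of the criteria in the unified framework of discriminative training criteria, including MMI, MCE, and MWE/MPE. In this method, discriminative training of HMMs is first defined as a constrained optimization problem, and the KLD between models is used directly to quantify the constraint imposed between models. Then, based on a simple line search idea, we find that once this constraint is converted to a quadratic form, a closed-form solution for the updated model parameters is easily obtained. CLS can optimize all parameters of a CDHMM, including Gaussian means, covariance matrices, and mixture weights.

Next, this thesis further analyzes and extends the Trust Region method for updating model parameters in discriminative training, which we proposed earlier. The Trust Region method converts the MMI discriminative training problem into a standard problem from optimization theory, and thereby finds the global optimum of the function being optimized accurately and efficiently. Under the inter-model constraint described above, the Trust Region method optimizes the auxiliary function used in discriminative training exactly. However, optimizing the auxiliary function does not guarantee that the original objective function improves. Through a detailed theoretical analysis of the Trust Region problem, we therefore propose a new auxiliary function called Bounded Trust Region. This auxiliary function is still a valid approximation of the objective function; more importantly, as long as the inter-model constraint holds, it is a lower bound of the original objective function. This property ensures that optimizing the auxiliary function also improves the objective function. In addition, the new auxiliary function can still be solved directly with the standard Trust Region method, so its global optimum can be found quickly. Experiments show that the Bounded Trust Region method outperforms both the conventional EBW algorithm and the original Trust Region method.

Third, this thesis discusses several problems that arise in practical LVCSR systems, including the computational capacity needed to process very large training corpora and the resulting efficiency bottleneck, and the generalization problem that is common in discriminative training. On this basis, we combine high-quality word graphs generated by a WFST-based decoder with the conventional HTK-based tools for computing discriminative training statistics to build a new procedure for discriminative training. Compared with conventional discriminative training done entirely in HTK, this procedure greatly improves training efficiency and also yields some improvement in recognition performance.

Finally, this thesis presents work on the confidence measure (CM), one of the important modules of a speech recognition system. We first define the so-called "target region" and "non-target region" based on the output of the recognizer, and choose an appropriate confidence measure for each region. We try to exploit extra information in the non-target region to complement conventional methods that compute the CM only on the target region. Experimental results show that the confidence computed on the non-target region complements the confidence computed on the target region well. We then use the Bayesian information criterion to locate the speech boundaries absorbed into the non-target region; the confidence computed after this localization yields further performance gains.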
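The closed-form step that the CLS description refers to can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a single Gaussian with a fixed diagonal covariance, updates only the mean, and models the objective as linear along its gradient, so the best point on the search line lies on the boundary of the KLD constraint. The function names, the gradient input, and the bound `rho` are hypothetical.

```python
import math


def kld_same_variance(mu, mu0, var):
    """KL divergence between two diagonal Gaussians that share the variance `var`.

    With equal covariances the KLD reduces to a quadratic form in (mu - mu0).
    """
    return 0.5 * sum((m - m0) ** 2 / v for m, m0, v in zip(mu, mu0, var))


def cls_mean_update(mu0, grad, var, rho):
    """Move the mean along `grad` until the KLD to the old model equals rho**2.

    Assumption (illustrative only): the objective increases monotonically along
    the gradient direction, so the constrained maximum is on the boundary.
    """
    # Quadratic form grad^T Sigma^{-1} grad for a diagonal covariance.
    quad = sum(g * g / v for g, v in zip(grad, var))
    if quad == 0.0:
        return list(mu0)  # zero gradient: nothing to update
    # Solve 0.5 * eps**2 * quad == rho**2 for the step size eps.
    eps = rho * math.sqrt(2.0 / quad)
    return [m0 + eps * g for m0, g in zip(mu0, grad)]
```

Calling `cls_mean_update([0.0, 1.0], [3.0, 4.0], [1.0, 4.0], 0.1)` returns a mean whose KLD to the old mean, as computed by `kld_same_variance`, equals `rho**2` up to rounding. Covariances and mixture weights would need their own quadratic approximations of the KLD, which this sketch does not cover.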

【Abstract】 In the past few decades, discriminative training (DT) has been a very active research area in automatic speech recognition (ASR). Discriminative training of acoustic models has become one of the most important training methods for state-of-the-art speech recognition systems, especially large vocabulary continuous speech recognition (LVCSR) systems. This thesis focuses on discriminative training of acoustic models and its application to LVCSR tasks. It also covers another important module in speech recognition, the confidence measure (CM).

Firstly, this thesis proposes a novel optimization algorithm called constrained line search (CLS) for discriminative training of Gaussian mixture CDHMMs in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including MMI, MCE, and MPE/MWE. In this method, discriminative training of HMMs is first cast as a constrained optimization problem, in which the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based on the idea of line search, we show that a simple closed-form update for the HMM parameters can be found by expressing the KLD constraint between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters of Gaussian mixture CDHMMs, including means, covariances, and mixture weights.

Secondly, based on a theoretical analysis of the original Trust Region (TR) optimization method that we proposed earlier, this thesis proposes a new way to construct an auxiliary function for discriminative training of HMMs in speech recognition. In the original Trust Region method, MMI-based discriminative training is treated as a standard trust region problem from optimization theory, and the global optimum of this problem can be obtained efficiently. However, optimizing the auxiliary function does not guarantee an increase in the original objective function. The proposed new auxiliary function still serves as a first-order approximation of the original objective function but, more importantly, it is also a lower bound of the original objective function. Because of this lower-bound property, the optimal point found is theoretically guaranteed to increase the original discriminative objective function. Furthermore, the TR method can still be applied to find the globally optimal point of the new auxiliary function. The proposed bounded trust region method has been evaluated on several LVCSR tasks, and experimental results show that the bounded TR method based on the new auxiliary function outperforms both the conventional EBW method and the original TR method based on the old auxiliary function.

Thirdly, this thesis investigates several practical problems in LVCSR systems, e.g., the computational capacity and efficiency problems in discriminative training of HMMs and the generalization problem in LVCSR systems. We propose a novel procedure for discriminative training in LVCSR systems that combines word graphs generated by a WFST-based decoder with the statistics computation tools from HTK. Discriminative training under this new procedure is significantly more efficient and also achieves better recognition performance.

Lastly, this thesis investigates appropriate confidence measures (CMs) for Mandarin command word recognition in both the so-called target region and the non-target region. Here the target region refers to the part of the input recognized as the command word, while the non-target region refers to the part recognized as silence. The results show that exploiting extra information in the non-target region effectively complements the traditional CM, which usually focuses on the target region. Furthermore, when the non-target region is analyzed in a more principled way, with the Bayesian information criterion (BIC) used to locate a more precise boundary within the non-target region, even more improvement is achieved.
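The lower-bound argument behind the bounded trust region method can be written out in one line. The notation below is assumed for illustration and is not taken from the thesis: $F$ is the discriminative objective, $Q$ the auxiliary function, $\lambda_0$ the current model, and $\mathcal{R}$ the region of models that satisfy the KLD constraint with bound $\rho^2$. It also assumes that $Q$ touches $F$ at $\lambda_0$, which is what a first-order approximation around $\lambda_0$ would normally imply.

```latex
% Assumed notation: D(.,.) is the KLD-based constraint between models.
\mathcal{R} = \{\lambda : D(\lambda, \lambda_0) \le \rho^2\}, \qquad
Q(\lambda_0) = F(\lambda_0), \qquad
Q(\lambda) \le F(\lambda) \quad \forall \lambda \in \mathcal{R}.

% Let \lambda^* be the global maximizer of Q over the region:
\lambda^* = \arg\max_{\lambda \in \mathcal{R}} Q(\lambda)
\;\;\Longrightarrow\;\;
F(\lambda^*) \;\ge\; Q(\lambda^*) \;\ge\; Q(\lambda_0) \;=\; F(\lambda_0).
```

The first inequality uses the lower-bound property, and the second holds because $\lambda_0$ itself lies in $\mathcal{R}$. Without the lower bound, only $Q(\lambda^*) \ge Q(\lambda_0)$ would be known, which says nothing about $F(\lambda^*)$.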
