非特定人鲁棒性语音识别中前端滤波器的研究

The Research of Front-end Filter for Speaker Independent Robust Speech Recognition

【Author】 Huang Lixia (黄丽霞)

【Advisor】 Zhang Xueying (张雪英)

【Author information】 Taiyuan University of Technology, Circuits and Systems, 2011, Ph.D.

【Abstract】 Speaker-independent speech recognition performs well in clean conditions, but its performance degrades sharply in noise. Recognition accuracy is also affected by speech variability, which makes the task harder still. To improve robustness to both noise and variability in speaker-independent systems, this thesis studies the front-end filter, which plays a key role in feature extraction. Filters are designed from two perspectives, auditory perception and the speech signal itself, so that they either match the characteristics of human hearing more closely or analyze the speech to be recognized more precisely. Noise experiments show that the noise robustness of the extracted features rises as filter performance improves, and variability experiments show that variability robustness improves in step with filter performance. The main contributions are as follows.

(1) Building on FIR filter design, the design steps for the Laguerre filter are given in detail, and the Laguerre filter replaces the FIR filter in extracting Zero Crossing Peak Amplitude (ZCPA) features. The frequency-domain procedure for extracting ZCPA features with the Laguerre filter is also described in detail. The Laguerre filter combines the linear phase of FIR filters with the long memory of IIR filters, and compensates for the poor passband and stopband characteristics of the FIR filter. Experiments show that a Laguerre filter bank in which the center frequency and bandwidth of every channel are designed precisely is clearly more noise-robust than the FIR filter bank.

(2) FIR and Laguerre filters have symmetric bandwidths, which do not match the characteristics of human hearing. To address this, Warped Filter Banks (WFBs) are designed, implemented, and applied to ZCPA extraction. The warping factor ρ of a first-order all-pass function controls the distribution of center frequencies and bandwidths, giving nonuniform frequency bands and asymmetric bandwidths. The typical values ρ = 0.48 and ρ = 0.63 correspond to Bark-scale and ERB-scale filters, respectively. Unlike FIR and Laguerre filters, WFBs do not require the center frequency and bandwidth of each channel to be specified individually; the frequency responses of all 16 channels are obtained at once. Experiments show that nonuniform bands with asymmetric bandwidths give clearly higher recognition rates than uniform bands with symmetric bandwidths. Although WFBs are simpler to design than FIR and Laguerre filters, they provide asymmetric bandwidths, so the ERB-scale WFBs achieve higher recognition rates and better noise robustness.

(3) Starting from the speech signal itself and drawing on digital signal processing theory, an Optimized Filter Bank (OFB) model is designed and then simplified into the Adaptive Bands Filter Bank (ABFB) model. Whereas the FIR, Laguerre, and WFB filters are built on auditory perception criteria, the OFB takes recognition performance as its design criterion: for the first time, a genetic algorithm couples the front-end filter with the back-end recognizer so that the two are optimized together as a closed-loop system. Experiments show that the OFB model clearly outperforms the Bark-scale filter, but the large number of models it involves makes it hard to apply. Simplifying the OFB model yields the ABFB model, whose recognition rate remains clearly higher than that of the Bark-scale filter and even exceeds that of the ERB-scale filter. Among the FIR, Laguerre, WFB, and ABFB filters, the ABFB therefore has the best noise robustness, which also shows the importance of designing filters from an analysis of the speech signal itself.

(4) The number of filter channels also affects how precisely the filter bank analyzes the signal. The FIR, Laguerre, WFB, and ABFB filter banks all use 16 band-pass channels and 16 frequency bins to extract ZCPA. When a Gammatone (GT) filter bank is used to extract ZCPA, K band-pass channels are used, with a matching number of frequency bins to collect the amplitude information. Experiments show that the 18-channel GT filter bank gives better recognition results than the other channel counts tested.

(5) The FIR, GT, Laguerre, and WFB filters are applied to speaker-independent recognition on a variability corpus. Experiments show that variability robustness improves as the filters are refined. Moreover, compared with MFCC features, ZCPA is more robust to variability with a Support Vector Machine (SVM) back end than with a Hidden Markov Model (HMM) back end.
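
The role of the warping factor ρ in contribution (2) can be illustrated with the standard phase response of a first-order all-pass section, θ(ω) = ω + 2·arctan(ρ·sin ω / (1 − ρ·cos ω)). The sketch below is only an illustration of that frequency mapping, not the thesis's WFB design: the function names, the choice of 16 channels placed uniformly on the warped axis, and the half-bin offset of the centers are all assumptions made for the example.

```python
import math


def warp(omega, rho):
    """Phase of a first-order all-pass section with parameter rho.

    Maps a normalized frequency omega in [0, pi] to the warped
    frequency theta in [0, pi]. Requires |rho| < 1.
    """
    return omega + 2.0 * math.atan2(rho * math.sin(omega),
                                    1.0 - rho * math.cos(omega))


def unwarp(theta, rho):
    """Inverse mapping: warping with -rho undoes warping with rho."""
    return warp(theta, -rho)


def band_centers(num_channels, rho):
    """Hypothetical channel layout (an assumption for illustration):
    centers spaced uniformly on the warped axis, mapped back to the
    ordinary frequency axis."""
    step = math.pi / num_channels
    return [unwarp((k + 0.5) * step, rho) for k in range(num_channels)]


if __name__ == "__main__":
    for rho in (0.48, 0.63):  # values the abstract associates with Bark and ERB
        centers = band_centers(16, rho)
        print(f"rho={rho}:", " ".join(f"{c / math.pi:.3f}" for c in centers))
```

For positive ρ the printed centers (as fractions of π) crowd toward low frequencies and spread out toward high frequencies, which is the nonuniform band distribution the abstract attributes to the warped design; ρ = 0 would reproduce a uniform layout.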
