基于麦克风阵列的语音增强与识别研究

A Study of Speech Enhancement and Recognition Based on Microphone Array Processing

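【Author】 Li Xiaoxue (李晓雪)

【Advisor】 Xu Wen (徐文)

【Author information】 Zhejiang University, Information and Communication Engineering, 2010, Master's thesis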

【Abstract】 Automatic speech recognition (ASR) already achieves high recognition accuracy on clean speech. In practical environments, however, ambient noise, reverberation, and interference from other sound sources cause a mismatch between the speech features to be recognized and the training templates, which severely degrades recognition performance. This thesis develops array processing methods for wideband speech signals in an ASR system that uses a small microphone array as its front end. The goal is to increase the probability of correct recognition in practical environments through joint spatial-temporal processing.

For speech source localization, a wideband direction-of-arrival (DOA) estimation method based on the ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) algorithm is applied, then improved by combining it with multi-channel linear prediction analysis of speech and with SNR estimation. Experiments with a small microphone array show that this high-resolution wideband method performs far better than conventional beamforming on speech signals, while avoiding the scan over the entire angular domain that other typical high-resolution methods require. The localization results then guide the subsequent array processing to extract the signal arriving from the direction of a specific speaker.

Most current microphone-array ASR systems comprise two sequential, independent stages: array signal processing and feature recognition. This thesis treats the two stages jointly: the output of the recognizer is fed back to the microphone-array front end, and the array filter coefficients are adjusted through an optimization procedure that maximizes the likelihood of the correct transcription for a selected vocabulary. A global search algorithm is applied during coefficient adjustment to further improve the joint optimization scheme. Unlike conventional array processing, which aims to enhance the signal waveform, this approach enhances the speech features so that they better match the recognition model, directly increasing the likelihood of correct hypotheses during recognition. Experiments show that training the filter coefficients with the joint optimization scheme clearly improves recognition performance.
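
As a rough illustration of the ESPRIT idea mentioned in the abstract, the sketch below implements the standard narrowband least-squares ESPRIT for a uniform linear array. It is not the thesis's method: the wideband extension, the multi-channel linear prediction, and the SNR estimation are all omitted, and the array geometry (half-wavelength spacing), the function names `simulate_ula` and `esprit_doa`, and the simulated signals are assumptions made only for this example. It does show the property the abstract highlights: the angles come directly from an eigenvalue problem, with no scan over the angular domain.

```python
import numpy as np


def simulate_ula(doas_deg, n_mics=8, n_snap=2000, snr_db=20.0, d=0.5, seed=0):
    """Simulate narrowband snapshots on a uniform linear array (hypothetical setup).

    d is the element spacing in wavelengths; angles are measured from broadside.
    """
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(np.asarray(doas_deg, dtype=float))
    m = np.arange(n_mics)[:, None]
    # Steering matrix: one column per source.
    A = np.exp(2j * np.pi * d * m * np.sin(theta)[None, :])
    shape_s = (theta.size, n_snap)
    S = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    shape_n = (n_mics, n_snap)
    N = (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return A @ S + 10 ** (-snr_db / 20) * N


def esprit_doa(X, n_sources, d=0.5):
    """Least-squares ESPRIT: return DOA estimates in degrees, sorted ascending."""
    n_snap = X.shape[1]
    R = X @ X.conj().T / n_snap               # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    Es = eigvecs[:, -n_sources:]              # signal subspace
    E1, E2 = Es[:-1], Es[1:]                  # two subarrays shifted by one element
    # Rotational invariance: E2 ~= E1 @ Psi, solved in the least-squares sense.
    Psi = np.linalg.lstsq(E1, E2, rcond=None)[0]
    phases = np.angle(np.linalg.eigvals(Psi))  # each phase equals 2*pi*d*sin(theta)
    sin_theta = np.clip(phases / (2 * np.pi * d), -1.0, 1.0)
    return np.sort(np.rad2deg(np.arcsin(sin_theta)))


if __name__ == "__main__":
    X = simulate_ula([-20.0, 35.0])
    print(esprit_doa(X, n_sources=2))
```

For wideband speech, one common option is to run such a narrowband estimator per frequency bin and combine the results; whether the thesis does this is not stated in the abstract.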

  • 【Online publication submitter】 Zhejiang University
  • 【Online publication year/issue】 2011, Issue 03