复杂环境下麦克风阵列语音增强方法研究

Study on Methods for Speech Enhancement Based on Microphone Array in Complex Environment

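【Author】 Zhang Liyan (张丽艳)

【Advisor】 Yin Fuliang (殷福亮)

【Author information】 Dalian University of Technology, Signal and Information Processing, 2009, Ph.D.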

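【Abstract (translated from the Chinese)】 Speech enhancement is one of the main research topics in signal processing and has wide application value in modern communication, multimedia technology, human-computer interaction, and intelligent systems. Its main purpose is to extract speech information from noisy speech signals so as to obtain a high-quality speech signal. However, diverse noise sources and room reverberation degrade the speech received by a microphone, which affects not only speech intelligibility but also the overall performance of the speech processing system. Effective noise suppression is therefore needed to improve speech quality. Single-microphone speech enhancement methods usually suppress noise well, but their performance degrades sharply in complex acoustic environments. A microphone array fuses the spatial and temporal information of the speech signal and offers high spatial resolution and strong interference rejection, which makes it an important means of capturing the speaker's voice and improving speech quality in intelligent communication systems such as video conferencing. In recent years, microphone array speech enhancement has become a research hotspot in speech enhancement. Using array processing and speech processing as the main signal processing tools, and taking video conferencing systems as the application background, this thesis studies microphone array speech enhancement methods in depth. The main innovative results are as follows:

(1) A microphone array speech enhancement method that fuses adaptive beamforming with post-filtering beamforming. Adaptive beamforming is generally suited to strongly coherent noise fields, whereas post-filtering beamforming is suited to incoherent noise fields. This thesis combines the two to give a new beamforming speech enhancement method that suppresses noise well in both coherent and incoherent noise fields and is therefore robust to the type of noise field.

(2) Time delay estimation in reverberant environments. Beamforming-based speech enhancement requires time delay compensation of the signals received by the microphones. Most existing delay estimation algorithms do not account for reverberation, so this thesis gives a time delay estimation method based on speech onset signals and generalized correlation weighting. The method first uses an Echo-Avoidance (EA) reverberation model (the EA reverberation model for short) to extract speech onset signals, then estimates the signal power spectrum from the onset signals and smooths it, and finally estimates the delay with generalized correlation weighting. The method estimates the delay effectively in reverberant environments, and experimental results verify its effectiveness.

(3) Cepstral-domain speech dereverberation. This thesis gives a microphone array speech enhancement method based on cepstral techniques. The method exploits the human ear's insensitivity to the phase of speech and uses an approximation to extract phase information from the noisy speech signal, which reduces the computational load. Simulation results demonstrate its effectiveness.

(4) Subspace methods for microphone array speech enhancement. Building on the generalized singular value decomposition (GSVD) microphone array speech enhancement method, and with the aim of reducing computation, this thesis proposes an improved GSVD-based microphone array speech enhancement method. It is a suboptimal filtering method that needs no speech endpoint detection when the interfering noise is white, so its computational complexity is greatly reduced. The thesis also applies the GSVD microphone array method to single-microphone speech enhancement and likewise obtains good results. Simulation experiments show that the method suppresses white noise effectively, clearly raises the signal-to-noise ratio, and also improves speech quality.

(5) Microphone array speech enhancement based on a speech production model. This thesis applies the single-microphone time-varying AR model speech enhancement method to microphone arrays and, combining it with the spatial properties of the array, gives a microphone array speech enhancement method based on a speech production model. The method is suited to parallel processing and can perform enhancement with less data and a lower AR model order. Simulation experiments verify its effectiveness.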
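The final step of the delay estimation method in item (2), generalized correlation weighting, can be illustrated with a minimal sketch. The abstract does not say which weighting function the thesis uses, so the PHAT (phase transform) weighting below is an assumption, and the onset extraction with the EA reverberation model and the spectral smoothing are not reproduced. The function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate by how many seconds y lags x (negative if y leads x).

    Generalized cross-correlation with PHAT weighting; the weighting
    choice is an assumption, not taken from the thesis.
    """
    n = len(x) + len(y)                  # zero-pad so circular correlation acts as linear
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)        # weighted cross-correlation over lags
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(round(max_delay * fs)), n // 2))
    # Reorder so the array covers lags -max_shift .. +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Usage: a white-noise signal and a copy delayed by 12 samples
rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(4096)
y = np.concatenate((np.zeros(12), x[:-12]))
delay_seconds = gcc_phat_delay(x, y, fs)
```

PHAT whitens the cross-spectrum so that the correlation peak stays narrow; it is a common choice for delay estimation in rooms, which is why it is used here as the stand-in.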

【Abstract】 Speech enhancement is one of the key technologies in fields such as the information highway, multimedia, office automation, modern communication, and intelligent systems. Its main aim is to extract speech information from noisy speech signals in order to obtain high-quality speech. However, because of diverse noise sources and room reverberation, the speech received by a microphone is of poor quality, which degrades both speech intelligibility and the performance of speech processing systems. Effective noise suppression is therefore necessary to improve speech quality.

Generally, single-microphone speech enhancement suppresses noise well, but in complex acoustic environments its noise suppression performance declines rapidly. Microphone array techniques combine the spatial and temporal information of speech signals, and offer flexible beam control, higher spatial resolution, higher signal gain, and better anti-interference performance. Microphone arrays have therefore become an important means of capturing a speaker's voice and improving speech quality in intelligent communication systems such as video conferencing systems. In recent years, speech enhancement based on microphone arrays has gradually become a research hotspot in speech processing.

Taking microphone array processing and adaptive processing as the main signal processing tools and video conferencing systems as the application background, this thesis studies several microphone array speech enhancement methods. Because microphone array speech enhancement depends on delay estimation, the thesis also discusses time delay estimation in reverberant environments. The main research results are as follows:

(1) Microphone array speech enhancement combining adaptive beamforming and post-filtering beamforming. Adaptive beamforming and post-filtering beamforming have advantages in different noise fields, so this thesis combines the two to propose a new beamforming speech enhancement method. The proposed method cancels noise well in both coherent and incoherent noise fields and is therefore robust to the type of noise.

(2) Time delay estimation in reverberant environments. Beamforming speech enhancement methods normally have to compensate for the time delays between the speech signals in different channels. However, most time delay estimation algorithms do not take reverberation into account. This thesis therefore proposes a time delay estimation method based on speech onset signals and generalized correlation weighting. The method first uses an echo-avoidance (EA) reverberation model to extract speech onset signals, then estimates the power spectrum from the onset signals and smooths it, and finally estimates the time delay with generalized correlation weighting. The method estimates the time delay accurately in reverberant environments, and experimental results confirm its validity.

(3) Cepstrum-based dereverberation. Speech dereverberation is also an important part of speech enhancement. This thesis proposes a cepstrum-based microphone array speech enhancement method. Because the human ear is not sensitive to speech phase, the method uses an approximation to obtain the phase information from the noisy speech signals. Compared with traditional cepstrum speech enhancement methods, it has lower computational complexity and can be used in real video conferencing systems, which need to handle reverberation. Simulations confirm the validity of the method.

(4) Subspace methods for microphone array speech enhancement. To reduce the computational load, this thesis proposes a GSVD-based microphone array speech enhancement method. It is a suboptimal filtering method, and it needs no speech endpoint detection when the noise is white. The thesis also applies the GSVD microphone array speech enhancement method to single-microphone speech enhancement and obtains good results. Simulations show that the method suppresses the noise effectively and improves the signal-to-noise ratio.

(5) Microphone array speech enhancement based on speech production models. This thesis applies the single-microphone time-varying AR model speech enhancement method to microphone arrays and, combining it with the spatial characteristics of the array, proposes a microphone array speech enhancement method based on speech production models. The method can be parallelized. In addition, it uses fewer data points and a lower AR model order, and can perform speech enhancement in real time. Simulation experiments confirm the validity of the method.
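The delay compensation that beamforming needs, mentioned in item (2), can be sketched with the simplest fixed beamformer, delay-and-sum. This is a generic textbook illustration and not one of the methods proposed in the thesis; it assumes the per-channel delays are already known and are whole numbers of samples. The names `shift` and `delay_and_sum` are hypothetical.

```python
import numpy as np

def shift(sig, d):
    """Advance sig by d samples (d may be negative), zero-filling the gap."""
    sig = np.asarray(sig, dtype=float)
    out = np.zeros_like(sig)
    if d > 0:
        out[:-d] = sig[d:]
    elif d < 0:
        out[-d:] = sig[:d]
    else:
        out[:] = sig
    return out

def delay_and_sum(channels, delays_samples):
    """Align each channel by its known integer delay, then average."""
    channels = np.asarray(channels, dtype=float)
    out = np.zeros(channels.shape[1])
    for sig, d in zip(channels, delays_samples):
        out += shift(sig, int(d))        # undo the delay of this channel
    return out / len(channels)

# Usage: four microphones receive the same tone with different delays
# and independent white noise of unit variance.
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 0.01 * np.arange(2000))
delays = [0, 3, 5, 8]
mics = [shift(clean, -d) + rng.standard_normal(2000) for d in delays]
enhanced = delay_and_sum(mics, delays)
```

After alignment the speech adds coherently while the independent noise averages out, so with four channels the noise power drops to roughly a quarter. This gain holds only for spatially incoherent noise, which is consistent with the abstract's point that different noise fields call for different beamforming approaches.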
