Study on Methods of Microphone Array Based Sound Source Localization and Speech Enhancement

【Author】 Cui Weiwei (崔玮玮)

【Advisor】 Cao Zhigang (曹志刚)

【Author information】 Tsinghua University, Information and Communication Engineering, 2009, PhD

【Abstract】 In many speech communication systems, such as hands-free telephony and video conferencing, the speech signal received by a microphone is often corrupted by reverberation and background noise. This degrades not only the intelligibility of the speech but also the overall performance of speech processing systems, so the noisy speech must be enhanced. In complex acoustic environments, single-microphone speech enhancement can no longer meet the requirements. Microphone array processing, by contrast, can capture the location of a sound source and apply spatial filtering to the noisy signals, which yields significant noise reduction. Against this background, this dissertation studies microphone array based sound source localization and speech enhancement methods. The main contributions are as follows.

(1) A survey of time delay estimation (TDE) techniques. Several commonly used TDE methods are examined in depth with respect to their ability to track stationary and moving sources, their robustness under different reverberation levels and signal-to-noise ratios (SNR), and their computational complexity. Based on simulation results, the dissertation summarizes the advantages, disadvantages, and suitable applications of each method.

(2) A dual-microphone source localization method in the 2D plane. By jointly using the time delay and the energy attenuation of the received signals, the method reduces the number of microphones needed from the 3 required by conventional two-step localization methods to 2, which lowers the device cost. The resulting closed-form solution allows fast processing. Furthermore, under the assumption of Gaussian measurement noise, the Cramér-Rao lower bound on the variance of the position estimate is derived for the proposed localization model and used to analyze how different parameters affect localization accuracy.

(3) A high-resolution direction of arrival (DOA) estimation method based on search-space pre-estimation. The TDE result is used to obtain a candidate search space for high-resolution DOA estimation. This reduces the computational cost to less than 1/3 of that of existing methods and also partially eliminates the directions of interfering noise. In a real conference room, tests of a localization system with 7 microphones show that the maximum DOA estimation error is 4.4° with search-space pre-estimation and 11.4° without it.

(4) A spectral-domain speech enhancement method based on a first-order differential microphone (FDM) array. The method uses a dual-channel FDM array together with single-channel spectral enhancement techniques to estimate the speech and noise spectra simultaneously and to correct the noise spectrum in real time. Compared with existing dual-channel speech enhancement techniques, it achieves an output SNR gain of 2 dB to 6 dB and reduces the computational complexity by 2/3.
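As a rough illustration of how a time delay estimate between two microphones can feed a direction of arrival estimate, the sketch below uses plain time-domain cross-correlation and a far-field model. It is not the dissertation's algorithm: the abstract does not name the TDE methods it compares, and the function names (`estimate_delay`, `delay_to_doa`), the 343 m/s speed of sound, the sampling rate, and the microphone spacing are all assumptions chosen for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation of x and y.

    x and y must have the same length. A positive result means y lags behind x.
    """
    n = len(x)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x[i] with y[i + lag] over the overlapping samples only.
        score = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            score += x[i] * y[i + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_doa(lag, fs, spacing):
    """Convert a lag in samples to a far-field arrival angle in degrees.

    0 degrees is broadside to the two-microphone axis; fs is in Hz and
    spacing is in metres.
    """
    s = SPEED_OF_SOUND * lag / (fs * spacing)
    s = max(-1.0, min(1.0, s))  # clamp so estimation error cannot break asin
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    random.seed(0)
    true_lag = 5  # hypothetical delay of the second microphone, in samples
    x = [random.gauss(0.0, 1.0) for _ in range(2000)]
    # Second channel: delayed copy of the first plus a little sensor noise.
    y = [0.0] * true_lag + x[:-true_lag]
    y = [v + random.gauss(0.0, 0.1) for v in y]

    lag = estimate_delay(x, y, max_lag=20)
    angle = delay_to_doa(lag, fs=16000, spacing=0.2)
    print(f"estimated lag: {lag} samples, DOA: {angle:.1f} degrees")
```

Plain cross-correlation is used here only because it is short and has no dependencies; it is known to degrade under reverberation, which is the kind of condition the robustness comparison in contribution (1) is concerned with.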

  • 【Online publication contributor】 Tsinghua University
  • 【Online publication issue】 2010, Issue 02
  • 【Classification number】 TN912.35
  • 【Times cited】 18
  • 【Downloads】 2164