基于麦克风阵列的声源定位算法研究

Research on Speech Source Localization Methods Based on Microphone Arrays

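【Author】 Ju Tailiang (居太亮)

【Advisor】 Peng Qicong (彭启琮)

【Author information】 University of Electronic Science and Technology of China, Signal and Information Processing, 2006, PhD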

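【Abstract (translated from the Chinese)】 Microphone arrays are widely used in audio/video conferencing, speech recognition, speech enhancement, and related fields. Sound source localization is one of the main tasks of array signal processing and is the basis for spatial filtering. Array-based localization algorithms fall into super-resolution and non-super-resolution classes. The accuracy of non-super-resolution algorithms is limited by the array aperture, so they are suitable only where modest localization accuracy is required. Super-resolution algorithms can break through the Rayleigh limit and, under certain conditions, achieve arbitrary localization accuracy, which makes them highly valuable in practice. Traditional super-resolution algorithms assume narrow-band, far-field, stationary sources, whereas microphone array processing mainly deals with wide-band, short-time stationary speech signals, and the source may lie in the near field of the array; as a result, DOA (direction of arrival) estimation algorithms cannot be applied directly. Compared with traditional source localization, multi-dimensional sound source localization with microphone arrays faces the following main problems:
1) Wide-band signals: under narrow-band conditions, the phase difference between array elements can be approximated as a function of source position alone, with frequency a constant; speech is a wide-band, non-modulated signal, so the phase difference between elements is a joint function of frequency and source position.
2) Near-field sources: in microphone array processing, the source may lie in the near field or the far field of the array depending on the application environment, whereas traditional array signal processing always assumes far-field sources.
3) Spatial interference: in indoor environments, spatial interference sources and the speech signal radiate onto the array simultaneously, severely degrading localization performance.
4) Multi-dimensional localization: microphone array applications generally require two- or three-dimensional localization, whereas traditional array processing algorithms mainly target one-dimensional DOA estimation.
Addressing these problems, this thesis proposes several sound source localization algorithms and achieves multi-dimensional sound source localization. The main contributions are as follows:
1) A near-field signal model for microphone arrays: based on the propagation characteristics of speech and the requirements of array processing, a near-field signal model with spherical wavefronts is proposed that accounts for both amplitude attenuation and time delay between array elements. When the source is far from the array, the amplitude differences between the signals received by the elements shrink, and the model degenerates to the far-field signal model. For multi-dimensional localization, general design principles for microphone arrays are proposed, and three arrays are designed: a two-dimensional uniform circular microphone array, a three-dimensional uniform linear microphone array, and a three-dimensional uniform spherical microphone array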
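The spherical-wavefront near-field model mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's own formulation: the 1/r amplitude law, the normalization to microphone 0, the 343 m/s sound speed, the four-element 5 cm linear array, and all function names (`near_field_response`, `far_field_response`, `model_mismatch`) are assumptions made for illustration. The sketch compares a response with per-microphone attenuation and delay against a plane-wave response and shows the two approaching each other as the source moves away.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the thesis


def near_field_response(mics, source, freq_hz, c=SPEED_OF_SOUND):
    """Spherical-wavefront response of each microphone relative to microphone 0."""
    mics = np.asarray(mics, dtype=float)
    dist = np.linalg.norm(mics - np.asarray(source, dtype=float), axis=1)
    gain = dist[0] / dist              # assumed 1/r spreading loss, normalized to mic 0
    delay = (dist - dist[0]) / c       # extra propagation time versus mic 0
    return gain * np.exp(-2j * np.pi * freq_hz * delay)


def far_field_response(mics, direction, freq_hz, c=SPEED_OF_SOUND):
    """Plane-wave response: unit gains, delays from projection onto the arrival direction."""
    mics = np.asarray(mics, dtype=float)
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)          # unit vector from the array toward the source
    delay = -((mics - mics[0]) @ u) / c
    return np.exp(-2j * np.pi * freq_hz * delay)


def model_mismatch(range_m, freq_hz=1000.0):
    """Largest per-microphone difference between the two models at a given source range."""
    # Hypothetical 4-element linear array with 5 cm spacing along the x axis.
    mics = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
    u = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])  # source 60 degrees off the array axis
    near = near_field_response(mics, mics[0] + range_m * u, freq_hz)
    far = far_field_response(mics, u, freq_hz)
    return float(np.max(np.abs(near - far)))


if __name__ == "__main__":
    for r in (0.3, 3.0, 100.0):
        print(f"range {r:6.1f} m  mismatch {model_mismatch(r):.4f}")
```

Under these assumptions the mismatch is large at a range of a few tens of centimetres and shrinks steadily with distance, which is the degeneration to the far-field model that the abstract describes.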

【Abstract】 Microphone arrays (MA) are widely used in audio/video conferencing, speech recognition, speech enhancement, and similar applications. Localizing the speech source is a primary task of array signal processing and the basis of spatial filter design. Source localization strategies divide into high-resolution (super-resolution) algorithms and general-resolution ones. For general-resolution methods, localization precision is limited by the array aperture, so they are used only where low precision suffices. With high-resolution algorithms, localization precision can exceed the Rayleigh resolution limit set by the array aperture and, in some cases, reach arbitrary resolution; methods of this type are therefore of great value. Classical high-resolution methods assume that the signal sources are narrow-band, stationary, and in the far field. Speech, however, is wide-band and short-time stationary, and its source may lie in the near field. Consequently, classical DOA (direction of arrival) estimation methods cannot solve the speech source localization problem. Compared with classical DOA methods, MA-based multi-dimension localization (MDL) of speech sources faces the following problems:
1) Wide-band signal: For a narrow-band signal, the phase difference between two adjacent array elements is taken to be a function of the source location only, with the signal frequency a constant. Speech, by contrast, is wide-band and non-modulated, so its phase difference is a compound function of frequency and source location.
2) Near-field source: In MA processing the speech source is usually in the near field of the array, whereas classical array processing assumes that the source lies in the far field.
3) Spatial interference: In a room environment the MA captures spatial interference and the speech signal simultaneously, which degrades speech source localization performance.
4) Multi-dimension localization: MA applications require two- or three-dimensional localization, whereas classical array processing requires only one-dimensional localization.
Focusing on these problems, several speech source localization algorithms are presented, as follows:
1) The near-field signal model based on MA: To meet the array signal
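The wide-band problem in item 1) above can be made concrete with a short sketch. It uses the standard far-field relation for a uniform linear array, where the phase difference between adjacent elements is 2πfd·sin(θ)/c; the 5 cm spacing, the 30 degree angle measured from broadside, the 343 m/s sound speed, and the function name `adjacent_phase_difference` are assumptions for illustration and do not come from the thesis.

```python
import numpy as np


def adjacent_phase_difference(freq_hz, spacing_m, angle_deg, c=343.0):
    """Far-field phase difference (radians) between adjacent elements of a uniform linear array.

    angle_deg is measured from broadside; c is an assumed speed of sound in m/s.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    return 2.0 * np.pi * freq_hz * spacing_m * np.sin(np.deg2rad(angle_deg)) / c


# Three frequencies spanning a typical telephone speech band (assumed values).
freqs = np.array([300.0, 1000.0, 3400.0])
phases = adjacent_phase_difference(freqs, spacing_m=0.05, angle_deg=30.0)

if __name__ == "__main__":
    for f, p in zip(freqs, phases):
        print(f"{f:7.1f} Hz  phase difference {p:.3f} rad")
```

For a fixed source direction the phase difference scales linearly with frequency, so a single narrow-band phase model cannot describe all components of a speech signal at once.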

  • 【Classification No.】 TN912.3; TN64
  • 【Times cited】 62
  • 【Downloads】 2943