
Research on Speech Enhancement Algorithms Based on Microphone Arrays

On Speech Enhancement Based on Microphone Arrays

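【Author】 Lin Jingran (林静然)

【Advisor】 Peng Qicong (彭启琮)

【Author information】 University of Electronic Science and Technology of China (电子科技大学), Signal and Information Processing, 2007, Ph.D.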

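【Chinese Abstract】 Compared with a single isolated microphone, a microphone array adds a spatial domain to the time and frequency domains, so that signals arriving from different directions can be processed jointly in space, time, and frequency. It can therefore compensate for the shortcomings of a single microphone in noise suppression, source localization and tracking, and speech extraction and separation, and it can be widely applied to improve speech communication quality in noisy environments such as conference venues, multimedia classrooms, hearing aids, hands-free car phones, and battlefields. Microphone array research is a new direction in array signal processing with broad market prospects. Combining the characteristics of array signal processing and speech signal processing, this dissertation studies how to use adaptive beamforming for speech enhancement: a beam is adaptively steered toward the target source while nulls are placed in the directions of the interferers. The main focus is on overcoming the effect of array model errors (such as source localization errors, array geometry errors, and channel response errors) on beam performance, that is, on improving the robustness of the beamformer.
Chapter 1 is the introduction. It presents the research background, the state of the art and open problems in the field, and the research content and contributions of this dissertation.
Chapter 2 starts from the spherical wave equation and establishes a general signal model for microphone-array speech processing (the General Signal Model of Microphone Arrays, GSMMA). Unlike the conventional array model, GSMMA drops the narrowband and far-field assumptions, treats speech as a wideband nonstationary signal, and accounts for the differences in amplitude attenuation between channels caused by their different propagation paths. The conventional array model can be regarded as a special case of GSMMA.
Chapter 3 addresses the performance of adaptive beamforming by reducing the localization error of the target source, and proposes a doubly weighted broadband MUSIC source localization algorithm (Doubly Weighted Broadband MUSIC, DWB-MUSIC). DWB-MUSIC builds on broadband MUSIC and localizes the source at each frequency bin by subspace decomposition. The algorithm first weights the noise subspace at each bin, which reduces the variance of the single-bin localization error. It then uses the signal-to-noise ratio of each bin to apply a second weighting to the per-bin estimates, which yields the final broadband localization result.
Chapter 4 proposes a broadband blind beamforming algorithm that exploits the rotational invariance of the array (Broadband Deterministic Blind Beamforming, B-DBBF). It bypasses source localization altogether and performs speech enhancement under the assumption that the array satisfies rotational invariance. To handle the channel permutation and amplitude ambiguity that can arise among the separated sequences at different frequency bins, a channel rearrangement scheme based on the correlation between the separated sequences at adjacent bins is proposed, which keeps the separated sequences consistent across frequency. In addition, the amplitude ambiguity is removed by adjusting the norm of the weight matrices, so that the separated sequences have no amplitude distortion.
Chapter 5 studies how to handle general steering vector errors, with emphasis on diagonal-loading-based robust adaptive beamforming (RABF). The chapter resolves the key problem of this class of algorithms, namely how to choose the diagonal loading factor. Under a series of assumptions, an approximate closed-form solution for the optimal loading factor is derived. Compared with iterative solutions, this result not only lowers the computational cost but also reveals which factors affect the optimal loading factor and how. A performance analysis of the algorithm is then carried out on this basis.
Chapter 6 proposes an RABF algorithm based on joint worst-case performance optimization (Joint Worst-Case RABF, JW-RABF). Motivated by the nonstationarity of speech and the real-time requirements of the processing, JW-RABF is robust against both the finite sample effect and steering vector errors. It also belongs to the diagonal loading class. Unlike W-RABF (the worst-case RABF examined in Chapter 5), JW-RABF determines the optimal loading factor by applying joint worst-case performance optimization to both the objective function and the constraints. An approximate closed-form solution for its optimal loading factor is likewise derived. On this basis, and in combination with frequency focusing, a broadband JW-RABF algorithm is proposed. It forms a broadband beam over the band of interest such that the variation of the beampattern with frequency is minimized in the least-squares sense, and it can also handle broadband coherent interferers effectively.
Chapter 7 summarizes the dissertation, compares the strengths and weaknesses of the proposed algorithms, analyzes the remaining shortcomings and suggests corresponding remedies, and gives an outlook on future research.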
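
As a rough illustration of the spherical-wave modeling idea attributed to GSMMA above, the sketch below builds a near-field steering vector for one frequency bin, with a 1/r amplitude decay and a propagation delay per channel. It is not the dissertation's model: the function name, the array geometry, the choice of the first microphone as reference, and the speed of sound are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def near_field_steering_vector(source, mics, freq_hz, c=SPEED_OF_SOUND):
    """Spherical-wave steering vector for one frequency bin (illustrative).

    Each entry has a 1/r amplitude decay and a phase delay of r/c,
    both normalized to the first microphone (the reference channel).
    """
    dists = [math.dist(source, m) for m in mics]
    r0 = dists[0]
    return [
        (r0 / r) * cmath.exp(-2j * math.pi * freq_hz * (r - r0) / c)
        for r in dists
    ]


# Hypothetical four-element linear array, 5 cm spacing along the x axis.
mics = [(0.05 * i, 0.0, 0.0) for i in range(4)]

# A source 0.3 m from the array versus one 30 m away, both at 1 kHz.
near = near_field_steering_vector((0.0, 0.3, 0.0), mics, 1000.0)
far = near_field_steering_vector((0.0, 30.0, 0.0), mics, 1000.0)

print([round(abs(x), 4) for x in near])  # amplitudes differ across channels
print([round(abs(x), 4) for x in far])   # amplitudes are nearly equal
```

For the distant source the per-channel amplitudes are almost identical, which is the sense in which a far-field model can be seen as a special case of a spherical-wave model.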

【Abstract】 Compared with signals received by a single, isolated microphone, the signals received by Microphone Arrays (MA) can be processed not only in the time and frequency domains but also in the spatial domain. Consequently, MA are capable of strong interference suppression as well as speech source localization, tracking, and separation. For this reason, they have been proposed as a promising way to achieve high-quality speech communication in applications such as teleconferencing, hands-free mobile telephony, and hearing aids. MA will definitely replace conventional desktop and head-mounted microphones in the near future. The topic of this dissertation is how to enhance the desired speech sources via adaptive beamforming (ABF) techniques in MA. Since the performance of ABF is very sensitive to steering vector mismatches (e.g., look direction errors, imperfect array calibration, source local scattering, and wavefront distortions), the problem of how to improve the algorithm's robustness is discussed in detail.
In Section I, the background of this field is presented, including the theoretical architecture, the current state of research, and the open problems. The organization and main contributions of this dissertation are also presented.
In Section II, the General Signal Model of Microphone Arrays (GSMMA) is proposed, based on the spherical wavefront propagation equation. Compared with the conventional model, GSMMA no longer takes the narrowband and far-field assumptions for granted. Since there is no carrier in MA systems, speech should be treated as a very wideband signal. Moreover, since the speech sources may be very close to the array, i.e., within its near field, the differences in amplitude attenuation between adjacent channels should be taken into consideration. The conventional model is thus a special case of GSMMA.
In Section III, a Doubly Weighted Broadband MUSIC algorithm (DWB-MUSIC) is presented, aiming at minimizing the variance of speech source localization errors. The traditional one-dimensional, narrowband MUSIC is first extended to a multi-dimensional, broadband version, making it suitable for MA systems. Then two sequential weighting operations are applied to improve the performance. (1) A weight matrix is imposed on the noise subspace when implementing MUSIC at each frequency bin. (2) An SNR-based weight vector is imposed on the original estimates at each bin in order to calculate the final broadband source location. The former decreases the standard deviation of the narrowband estimates at each bin, which is usually caused by array perturbations. The latter weakens the negative effect of the estimates at low-SNR bins on the final broadband estimate. Together, the two weighting operations improve the robustness of broadband speech source localization dramatically. A more precise localization in turn reduces the steering vector mismatch.
In Section IV, robustness against source localization errors is ensured in a blind way, and a Broadband Deterministic Blind Beamforming (B-DBBF) approach is presented. It is based on the conventional narrowband DBBF algorithm via rotational invariance techniques. Utilizing the nonstationarity of broadband speech sources, the narrowband DBBF is extended to broadband scenarios and implemented in the frequency domain. The nonstationarity of speech plays a key role in this extension. A special correlation-based channel rearranging (CR) operation is performed to cope with the problem of channel swap. Moreover, the problem of scale ambiguity is eliminated by normalizing the norm of the weight matrices. As a result, the desired sources can be recovered without any scale distortion.
In Section V, methods that are robust against general steering vector mismatches are investigated. For the sake of simplicity, the emphasis is on robust adaptive beamforming (RABF) based on worst-case performance optimization (Worst-case RABF, W-RABF). It belongs to the class of diagonal loading approaches, with the loading level determined by worst-case performance optimization. A closed-form solution for the optimal loading is derived after some approximations. Besides reducing the computational complexity, it shows how different factors affect the optimal loading. Based on this solution, a performance analysis of the beamformer is carried out. As a consequence, approximate closed-form expressions for the source-of-interest (SOI) power estimate and the output signal-to-interference-plus-noise ratio (SINR) are presented to predict its performance.
In Section VI, a novel RABF approach based on joint worst-case performance optimization (Joint Worst-case RABF, JW-RABF) is presented, aiming at robustness against both finite-sample effects and steering vector mismatches. JW-RABF is distinguished from W-RABF by taking the finite-sample effect into account and applying the worst-case performance optimization not only to the constraints but also to the objective of the constrained quadratic problem. Using approximations similar to those in Section V, simple closed-form solutions for the optimal loading as well as the optimal weight vector are also presented. Compared with W-RABF, it achieves better robustness when the number of data samples is small. Moreover, combined with Frequency Focusing (FF) techniques, a broadband JW-RABF approach is presented. Besides ensuring the frequency invariance of the broadband beampattern, it is capable of suppressing both correlated and uncorrelated interferences effectively.
In Section VII, the work of this dissertation is summarized and a few important conclusions are drawn. Finally, some possible directions for future research are discussed.
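
To make the diagonal loading idea mentioned for Sections V and VI concrete, here is a minimal sketch of a diagonally loaded minimum-variance beamformer, w = (R + γI)⁻¹a / (aᴴ(R + γI)⁻¹a). It only shows the generic loading step. The loading value γ below is picked by hand and is not the closed-form optimal loading derived in the dissertation, which the abstract does not state. The function names, the three-element example, and the interference model are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def loaded_mvdr_weights(R, a, loading):
    """Weights w = (R + loading*I)^-1 a / (a^H (R + loading*I)^-1 a)."""
    n = len(a)
    R_loaded = [
        [R[i][j] + (loading if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    u = solve(R_loaded, a)
    denom = sum(ai.conjugate() * ui for ai, ui in zip(a, u))
    return [ui / denom for ui in u]


def norm_sq(w):
    """Squared Euclidean norm of a weight vector."""
    return sum(abs(x) ** 2 for x in w)


# Hypothetical 3-channel example: presumed steering vector a, one strong
# interferer with spatial signature v, and weak white noise (power 0.1).
a = [1, 1, 1]
v = [1, 1j, -1]
R = [
    [10 * v[i] * v[j].conjugate() + (0.1 if i == j else 0.0) for j in range(3)]
    for i in range(3)
]

w_plain = loaded_mvdr_weights(R, a, 0.0)    # no loading
w_loaded = loaded_mvdr_weights(R, a, 10.0)  # hand-picked loading level

print(norm_sq(w_plain), norm_sq(w_loaded))
```

Both weight vectors keep unit gain toward the presumed steering vector, while the loaded one has the smaller norm. A smaller weight norm is the usual reason loading makes a beamformer less sensitive to steering vector errors and to a covariance matrix estimated from few samples.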
