Research on Nonlinear Filtering with Application to Speaker Tracking (非线性滤波及其在说话人跟踪中的应用研究)
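【Author】 侯代文 (Hou Daiwen);
【Advisor】 殷福亮 (Yin Fuliang);
【Author information】 Dalian University of Technology, Signal and Information Processing, 2008, Ph.D.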
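【Abstract (translated from Chinese)】 Speaker localization is an important topic in speech signal processing, with broad application prospects in speech enhancement, video conferencing systems, human-computer interaction, robotics, and other fields. Traditional speaker localization methods use only the speech received by a microphone array at the current time instant, and they perform well under free-field conditions. In complex acoustic environments with both ambient noise and room reverberation, however, these methods can misestimate the speaker position because spurious sources appear. Source tracking is therefore needed to determine the speaker position and improve the accuracy of the estimate. Speaker tracking is a typical nonlinear filtering problem. Within the Bayesian estimation framework, and taking the posterior probability density function of the system state as the guiding thread, this dissertation improves two classes of nonlinear filtering methods, Gaussian and non-Gaussian, in terms of filtering accuracy, robustness, and computational cost. It also applies nonlinear filtering to the speaker tracking problem and proposes improvements targeted at that problem. The main contributions of the dissertation are as follows:
(1) Under the Gaussian assumption, an iterated sigma point Kalman filter (ISPKF) is proposed, which improves the estimation accuracy of the SPKF by reusing the measurement. Because traditional iterated methods are not very stable, the Levenberg-Marquardt method from nonlinear optimization theory is used to adjust the predicted covariance matrix, which guarantees global convergence of the iterated filter.
(2) Traditional Bayesian estimation is built on the H2 criterion, with the mean square error as the cost function. It requires a fairly accurate system model and exactly known statistics of the external disturbances. In practice, the disturbance statistics are hard to know accurately, and the system model itself carries some uncertainty. Working in the sense of the H∞ norm, this dissertation applies statistical linearization to a robust filtering system and proposes the H∞ sigma point Kalman filter (HSPKF). The method uses the sigma point transformation to reduce the linearization error and H∞ filtering to improve the filter's adaptability to uncertain noise, which strengthens the robustness of the system.
(3) Within the particle filtering framework, a quasi-Monte Carlo filter based on mean shift is proposed. The method replaces random sampling with deterministic sampling: low-discrepancy sequences from quasi-Monte Carlo integration take the place of random sample sets, so that the particles are spread uniformly over the state space and lie as far from one another as possible. This reduces the integration error in the filtering process and improves the accuracy of the state estimate. In addition, mean shift is used to adjust the positions of the particles, moving them along the gradient direction toward regions of high likelihood. This increases the number of effective particles, reduces the number of particles required, and lowers the computational demand.
(4) To address the loss of particle diversity and the added computation caused by resampling, a particle filter based on sufficient statistics is proposed. When the posterior probability density function can be described by sufficient statistics that are easy to update, the method propagates the sufficient statistics instead of updating the posterior density function. Because new particles are drawn from a continuous rather than a discrete distribution, particle degeneracy does not occur and resampling is no longer needed, which reduces the computational cost.
(5) Based on the characteristics of speaker motion, several models are used to describe the speaker's motion state, and a multiple-model particle filter based on sampling interaction is proposed. During tracking, the method carries out the interaction of the filter inputs in the multiple-model scheme by adjusting the regions from which particles are sampled. This gives direct control over the number of particles in each filter, which avoids performance degradation during model transitions. It also drops the Gaussian assumption on the posterior probability density function of each model, so the algorithm can adapt to arbitrary distributions, which strengthens the robustness of the speaker tracking system.
(6) Using information fusion, a speaker tracking method that combines direction of arrival and time delay information is proposed. Because the two kinds of measurement differ in the accuracy with which they determine the speaker position, the method uses layered sampling: the state estimate of the direction-of-arrival filter serves as the proposal distribution for the time-delay tracker. Improving the quality of the proposal distribution in this way raises the sampling efficiency of the particle filter and reduces the speaker tracking error.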
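Contribution (2) rests on the claim that the sigma point transformation reduces linearization error. The following is a minimal sketch of that idea for a scalar Gaussian, not the dissertation's HSPKF. The function names, the spread parameter `kappa`, and the test function f(x) = x² are all assumptions chosen for illustration.

```python
import math


def sigma_point_transform(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mean, var) through f using three sigma points.

    kappa is a spread parameter (an assumption of this sketch); kappa = 2,
    so that 1 + kappa = 3, is a common choice for a scalar Gaussian.
    """
    spread = math.sqrt((1.0 + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (1.0 + kappa),
               0.5 / (1.0 + kappa),
               0.5 / (1.0 + kappa)]
    # Push each sigma point through the nonlinearity, then re-estimate moments.
    images = [f(p) for p in points]
    out_mean = sum(w * y for w, y in zip(weights, images))
    out_var = sum(w * (y - out_mean) ** 2 for w, y in zip(weights, images))
    return out_mean, out_var


def taylor_linearization(mean, var, f, df):
    """First-order propagation: mean -> f(mean), var -> f'(mean)^2 * var."""
    return f(mean), df(mean) ** 2 * var


if __name__ == "__main__":
    def square(x):
        return x * x

    m, v = 1.0, 0.25
    print("sigma points:", sigma_point_transform(m, v, square))
    print("linearized:  ", taylor_linearization(m, v, square, lambda x: 2.0 * x))
    # For x ~ N(m, v): E[x^2] = m^2 + v and Var[x^2] = 4 m^2 v + 2 v^2.
    print("exact:       ", (m * m + v, 4.0 * m * m * v + 2.0 * v * v))
```

For this quadratic example with `kappa = 2`, the sigma point result agrees with the exact mean and variance up to rounding. The first-order linearization misses the `v` term in the mean and the `2 v^2` term in the variance.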
【Abstract】 Speaker localization is an important technique in acoustic signal processing, with applications in speech enhancement, video conferencing, human-computer interfaces, robot navigation, and other fields. Traditional approaches to speaker localization collect data from several microphones and use only the current frame of data to estimate the current source location. They may produce wrong results because noise and reverberation cause spurious peaks in the localization function. It is therefore necessary to track the location with a state-space approach to improve localization accuracy. Speaker tracking is a typical nonlinear filtering problem. Taking the posterior probability density function as the guiding thread, this dissertation studies nonlinear filtering methods, both Gaussian and non-Gaussian, and improves them in accuracy, robustness, and computational complexity. It also proposes improvements targeted at applying nonlinear filters to the speaker tracking problem. The main contributions of this dissertation are as follows:
(1) Under the Gaussian assumption, an iterated sigma point Kalman filter (ISPKF) is proposed that improves estimation accuracy by using the new measurement iteratively. The iteration is formulated as a nonlinear optimization process, and a new update step based on the Levenberg-Marquardt algorithm is proposed, which ensures convergence of the estimate and reduces the mean square estimation error.
(2) Traditionally, Bayesian estimators have been based on minimizing the H2 norm of the estimation error. This type of estimator assumes that the system model and the noise have known statistical properties. In practice, accurate system models and noise statistics are often not available. An H∞ sigma point Kalman filter is therefore presented under the H∞ performance criterion. Because the sigma point transformation is used instead of a Taylor series expansion, the linearization error of the nonlinear system is reduced. Moreover, the H∞ filtering method addresses the noise uncertainty problem, which enhances the robustness of the system.
(3) In the framework of particle filtering, a mean shift quasi-Monte Carlo (MS-QMC) method is proposed in which low-discrepancy sequences, following a quasi-Monte Carlo integration rule, replace random draws. Using deterministic points makes it possible to choose points that are spread as evenly as possible over the sample space and so attain low integration error. Additionally, the mean shift technique is used to estimate the gradient of the approximated density and move particles toward the modes of the posterior. This allocates particles more effectively, so fewer particles are needed and the computational demand is reduced.
(4) A particle filter based on sufficient statistics is put forward to deal with the sample impoverishment and high computational complexity of particle filtering. If the posterior density function depends on the observed data only through a set of sufficient statistics that is straightforward to update, both problems can be mitigated by propagating the sufficient statistics instead of the posterior density function. Because new samples are drawn from a continuous density rather than a discrete one, resampling is not required in the new filter, which reduces complexity compared with the sequential importance sampling filter with resampling.
(5) A new interacting multiple model (IMM) particle filtering algorithm based on sampling interaction is developed to track a randomly moving speaker according to the speaker's dynamic characteristics. The interaction step in the new algorithm is accomplished by properly selecting the sampling region. As a result, the number of particles in each mode can be controlled, which avoids the degeneracy problem around mode transitions. The Gaussian assumption on the posterior density function of the state is also dropped, so the filter can adapt to arbitrary distributions and the robustness of the speaker tracking system is enhanced.
(6) Applying information fusion techniques to speaker tracking, both the direction of arrival (DOA) and the time difference of arrival (TDOA) of the speech source are used to localize the speaker. Because the measurement modalities differ in how much information they provide about the state, a layered sampling method is employed in the proposed filter. The posterior density function of the DOA-based filter is used as the proposal distribution of the TDOA-based filter, so that the new filter exploits the information in the most recent observation and guides the search in the state space effectively. The speaker localization error is thereby decreased.
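Contribution (3) replaces random draws with low-discrepancy sequences. The abstract does not say which sequence the dissertation uses, so the sketch below assumes the Halton sequence (built from van der Corput sequences) purely for illustration. It covers only the sampling step, not the mean shift step or the filter itself, and all function names are hypothetical.

```python
import random


def van_der_corput(index, base=2):
    """Return the index-th point of the base-b van der Corput sequence in [0, 1)."""
    point, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        point += digit / denom
    return point


def halton(index, bases=(2, 3)):
    """Return the index-th Halton point, one van der Corput coordinate per base."""
    return tuple(van_der_corput(index, b) for b in bases)


def integrate(points, f):
    """Equal-weight estimate of the integral of f over the unit cube."""
    return sum(f(*p) for p in points) / len(points)


if __name__ == "__main__":
    n = 1024

    def f(x, y):
        return x * y  # exact integral over [0, 1]^2 is 0.25

    # Deterministic, evenly spread points versus pseudo-random points.
    qmc_points = [halton(i) for i in range(1, n + 1)]
    rng = random.Random(0)
    mc_points = [(rng.random(), rng.random()) for _ in range(n)]
    print("quasi-Monte Carlo error:", abs(integrate(qmc_points, f) - 0.25))
    print("plain Monte Carlo error:", abs(integrate(mc_points, f) - 0.25))
```

Each new Halton point falls into one of the largest remaining gaps, which is the "best-possible spread" the abstract refers to. Plain Monte Carlo error typically shrinks like 1/sqrt(N), while low-discrepancy points can do better on smooth integrands in low dimensions.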
【Key words】 Nonlinear Filtering; Speaker Tracking; Microphone Array; Bayesian Estimation; Kalman Filter; Particle Filter; Monte Carlo; Unscented Transformation;