Research on Underdetermined Blind Source Separation Algorithms and Their Application to Speech Processing
Underdetermined Blind Source Separation Algorithms and Their Applications in Speech Signal Processing
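【Author】 Bai Shuzhong (白树忠)
【Supervisor】 Liu Ju (刘琚)
【Author information】 Shandong University, Communication and Information Systems, 2008, Ph.D. dissertation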
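【Chinese Abstract】 With the development of information and computer technology, the demands placed on information-processing methods keep rising. In many practical applications, sensors record only mixtures of the useful signals, possibly corrupted by noise, and recovering the original signals hidden in these mixtures is a problem that must be solved. Blind source separation (BSS) emerged against this background and has attracted wide attention from researchers ever since it was proposed. As a new data-processing method, BSS combines artificial neural networks, statistical signal processing, information theory and computer science, and it has substantial practical value in biomedicine, medical imaging, speech signal processing, communication systems, information retrieval and other fields. BSS recovers or estimates the source signals from the observed signals when both the sources and the mixing process are unknown, using only a small amount of prior information. This prior information is the basic assumption of BSS: the source signals are mutually statistically independent. Because this is a mild condition, BSS has been applied widely in many fields and has recently become an active research topic in modern signal processing.

Early BSS research generally assumed that the number of observed signals is no smaller than the number of source signals. As research on blind signal processing has deepened, however, algorithms based on the underdetermined model, an extension of the conventional model, have attracted broad attention. These algorithms address the case in which there are more sources than observations, which is closer to practical BSS scenarios. Because the system is not invertible under this condition, the methods differ from those of standard BSS; at present, overcomplete representation algorithms based on statistical probability models and separation algorithms based on sparsity are the main approaches to underdetermined BSS.

This dissertation systematically reviews the history, current state and classical algorithms of BSS, and carries out exploratory research on key techniques of underdetermined BSS, including the separability of the sources, estimation of the number of sources, sparsification of the sources, and nonlinear separation methods for the underdetermined case. It proposes a series of underdetermined BSS algorithms; this work belongs to the extended BSS problem and has considerable theoretical significance and practical value. As an application of BSS, the dissertation also studies speaker recognition. As an extended BSS algorithm, overcomplete representation is highly flexible in characterizing signal features: it can not only solve the underdetermined BSS problem but also yield overcomplete basis functions that describe the higher-order statistics of a signal. A text-independent speaker recognition system that uses these basis functions as features achieves good recognition results.

The main contributions of this dissertation are summarized as follows:

1. An overcomplete representation algorithm based on a statistical probability model. The algorithm proceeds in two steps: first, the sources are estimated with the mixing matrix fixed; then the mixing matrix is trained with the sources fixed. When there are two observed signals, or only a few, a shortest-path method is proposed for estimating the sources, which speeds up training and avoids the computational cost of matrix inversion.

2. In two-step underdetermined BSS algorithms, accurate estimation of the mixing matrix is a prerequisite for separation. After the sources are sparsified, a new weighted potential function is used: without increasing the computational load, it estimates the number of sources accurately at low resolution, and a high resolution is then used within the neighborhood of each clustering direction to estimate the mixing matrix precisely. Experiments show that the algorithm gives accurate estimates and maintains good performance even when the observed signals contain noise.

3. For the separation step of underdetermined BSS, the traditional L1-norm minimization algorithm yields a definite solution. However, because this algorithm is derived from maximum a posteriori (MAP) estimation, solving it by linear programming does not reach the theoretically optimal separation when the sources are not sufficiently sparse. To address this shortcoming, two improved methods are proposed. First, an underdetermined BSS algorithm weighted along the clustering directions: when searching for the optimal separation matrix it better reflects how the source signals vary, and when the sources are poorly sparse it achieves a higher separation signal-to-noise ratio (SNR) than the traditional L1-norm algorithm; in particular, when one of the sources has a small amplitude, it avoids the failure of L1-norm separation to recover that weak source. Second, an underdetermined BSS algorithm based on minimum mean square error (MMSE): when searching for the optimal separation matrix it better tracks the intrinsic variation of the source signals and, especially when the sources are poorly sparse, achieves a higher separation SNR than the traditional L1-norm algorithm. Listening tests comparing the separated speech with the original speech show satisfactory results in both speech continuity and residual noise, indicating practical value.

4. Noisy underdetermined BSS. Noise degrades algorithm performance, yet denoising within the separation algorithm itself is very difficult, so noisy BSS algorithms commonly use time-frequency transforms to filter the observed signals before separation and then treat the problem as noise-free BSS. This dissertation proposes a new underdetermined BSS algorithm based on a higher-order statistical sparse representation that separates the mixtures in the domain of a linear wavelet transform. Because the wavelet transform "concentrates" signal energy in a small number of coefficients, the approach both sparsifies the sources and removes noise, while remaining consistent with the mathematical model and prior assumptions of BSS. It has clear advantages over methods based on second-order statistical features, and experiments show good separation performance.

5. Application to speaker recognition. Effective features are always the key to a recognition system, and effective speaker features are likewise the key to a speaker recognition system. A speech signal contains both physical information related to the speaker and semantic information; studies show that these two kinds of information can be regarded as mutually independent, which is consistent with the assumptions of the BSS model. We therefore use the overcomplete representation algorithm from BSS to represent the signal as a linear combination of basis functions. The algorithm decomposes the signal by specifying the distribution of the basis-function coefficients; if the coefficients are assumed to be sparse, i.e., super-Gaussian, the resulting basis functions accurately describe the higher-order statistical structure of the signal. These features can be used in speaker recognition and speech recognition systems: by learning from a large amount of a speaker's speech, feature information specific to that speaker is obtained, and experiments show that the features give good recognition performance.

In summary, targeting the problems of current underdetermined BSS algorithms, this dissertation studies underdetermined BSS in depth, centered on probability and statistics, sparse processing and time-frequency transforms, and applies the results successfully to speech signal processing. Finally, the dissertation summarizes open problems in the field and priorities for further research, and discusses the field's development trends.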
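Contribution 1 estimates the sources by a shortest-path decomposition when only two mixtures are available. A minimal sketch of that idea, under our own assumptions, follows: each mixing column is taken to be a unit vector described by its angle (the `angles` list), no two columns are parallel, there are at least two columns, and `shortest_path_decomposition` is a hypothetical helper name rather than the dissertation's code. For each sample it picks the two mixing directions that angularly bracket the sample and solves a 2×2 system, so only a tiny linear system is solved per sample. With unit-norm columns in two dimensions this choice coincides with the minimum-L1-norm solution discussed in contribution 3.

```python
import math


def shortest_path_decomposition(x, angles):
    """Estimate sources for one two-channel mixture sample x = (x1, x2).

    Assumptions (illustrative): column j of the 2 x n mixing matrix is the
    unit vector (cos angles[j], sin angles[j]), n >= 2, and no two columns
    are parallel. Returns n source estimates, at most two of them non-zero.
    """
    n = len(angles)
    # A column and its negative span the same line, so compare directions modulo pi.
    order = sorted(range(n), key=lambda j: angles[j] % math.pi)
    thetas = [angles[j] % math.pi for j in order]
    phi = math.atan2(x[1], x[0]) % math.pi
    # Find the neighbouring pair of directions that brackets phi (wrapping around pi).
    k = 0
    while k < n and thetas[k] < phi:
        k += 1
    lo, hi = order[(k - 1) % n], order[k % n]
    a_lo = (math.cos(angles[lo]), math.sin(angles[lo]))
    a_hi = (math.cos(angles[hi]), math.sin(angles[hi]))
    # Solve x = s_lo * a_lo + s_hi * a_hi by Cramer's rule; the signs of the
    # coefficients absorb the +/- ambiguity of each column.
    det = a_lo[0] * a_hi[1] - a_lo[1] * a_hi[0]
    s = [0.0] * n
    s[lo] = (x[0] * a_hi[1] - x[1] * a_hi[0]) / det
    s[hi] = (a_lo[0] * x[1] - a_lo[1] * x[0]) / det
    return s
```

For example, with `angles = [0, math.pi / 4, math.pi / 2]` and the sample `x = (1, 1)` produced by the sources `[1, 0, 1]`, the sketch returns approximately `[0, 1.414, 0]`: the minimum-L1 representation explains the sample with a single source, which illustrates why such methods depend on the sources being sufficiently sparse.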
【Abstract】 With the development of information and computer technology, the demands on signal-processing methods are increasingly high. In many applications, sensors capture only mixtures of the source signals, possibly with added noise, and separating the original source signals from these mixtures is a problem that must be solved. Blind source separation (BSS) was developed in response to this need, and many researchers have devoted themselves to the area since the technique was proposed. As a new method of data processing, BSS combines artificial neural networks, statistical signal processing, information theory and computer science, and it is of great value in applications such as biomedicine, medical imaging, speech signal processing, communication systems and information retrieval. BSS recovers or estimates the original signals from several observed mixtures using little prior information and without knowledge of the sources or the channels. This prior information is the basic hypothesis of BSS: the sources in the mixtures are mutually independent. Because this condition is mild, BSS has found wide application in many areas, and it has recently become one of the most active research areas in modern signal processing.

In early BSS research, to simplify the problem, methods mostly concentrated on the overdetermined or determined case, in which the number of observed signals is not less than the number of source signals. With the development of blind signal processing, however, underdetermined BSS, an extension of standard BSS in which there are fewer observed signals than source signals, has recently attracted a great deal of attention. Because the system is not linearly invertible in this case, the research methods differ from those of standard BSS; the main current approaches to underdetermined BSS are based on overcomplete representations and sparse representations.

This dissertation systematically reviews the development of BSS, its current research status, and the related theory and classical algorithms, and carries out exploratory research on key techniques of underdetermined BSS, including separability, estimation of the number of sources, handling of signal sparseness, and nonlinear separation methods. Several separation algorithms are proposed; this work belongs to extended BSS and has considerable theoretical and practical value. As an application of BSS, speaker recognition is also studied. Overcomplete representation is an extended BSS algorithm with great flexibility in capturing signal structure: it can not only solve the underdetermined BSS problem but also yield basis functions that describe the higher-order statistical information of the signal. A text-independent speaker recognition system was built on these basis functions, and experimental results show that these features give a good recognition rate.

The main contributions of this dissertation are as follows:

1. The overcomplete representation algorithm is divided into two steps: first, the sources are estimated with the mixing matrix fixed; then the mixing matrix is trained with the sources fixed. When there are two, or only a few, observed mixtures, shortest-path decomposition is proposed for estimating the sources in the first step, which increases training speed and avoids computing a matrix inverse.

2. In two-step underdetermined BSS algorithms, accurate estimation of the mixing matrix is a precondition for separation. After sparse processing, a new weighted potential function is used: at low resolution the number of sources can be estimated accurately, and higher resolution is then used in the neighborhood of the clustering directions to estimate the mixing matrix. Experimental results show that this method gives good estimates even when the observed signals are contaminated by noise.

3. For separation in the underdetermined case, the traditional method of L1-norm minimization yields a definite solution. However, this algorithm is derived from the maximum a posteriori (MAP) estimate of the source signals, and when solved by linear programming it cannot achieve the theoretically optimal value, especially when the source signals are not sparse enough. Two methods are proposed to overcome this disadvantage. First, an algorithm weighted along the clustering directions describes the sources better and gives a higher SNR than the L1-norm method; in particular, it avoids separation failure when a low-amplitude source is present in the observed signals. Second, a least mean square error algorithm tracks the intrinsic variation of the sources when searching for the optimal separation sub-matrix and also gives a higher SNR than the L1-norm method; comparison with the original sources shows satisfactory separation performance with practical value.

4. Noisy underdetermined BSS. Noise degrades algorithm performance, but building denoising into a BSS algorithm is difficult, so time-frequency transforms are usually applied to denoise the observed signals before separation. An algorithm based on higher-order statistical sparse features is proposed that exploits the "concentration" property of the wavelet transform: after the transform, the signal information is concentrated in a few wavelet coefficients, which achieves both sparse processing and denoising in the wavelet domain. The method is consistent with the BSS model and its prior hypothesis, has an advantage over methods based on second-order statistical sparse features, and experimental results show better separation performance.

5. Application to speaker recognition. Effective features are always the key problem in recognition systems, including speaker recognition systems. A speech signal contains not only physical information about the speaker but also the semantic information of the speech, and research has shown that these two kinds of information can be considered mutually independent, which fits the BSS model well. We use overcomplete representation to decompose the signal into a linear combination of basis functions by setting the distribution of the basis-function coefficients; if the coefficients are sparse, i.e., super-Gaussian, the learned basis functions describe the higher-order statistical structure of the signal. These basis functions can be used in speaker recognition or speech recognition systems: after training on a large amount of speech, basis functions correlated with the speaker information are obtained, and experimental results show that the feature is effective.

In sum, this dissertation investigates underdetermined BSS methods built around probabilistic and statistical methods, sparse processing, nonlinear separation and time-frequency transform techniques, and, combined with applications, BSS techniques have been successfully used in speech signal processing. Finally, open problems and future research topics in this area are summarized, and the development trends of the field are discussed.
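Contribution 2 counts the sources at low resolution and then refines the mixing directions at high resolution near each cluster. The sketch below illustrates that two-resolution scheme for two mixtures, assuming the samples have already been sparsified (for example in a transform domain) so that most samples are dominated by a single source. The magnitude weighting, the triangular kernel, its `width`, the grid sizes and the `rel_thresh` peak threshold are illustrative assumptions: the abstract does not specify the dissertation's own weighted potential function, and all function names here are hypothetical.

```python
import math


def wrapped_dist(a, b):
    """Distance between two line directions in [0, pi), accounting for wrap-around."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def potential(phi, thetas, weights, width):
    """Magnitude-weighted triangular-kernel potential at candidate direction phi.

    Illustrative kernel and weighting only; not the dissertation's exact function.
    """
    return sum(w * max(0.0, 1.0 - wrapped_dist(phi, t) / width)
               for t, w in zip(thetas, weights))


def estimate_directions(samples, coarse=90, fine=200, width=0.1, rel_thresh=0.2):
    """Estimate the number of sources and their mixing directions from 2-D samples."""
    thetas = [math.atan2(x2, x1) % math.pi for x1, x2 in samples]
    weights = [math.hypot(x1, x2) for x1, x2 in samples]
    # Step 1 (low resolution): local maxima above a relative threshold count the sources.
    grid = [k * math.pi / coarse for k in range(coarse)]
    p = [potential(g, thetas, weights, width) for g in grid]
    top = max(p)
    peaks = [k for k in range(coarse)
             if p[k] >= rel_thresh * top
             and p[k] > p[k - 1] and p[k] >= p[(k + 1) % coarse]]
    # Step 2 (high resolution): search a fine grid within one coarse step of each peak.
    step = math.pi / coarse
    refined = []
    for k in peaks:
        cand = [grid[k] - step + 2 * step * i / fine for i in range(fine + 1)]
        best = max(cand, key=lambda c: potential(c, thetas, weights, width))
        refined.append(best % math.pi)
    return refined
```

Directions are treated modulo π because a column and its negative describe the same line in the scatter plot. The length of the returned list is the estimated number of sources, and the estimated mixing columns follow as `(cos θ, sin θ)` for each returned angle θ; only the neighbourhoods of the coarse peaks are searched at the fine resolution, which keeps the refinement cheap.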