基于CS理论的语音增强算法的研究
The Study on Speech Enhancement Algorithm Based on CS Theory
【Author】 Sun Hongying (孙红英);
【Advisor】 Yang Hongwu (杨鸿武);
【Author information】 Northwest Normal University, Circuits and Systems, 2012, Master's thesis
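【Abstract (translated from the Chinese)】 Speech is a non-stationary, time-varying signal. Conveying information by speech is one of the most important and most common forms of human information exchange. Speech signal processing is very widely applied; its main techniques include speech coding, speech synthesis, speech recognition and speech enhancement. Researchers usually process signals under conditions in which the speech is relatively clean. Real-world speech, however, is inevitably affected by its surroundings and contains many kinds of noise, which seriously degrade its quality and intelligibility. This thesis studies commonly used speech enhancement algorithms, focusing on classical spectral subtraction and subspace-based speech enhancement; it outlines their principles and discusses their strengths and weaknesses. On this basis, it proposes a speech enhancement method based on compressed sensing (CS) theory. The main work and contributions are as follows:
1. An endpoint detection method based on the energy-to-zero-crossing quotient is proposed. The method detects noise by computing the quotient of the short-time energy and the short-time zero-crossing rate of the speech signal. The thesis treats the initial segment of the signal as noise and uses the quotient computed from it as a threshold: frames below the threshold are regarded as non-speech, and frames above it as noisy speech. Endpoint detection is performed by removing the non-speech segments. Endpoint detection experiments are carried out on clean speech and on speech corrupted by train noise at a signal-to-noise ratio (SNR) of 0 dB. For both conditions the short-time energy, the short-time zero-crossing rate, the energy-zero-crossing product, the short-time spectral entropy and the energy-to-zero-crossing quotient are computed and normalized. The results show that the energy-to-zero-crossing quotient is comparatively robust in noisy conditions.
2. A speech enhancement algorithm based on compressed sensing theory is proposed, which exploits the difference in sparsity between speech and interfering noise under the discrete cosine transform (DCT). The algorithm compresses the noisy speech with the DCT, measures the DCT coefficients with a Hadamard matrix, and reconstructs the speech with a modified CS reconstruction algorithm, a dual-threshold orthogonal matching pursuit (OMP), so that the reconstructed speech is enhanced. When the noise contains components with speech-like characteristics, the noise and speech components are highly similar and the conventional OMP algorithm converges slowly. The proposed algorithm uses the similarity between the residual signal and the column vectors of the sensing matrix, and limits the iteration by introducing energy thresholds, in order to improve both the efficiency of the algorithm and the quality of the enhancement. Under different SNR conditions, the thesis compares the results of classical spectral subtraction, subspace speech enhancement and the proposed CS-based algorithm on noisy speech. The outputs of the three algorithms are evaluated with the subjective ABX test and mean opinion score (MOS), and with the objective Perceptual Evaluation of Speech Quality (PESQ) measure. The results show that the proposed algorithm outperforms the other two.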
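The endpoint detection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the frame length, hop size, number of leading noise frames, and the choice of the maximum quotient over those frames as the threshold are all assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def energy_zcr_quotient(frames, eps=1e-12):
    """Short-time energy divided by short-time zero-crossing rate, per frame."""
    energy = np.sum(frames ** 2, axis=1)
    signs = np.sign(frames)
    signs[signs == 0] = 1  # count exact zeros as positive samples
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return energy / (zcr + eps)  # eps avoids division by zero

def detect_speech_frames(x, n_noise_frames=10, frame_len=256, hop=128):
    """Return a boolean mask that is True for frames classified as speech.

    Assumption: the first n_noise_frames frames contain only noise, and the
    threshold is the largest quotient observed among them.
    """
    q = energy_zcr_quotient(frame_signal(x, frame_len, hop))
    threshold = q[:n_noise_frames].max()
    return q > threshold
```

Frames for which the mask is False would be discarded before enhancement. Voiced speech tends to have high energy and a low zero-crossing rate, while broadband noise tends to have the opposite, which is why the quotient separates the two more sharply than either quantity alone.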
【Abstract】 Speech is a non-stationary, time-varying signal. Voice is the most important and most common means of information exchange among human beings. Speech signal processing, which mainly includes speech coding, speech synthesis, speech recognition and speech enhancement, has a very wide range of applications. Researchers usually process speech signals under relatively clean conditions, but in real-life environments speech is always contaminated by various kinds of noise, which seriously affect the quality and intelligibility of the speech signal. This thesis reviews the advantages and disadvantages of speech enhancement algorithms, focusing on the classical spectral subtraction algorithm and the subspace-based speech enhancement algorithm. A novel speech enhancement algorithm is then proposed by introducing compressed sensing (CS) theory. The main work and novelties are as follows:
Firstly, the thesis proposes a novel speech endpoint detection method based on the quotient of the short-time energy and the short-time zero-crossing rate of the speech signal. A threshold is obtained from the quotient of the initial segment of the signal, and each frame is then classified as a non-speech segment or a speech segment. Only the speech segments are used for speech enhancement. An endpoint detection experiment is conducted on a clean speech signal and on a noisy speech signal contaminated by train noise at a signal-to-noise ratio (SNR) of 0 dB. The short-time energy, the short-time zero-crossing rate, the short-time spectral entropy, the product of short-time energy and short-time zero-crossing rate, and the quotient of short-time energy and short-time zero-crossing rate are computed for both conditions and normalized. Experimental results demonstrate that the quotient of short-time energy and short-time zero-crossing rate is more robust for endpoint detection on noisy speech.
Secondly, the thesis proposes a novel approach to speech enhancement based on compressed sensing theory, which exploits the difference in sparsity between the noise signal and the speech signal in the discrete cosine transform (DCT) domain. Each frame of noisy speech is sparsified by the DCT. A partial Hadamard ensemble is employed as the sensing matrix to obtain compressive measurements of the DCT coefficients. The speech signal is then reconstructed with a modified orthogonal matching pursuit (OMP) algorithm that uses two thresholds, which realizes the speech enhancement. The original OMP algorithm converges slowly when the noise contains speech-like components. In the thesis, the iteration of the OMP algorithm is controlled by two energy thresholds, set according to the similarity between the residual signal and the column vectors of the Hadamard measurement matrix, to improve both the efficiency of the algorithm and the result of the speech enhancement. Both objective and subjective experiments were used to compare the proposed approach with the subspace method and the spectral subtraction method. Experimental results showed that the proposed method outperforms the other methods, with the highest PESQ, ABX and MOS scores for Gaussian white noise and most kinds of colored noise.
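The DCT, partial Hadamard measurement and thresholded OMP pipeline described above can be sketched as follows. The abstract does not give the exact dual-threshold rule, so the two stopping tests below (a residual-energy tolerance and a residual-to-column similarity tolerance) are assumptions made here, as are all function names, parameter names and default values. This is an illustration of the general technique, not the thesis's DTOMP algorithm.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x gives the DCT coefficients."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def partial_hadamard(m, n, rng):
    """m randomly chosen rows of an n x n Hadamard matrix (n a power of 2),
    scaled so that every column has unit norm."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    rows = rng.choice(n, size=m, replace=False)
    return H[rows, :] / np.sqrt(m)

def omp_two_thresholds(Phi, y, max_atoms, energy_tol=1e-3, corr_tol=0.1):
    """OMP with two assumed stopping tests (hypothetical, not from the thesis)."""
    residual = y.copy()
    support, coef = [], np.zeros(0)
    y_energy = float(y @ y)
    for _ in range(max_atoms):
        corr = Phi.T @ residual
        j = int(np.argmax(np.abs(corr)))
        # Test 1 (assumed): no column is similar enough to the residual.
        if abs(corr[j]) <= corr_tol * np.linalg.norm(residual) or j in support:
            break
        support.append(j)
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        # Test 2 (assumed): the residual energy is already negligible.
        if float(residual @ residual) <= energy_tol * y_energy:
            break
    s_hat = np.zeros(Phi.shape[1])
    if support:
        s_hat[support] = coef
    return s_hat

def enhance_frame(frame, C, Phi, max_atoms=32, energy_tol=1e-3, corr_tol=0.1):
    """DCT -> compressive measurement -> thresholded OMP -> inverse DCT."""
    s = C @ frame            # DCT coefficients of the noisy frame
    y = Phi @ s              # compressive measurements
    s_hat = omp_two_thresholds(Phi, y, max_atoms, energy_tol, corr_tol)
    return C.T @ s_hat       # back to the time domain
```

The idea behind this kind of scheme is that the few large DCT coefficients carrying most of the speech energy are picked up in the first iterations, while noise spread thinly across many coefficients is left in the residual once the iteration stops. How early to stop is exactly what the thresholds control, so their values would need tuning against real noisy speech.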
【Key words】 Speech Enhancement; CS; Quotient of Energy and Zero-crossing Rate; DTOMP Algorithm; PESQ;