Research on Several Problems in Noisy Speech Coding (带噪语音编码的若干问题研究)

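【Author】 李辉 (Li Hui)

【Supervisor】 戴蓓蒨 (Dai Beiqian)

【Author information】 University of Science and Technology of China (中国科学技术大学), Signal and Information Processing, 2007, PhD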

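【Abstract, translated from the Chinese】 With the rapid development of mobile communication technology and the steady expansion of voice communication, speech communication in noisy environments has become commonplace, and the speech signal is inevitably affected by background noise. For parametric coding, both the accuracy of speech parameter extraction and the way the parameters are quantized strongly affect communication quality. Research on extracting the pitch period and the linear prediction coefficients (which describe the vocal tract) from noisy speech, on efficient parameter quantization, and on noise suppression in speech coding therefore has considerable research value and practical application prospects.

The pitch period is an important excitation-source parameter in speech coding. With practical use in mind, the thesis proposes a fast, low-complexity pitch period estimation method based on AMDF and ACF. Applying an autocorrelation to the AMDF values of the speech signal improves the accuracy of pitch estimation. The AMDF values of each frame are transformed so that each multiplication in the autocorrelation becomes a single addition. Because the algorithm involves only additions, subtractions, and absolute values, its computational complexity is low, and it can be used wherever real-time pitch estimation is required. A fast pitch estimation method suited to hardware implementation is also given, and a real-time pitch estimation system is implemented on an FPGA (Spartan II XC2S30vq100-6). Few existing pitch estimation algorithms are suited to direct hardware implementation, yet hardware is the preferred option when the pitch period must be extracted in real time.

To address the difficulty of estimating the pitch period accurately when the signal-to-noise ratio (SNR) is low, the thesis proposes a pitch detection method based on glottal closure instants (GCI) and the wavelet transform. The wavelet transform detects the abrupt signal changes at the GCIs directly from the speech signal, and a low-pass prefilter reduces the influence of noise and formants. A single-level wavelet transform is enough to obtain fairly high detection accuracy and noise robustness while lowering the computational complexity of pitch estimation.

To address the difficulty of extracting linear prediction coefficients accurately from noisy speech, the thesis presents an extraction method based on spectral subtraction. Because the energy and frequency content of the background noise vary with time, the noise power spectrum is estimated with the minimum statistics tracking method, which can follow those changes. Spectral subtraction then yields an estimate of the clean speech power spectrum, from which the linear prediction coefficients are extracted. Experimental results show that spectral subtraction improves the accuracy of the extracted linear prediction coefficients.

Quantization is a key technique in parametric coding. The thesis examines several commonly used vector quantization methods for line spectral frequency (LSF) parameters and presents a new quantization method based on the Gaussian mixture model (GMM). Its distinguishing feature is that its computational load and storage do not change with the number of quantization bits. Because the GMM quantizer can describe many aspects of the distribution of the parameter space, a nonlinear quantizer design can be used, which both improves quantization accuracy and reduces computation and storage.

For heavily noise-corrupted speech, speech enhancement is usually applied at the signal front end. The thesis proposes a Kalman-filter-based speech enhancement algorithm that exploits the slowly varying nature of the vocal tract. Because the shape of the vocal tract changes slowly during speech production, the vocal tract coefficients also vary slowly. The algorithm first converts the linear prediction coefficients into LSF parameters and then applies first-order smoothing to the LSF parameters of adjacent frames, which corrects the state transition matrix and suppresses isolated residual noise in the enhanced speech. Compared with conventional Kalman filter and Wiener filter speech enhancement algorithms, the proposed algorithm gives further improvements in both segmental SNR and PESQ scores. When the SNR is low, using the proposed algorithm as the front end of the speech coder improves coding quality.

The research was supported by the National Natural Science Foundation of China (No. 60272039) and the open fund of the Ministry of Education-Microsoft Key Laboratory (No. 06 120806).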
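The AMDF-then-ACF idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the AMDF values are transformed, so this sketch assumes a one-bit threshold at the mean AMDF value; with one-bit values the autocorrelation reduces to counting matching bits, so no multiplications are needed. The function names, lag range, and frame sizes are hypothetical choices for illustration, not the thesis's actual design.

```python
import numpy as np

def amdf(frame, num_lags):
    """Average magnitude difference function for lags 0..num_lags."""
    n = len(frame) - num_lags  # fixed comparison window for every lag
    if n <= 0:
        raise ValueError("frame must be longer than num_lags")
    return np.array([np.abs(frame[:n] - frame[k:k + n]).mean()
                     for k in range(num_lags + 1)])

def one_bit(values):
    """Assumed transform: 1 where the value exceeds the mean, else 0."""
    return values > values.mean()

def estimate_pitch_lag(frame, min_lag, max_lag, num_lags):
    """Estimate the pitch period in samples.

    The AMDF curve repeats at the pitch period, so the lag at which the
    one-bit AMDF sequence best matches a shifted copy of itself is taken
    as the pitch period. Matching is done by counting equal bits.
    """
    if num_lags <= max_lag:
        raise ValueError("num_lags must exceed max_lag")
    bits = one_bit(amdf(frame, num_lags))
    n = len(bits) - max_lag  # fixed number of bits compared per lag
    lags = np.arange(min_lag, max_lag + 1)
    matches = [np.count_nonzero(bits[:n] == bits[m:m + n]) for m in lags]
    return int(lags[int(np.argmax(matches))])
```

For a synthetic 100 Hz tone sampled at 8 kHz (a period of 80 samples), `estimate_pitch_lag(frame, 20, 133, 400)` on an 800-sample frame should return 80. The long frame keeps the sketch simple; a practical coder would work with much shorter frames.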

【Abstract】 Mobile communication technology is developing rapidly and the range of speech communication is expanding. Speech communication often takes place against background noise, which corrupts the speech signal. In parametric coding, the accuracy of the extracted speech parameters and the way they are quantized greatly affect the quality of the coded speech. Research on extracting the pitch and the linear prediction coefficients from noisy speech, on effective quantization methods, and on noise reduction methods is therefore important for both research and applications.

The pitch is an important excitation-source parameter in speech coding. A pitch detection algorithm based on AMDF and ACF is proposed for real-time applications, with reduced computational expense. First, AMDF values are computed for a frame of the speech signal. ACF values are then computed from the AMDF values. To reduce computational expense and complexity, the AMDF values of the frame are transformed into a one-bit signal, which also reduces the effects of the amplitude and formants of the speech signal on pitch detection. The pitch period is calculated by applying the ACF algorithm to the one-bit signal, so that the multiplications in the short-time autocorrelation function are replaced by simple additions. A real-time pitch detector based on a field-programmable gate array (FPGA) is also proposed. The memories, gates, and sequential circuits of a Spartan II XC2S30 chip are used to implement these algorithms, which meets the needs of real-time pitch detection.

The pitch of noisy speech cannot be estimated correctly when the SNR of the speech signal is low. A pitch detection method for noisy speech signals based on GCI and the discrete wavelet transform is proposed. The GCI positions in the speech are estimated by using the wavelet transform, and the pitch is then calculated from them. A third-order lowpass elliptic filter reduces the effects of noise and speech formants on pitch detection. The precision of pitch detection is increased, and the computational expense and complexity are lower than those of the multi-scale wavelet transform algorithm.

It is difficult to extract the linear prediction coefficients from a noisy speech signal. A method of extracting the linear prediction coefficients from noisy speech based on spectral subtraction is proposed. The minimum statistics tracking method is used to estimate the noise power spectrum because the energy and the frequency content of the noise change with time. The power spectrum of the speech signal is obtained by spectral subtraction, and the linear prediction coefficients are then extracted from it. The experimental results show that the method increases the accuracy of the extracted linear prediction coefficients.

Quantization is very important in parametric coding. The thesis studies several common methods for vector quantization of the line spectrum frequency parameters. The vector quantization method based on Gaussian mixture models (GMM) is computationally efficient and has low memory requirements, and its complexity is independent of the bit rate of the system. The GMM quantizer can describe much of the information in the distribution of the parameter space. With the nonlinear quantization method, the computational expense and memory requirements are decreased and the quantization precision is increased.

Speech enhancement is used as a pre-processing stage when the speech signal is seriously corrupted. A speech enhancement algorithm based on the spectral envelope and Kalman smoothing is proposed. Because the vocal tract parameters change slowly, the linear prediction coefficients are converted into line spectrum frequency parameters, and the parameters of the current frame and the previous frame are then smoothed. This reduces the residual isolated noise. The quality of the enhanced speech is evaluated by means of segmental SNR and ITU PESQ scores. Experimental results indicate that the proposed algorithm achieves clear improvements over the conventional Kalman smoother and the Wiener filter algorithm.
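The spectral-subtraction route to the linear prediction coefficients can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the noise power spectrum is taken as given (the minimum statistics tracker is not implemented here), the spectral floor value is an arbitrary choice, and the function names are hypothetical. The sketch subtracts the noise power spectrum, converts the result to an autocorrelation sequence with an inverse FFT, and solves for the coefficients with the Levinson-Durbin recursion.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... ."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_from_noisy_psd(noisy_psd, noise_psd, order, floor=1e-3):
    """Estimate LPC coefficients from a noisy power spectrum.

    noisy_psd and noise_psd are full-length (two-sided) power spectra on
    the same FFT grid. noise_psd is assumed to come from some external
    noise tracker. The floor keeps the subtracted spectrum positive.
    """
    clean_psd = np.maximum(noisy_psd - noise_psd, floor * noisy_psd)
    # Wiener-Khinchin: the inverse FFT of the power spectrum gives the
    # autocorrelation sequence.
    r = np.fft.ifft(clean_psd).real
    return levinson_durbin(r[:order + 1], order)
```

With a perfect noise estimate, the recovered coefficients match those of the clean spectrum. With no subtraction (a zero noise estimate), added white noise inflates the zero-lag autocorrelation and biases the coefficients, which is the effect the subtraction step is meant to counter.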
