

Study on Speech Enhancement Based on the RBF Networks

【Author】 郭利斌 (Guo Libin)

【Advisor】 郭继昌 (Guo Jichang)

【Author information】 Tianjin University, Circuits and Systems, 2006, Master's thesis

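【Summary】 In practice, speech signals are inevitably corrupted by various kinds of noise. Noise lowers speech quality and intelligibility, and it can sharply degrade the performance of a speech processing system or even stop the system from working at all. To remove this interference, speech processing systems widely use speech enhancement to improve quality and intelligibility and to raise system performance, so research on speech enhancement is of real importance. This thesis studies speech enhancement algorithms based on radial basis function (RBF) networks, with emphasis on a frequency-domain algorithm that uses two RBF networks. It presents the basic principles, implementation, and enhancement results of the algorithms. The main work is as follows: 1. Building on an analysis of conventional time-domain neural-network speech enhancement, an improved method is proposed that lightens the load on the network and shortens training time. The algorithm is simulated in Matlab, and the simulations show that it suppresses noise effectively and substantially raises the signal-to-noise ratio (SNR) of the speech. Under various types of added noise, the algorithm enhances speech well, works over a wide range of input SNRs, and is simple. 2. In the frequency domain, two trained RBF networks separately process the linear prediction coefficients and the formant parameters of the noisy speech; these parameters are used to correct the spectral envelope, and the speech is then reconstructed. The method has little effect on speech features such as pitch frequency, spectral tilt, and formants, so it preserves the spectral structure of the speech well and does not lower speech quality. Experiments show a large improvement in perceived quality. 3. The Mel-frequency cepstral distance (Mel Cepstrum Distance, MCD) is used to measure the enhancement. Experiments show that it reflects more intelligibility information than conventional SNR, evaluates the quality and effective range of an enhancement algorithm more accurately, and is a more reasonable metric for speech enhancement algorithms.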
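
The time-domain idea in item 1 can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes a generic Gaussian RBF network with fixed centers and an output layer trained by the LMS rule, mapping a short window of noisy samples to the clean center sample. The sine-wave "speech", the window length, the number of centers, the width, and all function names are illustrative assumptions.

```python
import math
import random


def rbf_features(x, centers, width):
    """Gaussian activations of input vector x, one per center."""
    return [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]


def predict(x, centers, width, weights):
    """Network output: weighted sum of RBF activations plus a bias term."""
    phi = rbf_features(x, centers, width) + [1.0]
    return sum(w * p for w, p in zip(weights, phi))


def train_output_weights(inputs, targets, centers, width, lr=0.05, epochs=30):
    """Fit the linear output layer with the LMS rule; centers stay fixed."""
    weights = [0.0] * (len(centers) + 1)
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            phi = rbf_features(x, centers, width) + [1.0]
            err = t - sum(w * p for w, p in zip(weights, phi))
            weights = [w + lr * err * p for w, p in zip(weights, phi)]
    return weights


def snr_db(clean, estimate):
    """SNR in dB, treating (clean - estimate) as the residual noise."""
    signal = sum(s * s for s in clean)
    noise = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / noise)


# Toy data: a sine wave stands in for speech (an assumption, not thesis data).
random.seed(0)
n, half = 300, 3
clean = [math.sin(2.0 * math.pi * 0.03 * i) for i in range(n)]
noisy = [s + random.gauss(0.0, 0.3) for s in clean]

# Each training pair maps a 7-sample noisy window to its clean center sample.
idx = range(half, n - half)
windows = [noisy[i - half:i + half + 1] for i in idx]
targets = [clean[i] for i in idx]

centers = random.sample(windows, 12)  # centers drawn from the training windows
weights = train_output_weights(windows, targets, centers, width=1.0)
enhanced = [predict(x, centers, 1.0, weights) for x in windows]

print("input SNR (dB): ", snr_db(targets, [noisy[i] for i in idx]))
print("output SNR (dB):", snr_db(targets, enhanced))
```

The script prints the SNR before and after the network is applied to its own training data. No particular gain is implied, and a real evaluation would use held-out speech and the noise types studied in the thesis.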

【Abstract】 In practice, speech signals are inevitably corrupted by various kinds of noise. Noise degrades the quality and intelligibility of speech and, in severe cases, prevents speech processing systems from working properly. To minimize the effect of noise on system performance, speech enhancement is applied in a wide range of speech processing systems, so the study of speech enhancement is of considerable significance. This thesis discusses speech enhancement based on Radial Basis Function (RBF) networks and focuses on a frequency-domain method based on two RBF networks. The principles and implementation of the methods, together with their improved forms, are presented. The main work of this thesis is as follows: 1. After examining the traditional methods, a new time-domain speech enhancement method based on RBF networks is proposed. The method reduces the load on the RBF network and the training time. The algorithm is simulated in Matlab, and the simulation results show that the proposed method effectively suppresses noise and raises the signal-to-noise ratio (SNR). The experimental results indicate that the method greatly improves the quality and intelligibility of noisy speech and has further advantages such as a wide applicable SNR range and a low computational load. 2. In the frequency domain, two RBF networks are used to remove the noise component. The first is trained on the formant coefficients and the second on the LPC coefficients. The modified spectral envelope is then estimated from these coefficients, and finally the denoised signal is reconstructed. The algorithm has a distinct advantage: it preserves the speech waveform accurately, the speech characteristics are well retained, and the quality of the speech signal is clearly improved. 3. The Mel cepstrum distance (MCD) is proposed for evaluating the enhancement. Experiments show that MCD is more closely related to intelligibility than the traditional SNR and outperforms it, because it gives more information about the conditions under which enhancement approaches apply and about their relative effectiveness.
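
As a rough illustration of the metric in item 3, the sketch below computes a Mel cepstral distance in its commonly used form, MCD = (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²), averaged over frames. The abstract does not state which exact definition the thesis uses, so the formula, the function names, and the convention that the caller has already dropped the 0th (energy) coefficient are assumptions. Extracting the Mel cepstra from audio is outside this sketch.

```python
import math


def mel_cepstral_distance(c_ref, c_test):
    """MCD in dB between two Mel-cepstral vectors of equal length.

    Assumes the caller has already removed the 0th (energy) coefficient.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("cepstral vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mean_mcd(frames_ref, frames_test):
    """Average per-frame MCD over two time-aligned sequences of frames."""
    if len(frames_ref) != len(frames_test) or not frames_ref:
        raise ValueError("need two non-empty, equally long frame sequences")
    total = sum(
        mel_cepstral_distance(r, t) for r, t in zip(frames_ref, frames_test)
    )
    return total / len(frames_ref)


# Hypothetical cepstra for two frames of clean and enhanced speech.
clean_frames = [[1.2, -0.4, 0.1], [0.9, -0.2, 0.0]]
enhanced_frames = [[1.1, -0.5, 0.1], [0.9, -0.1, 0.1]]
print("mean MCD (dB):", mean_mcd(clean_frames, enhanced_frames))
```

A lower value means the enhanced cepstra lie closer to the clean reference, and identical frames give a distance of zero.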

  • 【Online publication contributor】 Tianjin University
  • 【Online publication issue】 2007, Issue 01
  • 【Classification number】 TN912.35
  • 【Times cited】 10
  • 【Downloads】 181