Research on Chinese Speech Coding Technology Based on Nonlinear Theory

(基于非线性理论的汉语语音编码技术研究)

【Author】 覃爱娜

【Supervisor】 桂卫华

【Author information】 Central South University, Control Theory and Control Engineering, 2012, Ph.D.

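【Abstract (translated from the Chinese)】 Digital analysis and processing of speech is an essential step in the digital transmission and storage of speech signals. As speech communication technology has developed, high speech quality at low bandwidth has remained the goal, and speech compression coding plays a central role in reaching it. Current analysis and compression coding of speech signals rely on linear theory and linear predictive coding, yet the speech production system is a complex nonlinear time-varying system with chaotic and fractal characteristics, so linear methods cannot fundamentally improve the performance of speech transmission and storage. Building on a detailed study of the nonlinear characteristics of speech signals, the thesis therefore uses a radial basis function (RBF) neural network to construct a nonlinear prediction model for speech signals and designs a nonlinear predictive coding system based on that model. The main work and contributions are as follows.

(1) Chaos detection and fractal characteristics of speech signals
On the basis of nonlinear theory, algorithms for computing the nonlinear characteristic parameters of Chinese phonemes are studied. The Wolf algorithm is used to compute the largest Lyapunov exponent of 33 Chinese phonemes, and the results show that Chinese speech signals are chaotic. The GP (Grassberger-Procaccia) algorithm is then used to compute the correlation dimension of the 33 phonemes; the results indicate that the production system for voiced sounds is low-dimensional, whereas that for some unvoiced sounds is high-dimensional.

(2) Phase space reconstruction of speech signals and determination of its parameters
The theoretical basis and tools for nonlinear prediction of speech signals are analyzed, and methods for determining the two reconstruction parameters, the delay time and the embedding dimension, are studied. Because of the limitations of the C-C algorithm, the delay time and the embedding dimension of the Chinese phonemes are obtained with the autocorrelation method and the false nearest neighbors method, respectively. The choice of sampling rate and speech source in the experiments is examined by statistical analysis; the results show that the computed delay time and embedding dimension are robust to different sampling rates and speech sources.

(3) Nonlinear prediction model for Chinese speech based on the RBF neural network
The nonlinear characteristic parameters of Chinese phonemes are combined with RBF neural network analysis: the delay times and embedding dimensions computed for the 33 phonemes are used to set the number of neurons in the three layers of the RBF network, yielding an RBF-based nonlinear prediction model for Chinese speech signals. The model is compared with the existing ADPCM linear prediction model, and simulations show that the nonlinear model has a smaller prediction error, that is, better prediction performance.

(4) Speech enhancement based on the wavelet transform
Because the predictive coding performance of speech signals degrades rapidly in noisy environments, wavelet-based speech enhancement is studied, with emphasis on threshold denoising. First, since conventional threshold selection adapts poorly to non-stationary noise, the MCRA algorithm is applied in the wavelet domain to compute the noise variance, giving a noise estimate that tracks changes in real time, and the threshold is adjusted adaptively using the spectral flatness. Second, to address the shortcomings of the conventional soft and hard threshold functions, an improved threshold function is designed on the basis of the non-negative garrote threshold function proposed by Breiman, and its soundness is verified by analyzing properties such as continuity and monotonicity.

(5) Design of the E-CENP speech coding system
Using the nonlinear prediction model together with the enhancement processing and the CELP speech coding algorithm, a nonlinear predictive coding system, E-CENP, is designed. Its preprocessing stage includes the proposed wavelet-based speech enhancement, and its predictor is the proposed RBF neural network nonlinear prediction model. Simulations show that, compared with the CELP linear predictive coding system, the nonlinear system delivers higher coded speech quality and better robustness.

By applying nonlinear theory and methods, the thesis constructs the E-CENP speech coding system. Compared with CELP, the speech reconstructed after encoding and decoding has higher quality and the system is more robust, which indicates that the proposed nonlinear approach suits speech, a signal with nonlinear characteristics, and offers new ideas and methods for speech signal processing.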
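As a rough illustration of the phase space reconstruction step in item (2), the sketch below picks a delay time from the autocorrelation function and builds delay vectors. It is not the thesis implementation: the function names `autocorr_delay` and `delay_embed`, the 1 - 1/e decay criterion, and the synthetic sine frame are all assumptions made for this example, and the embedding dimension is simply given rather than estimated by false nearest neighbors.

```python
import numpy as np


def autocorr_delay(x, max_lag=100):
    """Return the first lag at which the normalized autocorrelation
    falls below 1 - 1/e (an assumed, commonly used criterion)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    for lag in range(1, max_lag + 1):
        r = np.dot(x[:-lag], x[lag:]) / denom
        if r < 1.0 - 1.0 / np.e:
            return lag
    return max_lag  # no lag met the criterion within the search range


def delay_embed(x, dim, tau):
    """Build delay vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim and tau")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])


# Hypothetical stand-in for one speech frame: a pure tone with period 40 samples.
t = np.arange(400)
frame = np.sin(2 * np.pi * t / 40)

tau = autocorr_delay(frame)
vectors = delay_embed(frame, dim=3, tau=tau)  # dim=3 is an arbitrary choice here
print(tau, vectors.shape)
```

Each row of `vectors` is one point in the reconstructed phase space; a nonlinear predictor such as an RBF network would take a row as input and the next sample as its target.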

【Abstract】 Digital analysis and processing of speech are important steps in the digital transmission and storage of speech. As speech communication technology develops, high quality and low bandwidth have remained the goals pursued, and speech coding plays a significant role in achieving them. At present, the analysis and prediction of speech signals rely on linear theory and linear prediction techniques, but the speech production system is a complicated nonlinear system with chaotic and fractal properties, so linear methods cannot fundamentally improve the performance of speech transmission and storage. Therefore, the nonlinear characteristics of Chinese speech are studied in depth and, using a Radial Basis Function network (RBF network for short), a nonlinear predictor is designed. A nonlinear predictive coding system is then designed based on this predictor. The main work and results are as follows.

(1) Chaos detection and fractal features of speech signals
Based on nonlinear theory, the nonlinear characteristic parameters of Chinese speech phonemes are studied. The largest Lyapunov exponents of 33 Chinese speech phonemes are computed with the Wolf algorithm, and the results indicate that Chinese speech has chaotic characteristics. The correlation dimensions of the 33 phonemes are computed with the GP algorithm; the results show that the production system of voiced sounds is low-dimensional, while the production system of some unvoiced sounds is high-dimensional.

(2) Phase space reconstruction of speech signals
The theoretical basis and tools of nonlinear speech prediction are analyzed, and methods for determining the phase space reconstruction parameters, namely the delay time and the embedding dimension, are studied. The parameters are first computed with the C-C algorithm; because of the limitations of its results, the delay time and the embedding dimension are then computed with the autocorrelation algorithm and the FNN (False Nearest Neighbors) algorithm, respectively. The choice of sampling rate and speech source in the experiments is examined with statistical methods. The results show that the sampling rate and the speech source have little influence on the delay time and the embedding dimension, so the computed parameters are robust.

(3) Nonlinear prediction model based on the RBF network
The nonlinear characteristics of Chinese speech signals are combined with Radial Basis Function (RBF) network analysis: the averages of the delay time and the embedding dimension over the 33 Chinese speech phonemes determine the number of neurons in the three layers of the RBF neural network, and a nonlinear prediction model based on the RBF network is designed. Compared with the ADPCM linear predictor, simulation results indicate that the prediction error of the RBF-based nonlinear predictor is significantly lower, so the model offers better performance and prediction accuracy.

(4) Speech enhancement based on the wavelet transform
Because the predictive coding performance of speech signals can drop quickly in noisy conditions, speech enhancement techniques based on the wavelet transform are studied, with the focus on the design of the threshold function in wavelet threshold denoising. On the one hand, to overcome the difficulty that traditional threshold selection has in adapting to non-stationary noise, the MCRA algorithm is applied in the wavelet domain to compute the noise variance, giving a noise estimate that follows changes in real time, and the threshold is adjusted adaptively using the spectral flatness. On the other hand, an improved threshold function is designed on the basis of the non-negative garrote threshold function. It is continuous, removes the constant bias of the soft threshold function, and takes into account the exponential decay of the wavelet modulus values of noise.

(5) Design of the E-CENP speech coding system
Based on the nonlinear prediction model, the CELP speech coding algorithm and the enhancement processing are combined to design a nonlinear predictive coding system, E-CENP, whose preprocessing stage includes the enhancement processing. The linear predictor of CELP is replaced with the nonlinear prediction model. Simulation results indicate that, compared with the linear predictive coding system, the nonlinear predictive coding system offers higher speech quality and better robustness.

Based on the theory of nonlinear dynamics, a nonlinear predictive coding system, E-CENP, is designed. Compared with the CELP coding system, the decoded speech has higher quality and the system is more robust. The results show that the methods and theory of nonlinear dynamics are well suited to speech, which provides a new idea and solution for research on speech processing techniques.
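To make the threshold functions discussed in item (4) concrete, the sketch below compares the standard hard, soft, and non-negative garrote rules on a few wavelet coefficients. It assumes that the abstract's "non-negative dead zone threshold function" refers to the non-negative garrote rule; the improved function proposed in the thesis is not specified in the abstract and is not reproduced here. The function names and sample values are illustrative only.

```python
import numpy as np


def hard_threshold(w, lam):
    """Keep coefficients with |w| > lam unchanged, zero the rest.
    Discontinuous at |w| = lam."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) > lam, w, 0.0)


def soft_threshold(w, lam):
    """Shrink every surviving coefficient toward zero by lam.
    Continuous, but biased by a constant lam for large |w|."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def garrote_threshold(w, lam):
    """Non-negative garrote rule: w - lam**2 / w for |w| > lam, else 0.
    Continuous, and the shrinkage vanishes as |w| grows."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    keep = np.abs(w) > lam
    out[keep] = w[keep] - lam ** 2 / w[keep]
    return out


# Illustrative coefficients and threshold (not data from the thesis).
coeffs = np.array([-3.0, -0.5, 0.0, 0.5, 2.0, 4.0])
lam = 1.0
for rule in (hard_threshold, soft_threshold, garrote_threshold):
    print(rule.__name__, rule(coeffs, lam))
```

The garrote rule sits between the other two: it is continuous like the soft rule, while large coefficients are left almost unchanged as with the hard rule, which is why it is a natural starting point for an improved function.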

  • 【Online publication contributor】 Central South University
  • 【Online publication year and issue】 2014, Issue 03