Study of Speech Pitch Period Detection Algorithm and the Application in Speech Synthesis System

【Author】 Li Juan

【Advisor】 Zhang Xueying

【Author information】 Taiyuan University of Technology, Signal and Information Processing, 2008, Master's thesis

【Abstract】 The pitch period of a speech signal is one of the key parameters describing the excitation source. Detecting it accurately matters for high-quality speech analysis and synthesis, speech compression and coding, and speech recognition. This thesis discusses several common pitch period detection methods together with the wavelet transform and the Hilbert-Huang transform. It proposes a noise-robust pitch period detection algorithm that combines an autocorrelation energy function (ACEF) with a magnitude difference energy function (MDEF), and it applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system.

The thesis first introduces several common pitch period detection methods: the autocorrelation function (ACF), the average magnitude difference function (AMDF), and the cepstrum. ACF is suited to noisy environments, but used alone it often returns a fundamental frequency estimate that is twice or half the actual pitch frequency. AMDF and the cepstrum give good results in quiet or low-noise conditions, but their accuracy drops quickly in harsh conditions with a low signal-to-noise ratio (SNR). To address this, the thesis proposes a pitch period detection algorithm that combines ACEF with MDEF. The combined algorithm suppresses spurious peaks of the autocorrelation function, improves noise robustness, and compensates for the shortcomings of the traditional methods.

Next, the thesis introduces wavelet transform theory, including the continuous wavelet transform, the discrete wavelet transform, multi-resolution analysis, and the Mallat algorithm. It analyzes experimentally two pitch period detection methods: wavelet decomposition and reconstruction based on the Mallat algorithm (with the high-frequency components set to zero), and the à trous algorithm derived from the Mallat algorithm. Decomposing a speech signal directly with the Mallat algorithm requires downsampling, so each decomposition level yields components half as long as those of the previous level. The à trous algorithm instead interpolates the filter coefficients, so the components at every level have the same length as the original signal, which helps pitch period extraction.

The thesis then introduces the Hilbert-Huang transform and applies it to pitch period detection. Compared with traditional methods, the Hilbert-Huang transform does not require a short-time stationarity assumption for the speech signal, offers high detection accuracy and wide applicability, and allows a much longer frame length. Compared with the wavelet transform, it decomposes the signal according to the signal's own information and adapts as the signal changes, so it reflects the real physical information contained in the signal and is more adaptive.

Finally, the thesis applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system, which extends the application scope of the transform. Experiments show that the commonly used autocorrelation method yields only an average pitch period for each frame and then places the pitch marks within the frame by interpolation, which limits accuracy. Pitch marking with the Hilbert-Huang transform detects nearly all pitch peaks in a speech segment and captures the small period variations within each frame, so it is more accurate than the autocorrelation method.
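The ACF and AMDF detectors mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not define ACEF or MDEF, so the combined score below is a plain ACF/AMDF ratio used only as a stand-in to show how the two cues can reinforce each other; it is not the thesis's algorithm. The function names, the search range (`f_min`, `f_max`), and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np

def acf(frame, max_lag):
    """Short-time autocorrelation R(k) = sum_n x[n] * x[n+k], k = 0..max_lag."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(max_lag + 1)])

def amdf(frame, max_lag):
    """Average magnitude difference D(k) = mean_n |x[n] - x[n+k]|, k = 0..max_lag."""
    n = len(frame)
    return np.array([np.mean(np.abs(frame[:n - k] - frame[k:]))
                     for k in range(max_lag + 1)])

def estimate_pitch(frame, fs, f_min=60.0, f_max=500.0):
    """Return pitch estimates in Hz from three cues (hypothetical helper).

    'acf'      : lag of the ACF maximum (peaks at the pitch period)
    'amdf'     : lag of the AMDF minimum (dips at the pitch period)
    'combined' : lag maximising ACF / AMDF, an illustrative stand-in,
                 NOT the ACEF/MDEF combination proposed in the thesis
    """
    frame = np.asarray(frame, dtype=float)
    lag_min = max(1, int(fs / f_max))
    lag_max = min(len(frame) - 1, int(fs / f_min))
    r = acf(frame, lag_max)
    d = amdf(frame, lag_max)
    combined = r / (d + 1e-9)  # small constant avoids division by zero

    search = slice(lag_min, lag_max + 1)
    lags = {
        "acf": lag_min + int(np.argmax(r[search])),
        "amdf": lag_min + int(np.argmin(d[search])),
        "combined": lag_min + int(np.argmax(combined[search])),
    }
    return {name: fs / lag for name, lag in lags.items()}

if __name__ == "__main__":
    # Synthetic voiced frame: 200 Hz fundamental plus two harmonics and noise.
    fs = 8000
    t = np.arange(320) / fs  # 40 ms frame
    rng = np.random.default_rng(0)
    x = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.3 * np.sin(2 * np.pi * 600 * t)
         + 0.1 * rng.standard_normal(t.size))
    print(estimate_pitch(x, fs))
```

With a wide search range the AMDF minimum can land on a multiple of the true period, because every multiple gives a comparably deep dip. This is the kind of halving error the abstract describes, and narrowing `f_min` is one simple way to limit it.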

  • 【Classification code】 TN912.3
  • 【Citations】 10
  • 【Downloads】 649