
低速率语音编码算法研究

Research on Low Bit Rate Speech Coding Algorithm

【Author】 Ji Zhe

【Advisor】 Tang Kun

【Author information】 Tsinghua University, Information and Communication Engineering, 2011, PhD

【Abstract】 Low bit rate speech coding algorithms are widely used in modern communication systems, and speech compression coding at ultra-low bit rates is currently one of the important research topics in speech signal processing. The sinusoidal excitation linear prediction (SELP) coding algorithm uses a linear-prediction-based sinusoidal mixed excitation technique and performs very well among speech compression coding algorithms at 2.4 kbps and below. The aim of this dissertation is to analyze and study the key techniques of speech coding on the basis of the SELP model, and to design and implement an ultra-low bit rate speech compression coding algorithm at 150 bps.

Efficient quantization algorithms for the characteristic parameters are proposed first. For scalar quantization of the line spectral frequency (LSF) parameters, a globally optimal LSF difference quantization algorithm based on dynamic programming is proposed, and multiple codebooks are used to further improve quantization performance; the algorithm achieves transparent quantization of the LSF parameters at 28 bits per frame. For vector quantization of the pitch period, a perceptually weighted distortion measure that exploits the characteristics of human hearing is proposed to improve quantization performance, and an integer-based optimization of the codeword search is designed to reduce the probability of mis-searching the optimal pitch codeword.

In ultra-low bit rate speech coding, the bits available per frame are too few to quantize the characteristic parameters adequately. To address this, algorithms that recover characteristic parameters at the decoder by exploiting the correlation between parameters are proposed. First, an energy recovery algorithm based on the hidden Markov model (HMM) estimates the trajectory of the energy parameter from the LSF parameters and the sub-band unvoiced/voiced (U/V) parameters. Then a U/V recovery algorithm based on the Gaussian mixture model (GMM) uses the LSF parameters and the normalized energy to estimate the probability distribution of the U/V parameters, which saves the bits otherwise needed to quantize them.

On the decoder side, an improved interpolation scheme for the characteristic parameters is proposed to improve the naturalness of the vocoder's synthesized speech during unvoiced/voiced transitions. To improve the vocoder's robustness to consecutive packet loss, a packet loss concealment algorithm based on mode-dependent linear prediction is proposed, which improves the synthesized speech quality under consecutive packet loss.

Finally, the results above are combined to design and implement a 150 bps SELP speech coding algorithm. The objective mean opinion score (MOS) of the synthesized speech is 2.424, the diagnostic rhyme test (DRT) accuracy is 82.9%, the codebook storage is 120 Kword, and the algorithmic delay is 325 ms. Overall, the performance of the 150 bps SELP vocoder exceeds the requirements of the national Eleventh Five-Year Plan special project.
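The abstract does not spell out how dynamic programming yields a globally optimal LSF difference quantizer, so the following is only a toy sketch of the general idea, not the dissertation's algorithm. It assumes that the first coefficient is quantized directly, that each later coefficient is reconstructed as the previous reconstruction plus a step from a small difference codebook, and that the criterion is plain squared error. The function names, codebook values, and the sample vector are all hypothetical. A greedy quantizer picks the nearest step at each coefficient; the dynamic-programming version keeps the cheapest path into every reachable reconstructed value, which is exact because the future error depends only on the current reconstructed value.

```python
def sq_err(x, q):
    """Total squared error between targets x and reconstruction q."""
    return sum((a - b) ** 2 for a, b in zip(x, q))


def greedy_diff_quantize(x, first_levels, steps):
    """Quantize x[0] directly, then pick the nearest step one value at a time."""
    q = [min(first_levels, key=lambda v: (x[0] - v) ** 2)]
    for target in x[1:]:
        candidates = [q[-1] + s for s in steps]
        q.append(min(candidates, key=lambda v: (target - v) ** 2))
    return q


def dp_diff_quantize(x, first_levels, steps):
    """Differential quantization that minimizes the total squared error.

    State = current reconstructed value. Only the cheapest path into each
    state is kept, so the search is exact without enumerating all paths.
    """
    # best maps reconstructed value -> (accumulated error, path so far)
    best = {v: ((x[0] - v) ** 2, [v]) for v in first_levels}
    for target in x[1:]:
        nxt = {}
        for prev, (cost, path) in best.items():
            for s in steps:
                v = prev + s
                c = cost + (target - v) ** 2
                if v not in nxt or c < nxt[v][0]:
                    nxt[v] = (c, path + [v])
        best = nxt
    cost, path = min(best.values(), key=lambda t: t[0])
    return path, cost


# Hypothetical toy data in integer Hz (integers keep the state keys exact).
x = [300, 520, 900, 1500, 1560]   # assumed ascending LSF-like values
first_levels = [250, 350, 450]    # assumed codebook for the first value
steps = [100, 300, 500]           # assumed difference codebook

greedy_q = greedy_diff_quantize(x, first_levels, steps)
dp_q, dp_cost = dp_diff_quantize(x, first_levels, steps)
print("greedy:", greedy_q, sq_err(x, greedy_q))
print("dp:    ", dp_q, dp_cost)
```

Because every step is positive in this toy setup, the reconstruction stays in ascending order, which is the ordering property LSF parameters must keep. A real coder would use trained codebooks and typically a perceptually motivated distortion measure rather than the values assumed here.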

  • 【Online publication submitter】 Tsinghua University
  • 【Online publication year and issue】 2012, Issue 11