Study on Matching Pursuit Based Low-bit Rate Speech Coding

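【Author】 Zhang Wenyao (张文耀)

【Advisor】 Wang Yuguo (王裕国)

【Author information】 Graduate School of the Chinese Academy of Sciences (Institute of Software), Computer Application Technology, 2002, PhD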

【Abstract】 Speech coding already achieves very high reconstructed speech quality at high and medium bit rates. High-quality coding at low and very low bit rates, however, remains a challenging research problem of both theoretical significance and potential practical value, and it has led many researchers to explore new techniques, such as new sinusoidal modeling methods and new parameter quantization methods. Following the direction of sinusoidal modeling and sinusoidal analysis, this thesis adopts the matching pursuit technique together with a psychoacoustic model, studies new modeling methods and the quantization and coding of model parameters, and explores low-bit-rate speech coding and related problems. The main contributions are as follows:

1. Matching pursuit is applied to speech signal enhancement, and a method is given for determining the coherent ratio threshold used in the enhancement procedure. With this method, the signal can be clearly enhanced over a fairly wide range even when the statistical properties of the signal and the noise are unknown.

2. Sinusoidal modeling based on matching pursuit is studied. The concepts of dynamic masking threshold and perceptual gradient are proposed, together with a perceptual gradient sinusoidal modeling algorithm. The algorithm makes good use of the psychoacoustic model: during modeling it maximizes the increase in perceptual information carried by the synthesized signal, which improves modeling efficiency. Even when the model precision is low, the method yields synthesized speech of fairly good quality.

3. For quantization and coding of the sinusoidal model parameters, vector quantization of the amplitude parameters and differential quantization of the frequency parameters are proposed, and the frequency bin quantization model, the random phase model, and the zero phase model are also discussed. These methods effectively reduce the coding bit rate.

4. With the aims of reducing the bit rate and improving speech quality, a series of compression coding schemes is studied by stepwise refinement, leading to an integrated coding scheme at 1.5 to 2.4 kbps. Covering different modeling methods and parameter quantization techniques, the schemes examined are: compression coding based on ordinary matching pursuit sinusoidal modeling; compression coding with perceptual gradient sinusoidal modeling; compression coding based on dynamic dictionary matching pursuit; compression coding with classified dynamic dictionaries; and an integrated scheme that combines perceptual gradient sinusoidal modeling with classified dynamic dictionaries. The results show that matching pursuit sinusoidal modeling has great potential for low-bit-rate speech coding and opens a new technical route to high-quality coding at low bit rates. The final integrated scheme takes psychoacoustic factors into account more fully and combines classified processing, dynamic dictionaries, and perceptual gradient modeling; in both bit rate and synthesized speech quality it outperforms some existing international coding methods and standards.

5. The CAMDF function is proposed, together with CAMDF-based algorithms for speech classification and pitch estimation, which are used in the coding schemes of this thesis. Because CAMDF overcomes the shortcomings of the traditional AMDF, the new pitch detection algorithm not only effectively reduces the error rate but also simplifies the detection process and improves the precision of the estimates. Speech classification using CAMDF also gives fairly satisfactory results.

Finally, the thesis is summarized, the aspects of the current work that need further improvement are analyzed, and directions for the next stage of research are given along with some prospects for the field.
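
The abstract does not specify the dictionary, the stopping rule, or the perceptual weighting used in the thesis. As a rough illustration only, the generic matching pursuit idea behind the sinusoidal modeling it mentions can be sketched as follows: at each step, pick the dictionary atom most correlated with the current residual and subtract its contribution. The function names (`make_dictionary`, `matching_pursuit`), the evenly spaced frequency grid, and the fixed iteration count are all assumptions made for this sketch, not details taken from the thesis.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def make_dictionary(n, num_freqs):
    """Build unit-norm cosine and sine atoms of length n.

    Frequencies are evenly spaced in (0, pi) radians per sample.
    This grid is an assumption of the sketch, not the thesis's dictionary.
    """
    atoms = []
    for k in range(1, num_freqs + 1):
        w = math.pi * k / (num_freqs + 1)
        for name, fn in (("cos", math.cos), ("sin", math.sin)):
            g = [fn(w * t) for t in range(n)]
            norm = math.sqrt(dot(g, g))
            atoms.append((w, name, [v / norm for v in g]))
    return atoms


def matching_pursuit(x, atoms, num_iters):
    """Greedy matching pursuit: returns (picks, residual).

    Each pick is (frequency, "cos" or "sin", coefficient).
    A fixed iteration count stands in for a real stopping rule.
    """
    residual = list(x)
    picks = []
    for _ in range(num_iters):
        # Correlate the residual with every atom and keep the strongest match.
        scored = [(dot(residual, g), w, name, g) for w, name, g in atoms]
        coeff, w, name, g = max(scored, key=lambda s: abs(s[0]))
        # Remove the selected atom's contribution from the residual.
        residual = [r - coeff * v for r, v in zip(residual, g)]
        picks.append((w, name, coeff))
    return picks, residual
```

Because every atom has unit norm, each iteration lowers the residual energy by the square of the selected coefficient, so the residual energy never increases. A perceptually motivated variant such as the one described in contribution 2 would replace the plain correlation score with a criterion tied to a masking threshold, which this sketch does not attempt.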
