
汉语耳语音转换为正常语音的共振峰结构研究

Research on Formants Structure in Speech Reconstruction from Chinese Whispers

【Author】 Liu Jianxin (刘建新)

【Supervisor】 Zhao Heming (赵鹤鸣)

【Author information】 Soochow University, Signal and Information Processing, 2007, Master's thesis

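【Chinese Abstract】 Whispered speech is a special mode of spoken communication. Research on converting whispered speech into normal speech has significant theoretical and scientific value, and it also has practical applications in communication in public places, voice restoration for people who have lost their voice, and public-security and forensic work. This thesis analyzes the similarities and differences between the voiced and unvoiced sounds of normal and whispered speech in terms of short-time energy, short-time average magnitude, short-time zero-crossing rate, short-time autocorrelation, and the short-time average magnitude difference function, and summarizes how whispered speech differs from normal speech in the time domain. Linear predictive coding (LPC) is one of the most effective methods for extracting speech formants, but accurate formant extraction suffers from spurious peaks and merged peaks. By analyzing the pole-interaction phenomenon and the measures for handling it, and building on the fact that in speech recognition the spectral density of a formant matters more than its bandwidth, this thesis proposes an improved LPC algorithm based on pole interaction, which extracts formants accurately by modifying the radii of the formant poles to reduce the error caused by pole interaction. Experimental analysis shows that the algorithm not only extracts formant parameters accurately but also resolves the spurious-peak and merged-peak problems, and that it is robust when extracting formants from noisy speech. From the formant-tracking curves, the mean formant frequencies of normal and whispered speech are obtained for both male and female speakers, revealing the differences in formant structure between whispered and normal speech. Among mapping-rule-based approaches, the Gaussian mixture model (GMM) offers relatively good performance and robustness. Building on the GMM, this thesis constructs a model that maps the LSF parameters of Chinese whispered speech to the LSF parameters of normal speech, achieving conversion of Chinese whispered speech into normal speech.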
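A minimal sketch of the five time-domain measures named above, computed for a single analysis frame with unwindowed, unnormalized textbook definitions. The function name `short_time_features`, the `max_lag` parameter, and the zero-crossing normalization (fraction of adjacent sample pairs that change sign) are assumptions for illustration, not the thesis's implementation.

```python
from typing import Dict, Sequence


def short_time_features(frame: Sequence[float], max_lag: int) -> Dict[str, object]:
    """Hypothetical helper: textbook short-time measures for one frame.

    No window and no length normalization are applied (an assumption).
    """
    n = len(frame)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(frame)")

    # Short-time energy: E = sum x(m)^2
    energy = sum(x * x for x in frame)
    # Short-time average magnitude: M = sum |x(m)|
    avg_magnitude = sum(abs(x) for x in frame)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    crossings = sum(1 for i in range(1, n) if (frame[i] >= 0) != (frame[i - 1] >= 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0
    # Short-time autocorrelation: R(k) = sum_m x(m) * x(m + k)
    autocorr = [sum(frame[m] * frame[m + k] for m in range(n - k))
                for k in range(max_lag + 1)]
    # Average magnitude difference function: F(k) = sum_m |x(m) - x(m + k)|
    amdf = [sum(abs(frame[m] - frame[m + k]) for m in range(n - k))
            for k in range(max_lag + 1)]

    return {
        "energy": energy,
        "avg_magnitude": avg_magnitude,
        "zcr": zcr,
        "autocorr": autocorr,
        "amdf": amdf,
    }
```

Because whispered speech is produced without vocal-fold vibration, its frames lack the periodic peaks in `autocorr` (and the matching valleys in `amdf`) at the pitch period that voiced frames of normal speech show, which is one reason these measures separate the two.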

【Abstract】 Whispered speech is a special mode of speech communication. Reconstructing normal speech from Chinese whispered speech has important scientific value, and it can also be applied in several fields, such as private speech communication in public places, reconstruction of normal speech for aphonic individuals, and forensic work. In this dissertation, the differences between the voiced and unvoiced sounds of normal and whispered speech are examined in terms of short-time energy, short-time average magnitude, short-time average zero-crossing rate, the short-time autocorrelation function, and the short-time average magnitude difference function, which together characterize how normal and whispered speech differ in the time domain. Linear Predictive Coding (LPC) is an efficient algorithm for extracting the formant parameters of speech, but it suffers from spurious peaks and merged peaks. Since the formant spectral density is known to be more important than the formant bandwidth in speech recognition, an improved LPC algorithm based on pole interaction is proposed; it extracts formants effectively by modifying the pole radii. Experimental results show that the proposed method extracts formants effectively, solves both problems, and is robust for noisy speech. From the formant-tracking curves, the mean formant frequencies are obtained, and from these the differences between normal and whispered speech are analyzed. Among models based on mapping rules, the Gaussian Mixture Model (GMM) offers good performance and robustness. Based on this model, a mapping of LSF parameters from Chinese whispered speech to normal speech is constructed.
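For context on the baseline that the proposed method improves, the sketch below shows conventional root-based LPC formant extraction: LPC coefficients via Levinson-Durbin, roots of A(z), formant frequency F = θ·f_s/(2π) and bandwidth B = -(f_s/π)·ln r for each pole r·e^{jθ}. It does not implement the thesis's pole-radius modification, whose details are not given here. The function names, the thresholds `min_freq` and `max_bw`, and the synthetic test signal are hypothetical choices for illustration.

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns [1, a1, ..., ap] so that A(z) = 1 + sum_j a_j z^-j.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a_1..a_{i-1} using the previous stage's coefficients
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants_from_lpc(a, fs, min_freq=90.0, max_bw=400.0):
    """Formant frequencies and bandwidths (Hz) from the poles of 1/A(z)."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]            # one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)   # F = theta * fs / (2*pi)
    bws = -np.log(np.abs(poles)) * fs / np.pi    # B = -(fs/pi) * ln r
    keep = (freqs > min_freq) & (bws < max_bw)   # hypothetical thresholds
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bws[keep][idx]


def demo_synthetic(fs=8000.0, n=4000, seed=0):
    """Hypothetical check: white noise through resonances at 500 and 1500 Hz."""
    a_true = np.array([1.0])
    for f, radius in [(500.0, 0.97), (1500.0, 0.97)]:
        theta = 2 * np.pi * f / fs
        a_true = np.convolve(a_true, [1.0, -2 * radius * np.cos(theta), radius ** 2])
    e = np.random.default_rng(seed).standard_normal(n)
    y = np.zeros(n)
    for t in range(n):
        # All-pole synthesis: y[t] = e[t] - sum_j a_j * y[t - j]
        acc = e[t]
        for j in range(1, len(a_true)):
            if t - j >= 0:
                acc -= a_true[j] * y[t - j]
        y[t] = acc
    return formants_from_lpc(lpc(y, order=4), fs)
```

Root solving avoids spectral peak picking, but each pole's radius r still sets its bandwidth and therefore how sharply it appears in the LPC spectrum; r is the quantity the improved algorithm described in the abstract adjusts.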

  • 【Online publication submitted by】 Soochow University
  • 【Online publication issue】 2008, No. 03
  • 【Classification No.】 TN912.3
  • 【Times cited】 7
  • 【Downloads】 225