Research on Whispered Speaker Identification Based on Instantaneous Frequency Estimation

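【Author】 Wang Min (王敏)

【Advisor】 Zhao Heming (赵鹤鸣)

【Author information】 Soochow University (苏州大学), Signal and Information Processing, 2010, Master's thesis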

【Abstract】 Whispered speech is a special mode of phonation that differs from normal speech both phonetically and physiologically. As social and economic life develops, whispered speech is used in a growing number of settings and plays an increasingly important role in areas such as financial communications, public security and justice, and identity authentication. In practical speaker identification, whispered speech can supplement normal speech and improve the overall performance of the system. The characteristics of whispered speech make it harder to recognize than normal speech and more vulnerable to channel interference, and conventional speech parameters are not robust when applied to it. Finding an effective whispered-speech feature for speaker identification is therefore a pressing problem. In addition, when a speaker identification system trained on normal speech is tested with whispered speech, its performance drops sharply, so how to improve whispered speaker identification accuracy when sufficient whispered training data are unavailable also deserves study. To address these problems, this thesis makes the following contributions.

1. Starting from the nonlinear phenomena in speech production and the formant modulation theory of speech production, the thesis introduces the amplitude modulation-frequency modulation (AM-FM) model of speech production. It discusses in detail how the Teager energy operator and the discrete energy separation algorithm (DESA), both based on this model, are applied to speech, and compares them with other algorithms that serve a similar purpose.

2. Using multiband demodulation analysis (MDA) for multi-component AM-FM signals together with the energy separation algorithm, the instantaneous amplitude and instantaneous frequency of the speech signal are obtained. A weighted estimate combining the two yields a speech feature, the instantaneous frequency estimate (IFE), which describes the fine frequency structure of speech. This feature is applied to whispered speaker identification and compared with conventional Mel-frequency cepstral coefficients (MFCC). Experimental results show that as the number of test speakers increases, IFE performs as well as MFCC or slightly better, and that when the test channel changes, IFE is more robust than MFCC.

3. A speaker identification system trained mainly on normal speech degrades sharply when tested with whispered speech. To mitigate this, whispered speech and normal speech are treated as two different channels. On top of a universal background model (UBM), feature mapping is applied to the speech parameters before training and identification in order to reduce the channel effect. Experimental results show that feature mapping improves identification accuracy, and that IFE gives better accuracy and robustness than MFCC.
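
The energy separation step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard discrete Teager energy operator and the DESA-2 variant of the energy separation algorithm; the abstract does not say which DESA variant the thesis uses, so that choice is an assumption. The squared-amplitude weighting in `weighted_frequency` is likewise an assumed form of the "weighted estimate", and all function names are hypothetical. In a multiband setup, these functions would be applied to each band-pass filtered signal separately; the filterbank is omitted here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy Psi[x](n) = x(n)^2 - x(n-1) * x(n+1), for n = 1..N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous frequency (rad/sample) and amplitude.

    Both returned arrays have length N-4 and align with samples n = 2..N-3 of x.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                  # symmetric difference x(n+1) - x(n-1), n = 1..N-2
    psi_x = teager_energy(x)[1:-1]      # Psi[x](n) trimmed to n = 2..N-3
    psi_y = teager_energy(y)            # Psi[y](n) for n = 2..N-3
    # Assumption: clamp non-positive energies to eps. This is a crude guard;
    # real speech frames may need smoothing or masking instead.
    psi_x = np.maximum(psi_x, eps)
    psi_y = np.maximum(psi_y, eps)
    cos_2w = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_2w)     # instantaneous frequency in rad/sample
    amp = 2.0 * psi_x / np.sqrt(psi_y)  # instantaneous amplitude envelope
    return omega, amp


def weighted_frequency(omega, amp):
    """Amplitude-weighted mean frequency over a frame (assumed a^2 weighting)."""
    w = np.asarray(amp, dtype=float) ** 2
    return float(np.sum(np.asarray(omega) * w) / np.sum(w))
```

For a frame `x` sampled at `fs` Hz, `omega, amp = desa2(x)` followed by `weighted_frequency(omega, amp) * fs / (2 * np.pi)` gives one frequency value in Hz per frame. DESA-2 resolves frequencies only up to a quarter of the sampling rate, since the arccosine recovers twice the angular frequency.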

  • 【Online publication contributor】 Soochow University (苏州大学)
  • 【Online publication issue】 2011, No. 01