Research on Whispered Speaker Identification in Channel Mismatch Conditions (不匹配信道下耳语音说话人识别研究)
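【Author】 Gu Xiaojiang (顾晓江)
【Supervisor】 Zhao Heming (赵鹤鸣)
【Author information】 Soochow University, Signal and Information Processing, 2011, Master's thesis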
【Abstract】 Whispered speech is an auxiliary mode of human speech production that plays a fairly broad role in daily life, notably in identity verification in finance, public security, and the judiciary, where speakers often whisper to keep information private. For this reason, whispered speaker identification has emerged as a new research topic. Whispered speech is used mainly in mobile-phone calls, so it is inevitably affected by channel distortion, and the identification rate of conventional models drops sharply when the channel conditions at training and test time differ substantially. A robust channel compensation algorithm is therefore needed to strengthen the speaker identification system. To address this problem, this thesis makes the following contributions:
1. Whispered speech from all channels is pooled to train a universal background model (UBM), from which speaker models are obtained by maximum a posteriori (MAP) adaptation. In experiments, the UBM-MAP models achieve higher identification rates than conventional GMMs.
2. Joint factor analysis (JFA) is applied to whispered speaker identification. To suit the characteristics of the whispered speech database, the speaker and channel subspaces are estimated separately (decoupled estimation) and the residual subspace is omitted. During identification, the speaker factors estimated from the training speech are combined with the channel factors estimated from the test utterance, so that each speaker model adapts to the test channel. Experiments show that this modified JFA substantially improves identification. Because JFA performs poorly on short test utterances, a hybrid compensation method is also proposed: the speaker factors are kept in the model domain, while the channel factors are applied in the feature domain to compensate every feature frame, giving finer-grained compensation than JFA. With training on the HH channel, the average identification rates for 1 s and 2 s test utterances improve by 4.36% and 3.89%, respectively; with training on the EP channel, they improve by 4.14% and 2.64%.
3. To exploit the discriminative power of the support vector machine (SVM), speaker supervectors are first used as SVM inputs, but this system performs worse than the UBM-MAP system. Speaker factor vectors are then used as SVM inputs instead; because speaker factors are low-dimensional in the identification system and readily linearly separable, this gives good identification results. Three channel compensation methods are then applied to remove further redundancy, yielding identification results comparable to those of JFA.
【Keywords】 whispered speech; speaker identification; joint factor analysis; hybrid compensation; support vector machine
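The abstract names the techniques but does not give their equations. The NumPy sketch below illustrates, under stated assumptions, the UBM-MAP step and the two ways of applying a test-channel estimate described in item 2. It assumes diagonal-covariance GMMs, relevance-MAP adaptation of the UBM means only (the relevance factor `r=16` is an assumed value), and the usual JFA supervector form in which each Gaussian's channel offset is a slice of U x, where U is the channel subspace and x the channel factor of the test utterance. The hybrid step uses the common feature-domain form o_t' = o_t - sum_c gamma_c(t) * (U x)_c, which is one plausible reading of per-frame compensation, not necessarily the thesis's exact formula. `U`, `x`, the toy data, and all function names are hypothetical, and estimating U, V, y and x is not shown.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(o_t; mu_c, diag(var_c)) for every frame and component: (T, D) -> (T, C)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, C, D)
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def posteriors(X, weights, means, variances):
    """Component occupation probabilities gamma_c(t) under a diagonal GMM."""
    logp = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    logp -= logp.max(axis=1, keepdims=True)                       # numerical stability
    gamma = np.exp(logp)
    return gamma / gamma.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, r=16.0):
    """Relevance-MAP adaptation of the UBM means only (r is an assumed value)."""
    gamma = posteriors(X, weights, means, variances)              # (T, C)
    n = gamma.sum(axis=0)                                         # zeroth-order stats (C,)
    f = gamma.T @ X                                               # first-order stats (C, D)
    alpha = (n / (n + r))[:, None]
    ex = f / np.maximum(n[:, None], 1e-10)
    return alpha * ex + (1 - alpha) * means

def avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood (log-sum-exp over components)."""
    logp = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    m = logp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))))

def model_domain_score(X, weights, spk_means, variances, chan_offset):
    """JFA-style: shift each speaker mean by the test-channel offset (C, D)."""
    return avg_loglik(X, weights, spk_means + chan_offset, variances)

def feature_domain_compensate(X, weights, ubm_means, variances, chan_offset):
    """Hybrid-style: subtract the UBM-posterior-weighted channel offset from every frame."""
    gamma = posteriors(X, weights, ubm_means, variances)          # (T, C)
    return X - gamma @ chan_offset                                # (T, D)

# Toy demo with synthetic data (sizes and values are illustrative only).
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2                              # components, feature dim, channel rank
w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
var = np.ones((C, D))
train = rng.normal(size=(200, D))              # stand-in for one speaker's training frames
spk_mu = map_adapt_means(train, w, ubm_mu, var)
U = rng.normal(scale=0.1, size=(C * D, R))     # hypothetical channel subspace
x = rng.normal(size=R)                         # channel factor, assumed already estimated
offset = (U @ x).reshape(C, D)                 # component-major supervector slices
test = rng.normal(size=(50, D))
s_model = model_domain_score(test, w, spk_mu, var, offset)
s_hybrid = avg_loglik(feature_domain_compensate(test, w, ubm_mu, var, offset), w, spk_mu, var)
```

For a single-component model the two modes give identical scores, because shifting a Gaussian's mean by an offset is equivalent to shifting the data by the opposite offset. With several components, the feature-domain version scales each frame's correction by that frame's own posteriors, which is one way to read the abstract's claim that the hybrid method compensates more finely than JFA.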