JFA-Based Speaker Recognition for Chinese Whispered Speech

Speaker Identification in Chinese Whispered Speech Based on Simplified Joint Factor Analysis

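【Author】 Wang Yanlei (王琰蕾)

【Advisor】 Zhao Heming (赵鹤鸣)

【Author Information】 Soochow University (苏州大学), Communication and Information Systems, 2010, Master's thesis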

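【Summary】 Whispered-speech speaker recognition has practical value in areas such as communication in public places, identity verification at secure sites, criminal identification, telephone network inquiries, and telephone banking. It is a relatively new research topic with many open problems. Because whispering is an unusual mode of phonation, and because whispered calls are often made over mobile phones, whispered-speech speaker recognition is affected more strongly by the speaker's phonation state, health, and psychological factors, as well as by the channel environment. Speaker recognition systems built on normal speech are therefore largely unsuitable for whispered speech, and their recognition performance drops sharply. Existing adaptive compensation methods lump speaker variation and channel-environment variation together without distinguishing them, which inevitably degrades the recognition of whispered-speech speakers. It is therefore necessary to build a recognition model suited to the characteristics of whispered speech in order to achieve text-independent whispered-speech speaker recognition. This thesis proposes using joint factor analysis (JFA) to address the large variability in speaker features caused by the many factors that affect whispered phonation. The method introduces two kinds of variability factors tailored to whispered speech: factors for the speaker's own variability and factors for the variability of the call channel environment. Given the difficulties of joint factor analysis, the thesis proposes a simplified JFA method suited to whispered-speech speaker recognition. Its main feature is that the speaker space and the channel space are estimated separately, which substantially lowers both the algorithmic complexity and the amount of speech data required, and hence greatly reduces the computational load and run time. The thesis builds a recognition model based on the simplified JFA method, gives the corresponding algorithm, and on this basis implements text-independent speaker identification for whispered speech. The proposed model was tested under 8 different channel conditions. The experiments show that the model identifies whispered-speech speakers effectively even under channel mismatch, and that its recognition accuracy is markedly higher than that of existing GMM models using MAP, feature mapping, and speaker model synthesis (SMS). The thesis also studies how the number of speaker factors and the number of channel factors affect the model's performance. The experiments show that moderately increasing either number helps improve recognition accuracy, but both saturate: increasing them further yields almost no improvement in performance.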
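
For reference, joint factor analysis is usually written as a decomposition of the speaker- and session-dependent GMM mean supervector into a speaker part and a channel part. The form below is the standard one from the JFA literature. The summary does not state the exact formulation used in the thesis, so all symbols here are assumptions, and a simplified variant may omit the residual term.

```latex
% Standard JFA supervector decomposition (symbols are assumptions, not taken from the thesis)
\[
  M = m + V y + U x + D z
\]
% M : speaker- and session-dependent GMM mean supervector
% m : speaker- and channel-independent supervector (e.g. from a universal background model)
% V : low-rank matrix spanning the speaker space;  y : speaker factors
% U : low-rank matrix spanning the channel space;  x : channel factors
% D : diagonal matrix;                             z : speaker-specific residual
```

In this notation, the "number of speaker factors" and "number of channel factors" studied in the thesis would correspond to the dimensions of $y$ and $x$, that is, the number of columns of $V$ and $U$.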

【Abstract】 Whispered speech is a mode of speech in which the speaker talks softly, without vibration of the vocal cords, to avoid being overheard. Whispered-speech speaker recognition can be applied in several fields, such as private communication in public places and forensic work. Because speaker recognition for whispered speech is still at an early stage of research, many models designed for normal speech are still in use. However, most of them are unsuitable for whispered speech because of its characteristics. The adaptive compensation methods currently available make no distinction between the speaker's health and psychological factors on the one hand and channel environment factors on the other, which inevitably affects recognition results for whispered speech. Whispered speech, produced without vocal cord vibration, always has a low SNR. The locations and energy of the formants, and the auditory model, differ from those of normal speech. A speaker's mental state when whispering is variable and easily influenced. Speaker recognition for whispered speech is therefore more difficult than for normal speech. The main concerns are how to reduce the influence of the speaking environment, especially variation in the speech channel, and how to remove the effects of the speaker's mental or emotional state. Based on the characteristics of whispered speech, this thesis presents a new approach to speaker identification in Chinese whispered speech, called simplified joint factor analysis. The main idea of the proposed technique is to estimate the speaker space and the channel space separately, which removes the need to label databases by channel, simplifies the training procedure, and reduces both the computation and the amount of data required. Experiments are carried out on our own database. The corpus consists of 100 target speakers, 80 male and 20 female, each recorded over 8 typical channels.
Compared with other recognition methods, such as MAP, Feature Mapping + MAP, and SMS, the proposed JFA technique provides superior performance and a significant speedup in speaker identification for Chinese whispered speech. In particular, it greatly improves recognition accuracy when the enrollment and test conditions are mismatched. A study of the number of speaker factors and channel factors shows that increasing the number of factors moderately improves recognition accuracy, but that the gain saturates: continuing to increase the number of factors does not further improve the performance of the whispered-speech speaker recognition system.
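
The idea of estimating the speaker space and the channel space separately can be illustrated with a minimal sketch. The code below is not the algorithm from the thesis, which the abstract does not spell out. It uses plain PCA as a stand-in: the speaker subspace is taken from the per-speaker mean supervectors, and the channel subspace from the within-speaker residuals. The function names, the array layout (one supervector per recording session), and the projection-based compensation are all assumptions made for illustration.

```python
import numpy as np


def estimate_subspaces_decoupled(supervectors, speaker_ids,
                                 n_speaker_factors, n_channel_factors):
    """Estimate speaker and channel subspaces separately (PCA stand-in).

    supervectors: array of shape (n_sessions, dim), one row per recording.
    speaker_ids:  array of shape (n_sessions,), speaker label of each row.
    Returns (m, V, U) with orthonormal columns in V and U.
    """
    X = np.asarray(supervectors, dtype=float)
    speakers, inverse = np.unique(np.asarray(speaker_ids), return_inverse=True)

    # Per-speaker mean supervectors and their overall mean.
    spk_means = np.stack([X[inverse == k].mean(axis=0)
                          for k in range(len(speakers))])
    m = spk_means.mean(axis=0)

    # Speaker space: leading principal directions of the speaker means.
    _, _, vt = np.linalg.svd(spk_means - m, full_matrices=False)
    V = vt[:n_speaker_factors].T

    # Channel space: leading principal directions of within-speaker residuals.
    # Only speaker labels are used here; no channel labels are needed.
    residuals = X - spk_means[inverse]
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    U = vt[:n_channel_factors].T
    return m, V, U


def compensate_channel(supervectors, m, U):
    """Remove the component of each supervector that lies in the channel space."""
    X = np.asarray(supervectors, dtype=float)
    return X - (X - m) @ U @ U.T


# Synthetic demo: 10 speakers, 8 sessions each, 20-dimensional supervectors.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(10), 8)
speaker_part = rng.normal(size=(10, 20))[ids]
channel_part = rng.normal(size=(80, 3)) @ rng.normal(size=(3, 20))
data = speaker_part + channel_part

m, V, U = estimate_subspaces_decoupled(data, ids, 2, 3)
cleaned = compensate_channel(data, m, U)
print(V.shape, U.shape, cleaned.shape)
```

Because the two subspaces are fitted in separate steps, each step is a single SVD on a modest matrix, which is one plausible reading of why a decoupled scheme needs less computation and less data than fitting both spaces jointly.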

  • 【Online Publication Contributor】 Soochow University
  • 【Online Publication Year/Issue】 2011, Issue 01