耳语音说话人识别的研究

Research on Whispering Speaker Recognition

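【Author】 Ding Guoliang (丁国梁)

【Advisor】 Zhao Heming (赵鹤鸣)

【Author Information】 Soochow University, Signal and Information Processing, 2009, Master's thesis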

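【Summary】 Whispered-speech speaker recognition is the automatic recognition of a speaker from the speaker-related information carried in whispered speech. It can be applied to telephone banking, identity confirmation in special settings, communication in public places, and certain special national-security needs. It is a relatively new topic, and many problems remain to be solved. Because whispered speech is produced differently from normal speech, the two differ considerably in speaker recognition. This thesis builds a speaker recognition system based on the GMM model and, by studying text-independent speaker identification, compares whispered speech with normal speech and optimizes the whispered-speech speaker recognition system by modifying the features. The main work is as follows. Whispered and normal speech corpora of 22 speakers were built, and Mel cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), delta MFCC (ΔMFCC), delta LPCC (ΔLPCC), and the combined feature MFCC+LPCC were used as feature parameters to compare speaker recognition on normal and whispered speech. Using both corpora, the effect of the MFCC dimension on speaker recognition for normal and whispered speech was compared; in the experiments, the recognition rate for normal speech was highest at 16 dimensions, while the recognition rate for whispered speech was highest at 50 dimensions. An improved MFCC is proposed in which the filter bank is designed band by band: the filter bank design is assigned to each frequency band and completed independently, so that the improved MFCC better represents the local frequency characteristics of the signal. Experiments show that the improved MFCC effectively raises the performance of the whispered-speech speaker recognition system.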

【Abstract】 Whispered-speech speaker recognition identifies a speaker from the speaker-related information in whispered speech. It can be applied in fields such as telephone banking, identity verification in special circumstances, private conversation in public places, and certain national-security needs. The subject is new, and many problems remain open. Because whispered speech is produced differently from normal speech, its speaker recognition performance also differs considerably from that of normal speech. This thesis describes an automatic speaker recognition system based on the Gaussian mixture model (GMM). The system is used to study text-independent speaker identification, the performance difference between whispered and normal speech is analyzed, and the whispered-speech speaker recognition system is improved by modifying the features. The main work of this thesis is as follows:
  • A whispered speech corpus and a normal speech corpus were recorded from 22 speakers. The following features were used for speaker recognition on both normal and whispered speech: MFCC, LPCC, ΔMFCC, ΔLPCC, and MFCC+LPCC.
  • Speaker recognition performance for normal and whispered speech was tested under different MFCC dimensions. Normal speech performed best at dimension 16, while whispered speech needed dimension 50 to reach its best performance.
  • An improvement to MFCC is proposed: the frequency range is divided into several bands, and the MFCC filters are designed separately within each band. Because the filter design for each band is an independent process, the improved MFCC better represents the local frequency characteristics of the signal. Experiments confirm that the improved MFCC effectively improves the performance of whispered-speech speaker recognition.
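
The band-by-band filter design described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual design: the abstract does not give the band edges, the number of filters per band, or the spacing used inside each band, so the `BANDS` layout, the mel spacing within each band, the 16 kHz sample rate, and the function names below are all hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (O'Shaughnessy form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def band_filters(f_lo, f_hi, n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale inside [f_lo, f_hi].

    Each filter is a list of weights, one per FFT bin (n_fft // 2 + 1 bins).
    """
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_filters + 1)
    # n_filters triangles need n_filters + 2 edge frequencies.
    edges = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    filters = []
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Weight is the triangle height at f, clipped to zero outside it.
            row.append(max(0.0, min(rising, falling)))
        filters.append(row)
    return filters


def subband_filterbank(bands, n_fft, sample_rate):
    """Design each band's filters independently and stack the results."""
    bank = []
    for f_lo, f_hi, n_filters in bands:
        bank.extend(band_filters(f_lo, f_hi, n_filters, n_fft, sample_rate))
    return bank


# Hypothetical layout: (low edge in Hz, high edge in Hz, filters in the band).
# These values are assumptions for illustration, not taken from the thesis.
BANDS = [(0.0, 1000.0, 8), (1000.0, 4000.0, 12), (4000.0, 8000.0, 10)]
bank = subband_filterbank(BANDS, n_fft=512, sample_rate=16000)
print(len(bank), "filters over", len(bank[0]), "FFT bins")
```

In a conventional MFCC pipeline, each filter would then be applied to a frame's power spectrum, and the log filter energies passed through a DCT to obtain cepstral coefficients. Designing the bands separately makes it possible to place more filters in whichever band is expected to carry more speaker information, which is one plausible reading of the "local frequency characteristics" the abstract mentions.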

  • 【Online Publication Contributor】 Soochow University
  • 【Online Publication Issue】 2009, Issue 09