Study of Speaker Recognition System

【Author】 刘永红

【Advisors】 肖建; 贾俊波

【Author Information】 Southwest Jiaotong University (西南交通大学), Power Systems and Automation, 2003, Master's thesis

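【Summary】 Speaker recognition means automatically determining a speaker's identity from his or her speech, and it has good application prospects in many fields. By analyzing the properties of speech feature parameters and the basic methods of speaker recognition, this thesis proposes a text-dependent speaker identification system that uses Mel cepstrum differences and linear prediction differences as features and performs recognition with the dynamic time warping algorithm.
The analysis starts with preprocessing of the speech signal. Endpoint detection is applied to remove the silent segments, so that only useful speech segments are passed on to feature extraction. The thesis also compares the double-threshold endpoint detection method with the energy-frequency-value endpoint detection algorithm; experiments confirm that the energy-frequency-value algorithm separates the endpoints of noisy speech well.
Using the all-pole model, the thesis extracts the linear prediction coefficients of the speech signal, derives the corresponding cepstral coefficients, and obtains the linear prediction cepstrum differences, which describe the dynamic changes of the speaker's vocal tract. The Mel cepstrum, which reflects the nonlinear frequency characteristics of hearing, serves as the speech recognition feature for identifying the password spoken by the speaker.
The feature parameters of the input speech are extracted with a MATLAB speech processing toolbox, and dynamic time warping is used to match the reference template against the test template, giving a very high recognition rate. With system security in mind, the thesis adopts a dual-decision method: Mel cepstral coefficients recognize the password, and linear prediction cepstrum differences capture the dynamic changes of the speaker's vocal tract. This makes the system a candidate for highly confidential settings, and it offers fast operation, easy template updating, a small computational load, and a low error rate.
To compare recognition algorithms, the thesis also develops a text-independent speaker recognition system that uses the Mel cepstrum and its differences as features and builds a Gaussian mixture model for each speaker. It achieves a fairly high recognition rate and is suitable where the required recognition rate is not too high.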
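The step from linear prediction coefficients to cepstral coefficients and their differences can be illustrated with a minimal Python sketch. It is not the thesis's MATLAB code. It assumes the all-pole model H(z) = G / (1 - Σ a_k z^-k) and the standard LPC-to-cepstrum recursion, and it assumes a simple central difference between neighbouring frames, since the record does not state which difference formula was used. The function names `lpc_to_cepstrum` and `delta` are hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    Assumed convention: a[k-1] holds a_k of the predictor
    x[n] ~ sum_k a_k * x[n-k], i.e. H(z) = G / (1 - sum_k a_k z^-k).
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (the gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Recursion: c_n = a_n + sum_k (k/n) * c_k * a_{n-k}
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def delta(frames):
    """Central difference along the frame axis (assumed delta formula).

    frames has shape (n_frames, n_coeffs); the first and last frames are
    repeated at the edges so the output keeps the same shape.
    """
    frames = np.asarray(frames, dtype=float)
    padded = np.pad(frames, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0
```

For a first-order predictor with a_1 = a, the recursion reproduces the closed form c_n = a^n / n, which is a convenient sanity check.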

【Abstract】 Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information contained in speech signals. It has promising applications in many fields. By analyzing speech feature parameters and the basic methods of speaker recognition, we choose the differences of the MFCC and of the LPCC as the speech features. Using DTW for text-dependent recognition, we develop a speaker identification system in this thesis.
Before the feature parameters are extracted, the speech signal is preprocessed. In this phase we locate the endpoints of the signal and remove the silent segments so that only useful speech segments remain. We compare two endpoint detection methods, the double-threshold method and the energy-frequency-value method. Experiments show that the latter locates the endpoints of noisy speech better.
We use the all-pole model to obtain the LPC of the speech signal, derive the LPCC from it, and use the LPCC difference to describe the dynamic movement of the speaker's vocal tract. Since the MFCC reflects the nonlinear frequency characteristics of hearing, we also use it as a recognition feature to identify the input password.
We use the MATLAB Voice Box to extract the feature parameters of the speech and DTW to match the reference template against the test template, and obtain a very high recognition rate. For system security, we adopt the MFCC to recognize the password and the LPCC difference to represent the dynamic movement of the speaker's vocal tract. This dual decision makes the system applicable in highly confidential settings. The system has several merits, including fast operation, easy template updating, a small computational load, and a low error rate.
To compare recognition algorithms, we also develop a text-independent speaker recognition system. It uses the MFCC and its difference as features and builds a Gaussian mixture model, and it achieves a fairly high recognition rate. It can be applied where the required recognition rate is not very high.
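The DTW template matching mentioned in the abstract can be sketched as follows. This is a minimal Python illustration, not the thesis's MATLAB implementation. It assumes Euclidean distance between frames and the basic three-way step pattern without slope constraints; the names `dtw_distance` and `identify` and the template dictionary are hypothetical.

```python
import numpy as np

def dtw_distance(ref, test):
    """Accumulated DTW cost between two feature sequences.

    ref and test are arrays of shape (n_frames, n_coeffs); the number of
    frames may differ, the number of coefficients must match.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)
    # Local cost: Euclidean distance between every pair of frames.
    local = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor paths.
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]

def identify(templates, test):
    """Return the key of the reference template closest to the test sequence.

    templates maps a speaker label to that speaker's feature array.
    """
    return min(templates, key=lambda name: dtw_distance(templates[name], test))
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template can still match with zero cost, which is why DTW suits fixed-password, text-dependent matching.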

  • 【Classification No.】TN912.3
  • 【Citations】27
  • 【Downloads】833