Speaker Recognition on the Base of Time-Varying Characteristics of Speech Signal

【Author】 Xu Liangjun

【Supervisor】 Fei Wanchun

【Author Information】 Soochow University, Textile Engineering, 2010, Master's

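【Abstract (translated from the Chinese)】 Speaker recognition is a special kind of speech recognition. The technology has developed rapidly in recent years, and text-dependent speaker verification systems are already used in some places that require identity checks. Several problems remain, however. The key one is which features of the speech signal describe a speaker effectively and reliably. Speaker recognition comprises speaker verification and speaker identification, and this thesis mainly studies text-dependent speaker identification. The method builds on the time-varying characteristics of the speech signal. Time-varying characteristic frequencies, including the time-varying pitch frequency, are extracted from the average MEL cepstrum. This gives a time series made up of the sequences of cepstral values at each characteristic frequency of the speech signal. Time-series preprocessing and mathematical statistics are then used to separate the trend fluctuation of the time series from its random fluctuation. The random fluctuation is a zero-mean time series whose autocovariance is nonstationary. It is analyzed with a full-order time-varying parameter autoregressive (TVPAR) model to extract further feature parameters of the speaker's speech signal. Speaker recognition is studied separately on the random-fluctuation series and on the full-order TVPAR analysis. The order of the regression model is determined by the minimum BIC (Bayesian Information Criterion) rule, and speakers are finally discriminated by Mahalanobis distance. Experiments show that recognition with the full-order TVPAR model gives a considerably higher recognition rate than recognition on the random-fluctuation series. With the full-order TVPAR model, the recognition rate reaches 97.3% with one characteristic frequency and 98.6% with two.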
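The following is a minimal sketch of the fluctuation-separation and order-selection steps, written under stated assumptions. The abstract does not say how the trend is removed, how the TVPAR coefficients vary in time, or exactly what "full order" means. This sketch assumes three things:

- A least-squares straight line stands in for the trend-removal preprocessing.
- Each coefficient a_i(t) is a polynomial in normalized time u = t/n. The model x_t = Σ_i a_i(t) x_{t-i} + e_t then becomes an ordinary least-squares problem.
- The order p is chosen by minimizing BIC = N ln σ̂² + k ln N, where k = p(degree + 1) is the number of fitted coefficients.

The function names (`detrend_linear`, `fit_tvpar`, `select_order_bic`, `simulate`) and the simulated signal are hypothetical. They are not taken from the thesis.

```python
import math
import random


def detrend_linear(x):
    """Split x into a least-squares straight-line trend and the residual.

    Assumption: a linear trend stands in for the thesis's unspecified
    time-series preprocessing."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / sxx
    trend = [x_mean + slope * (t - t_mean) for t in range(n)]
    return trend, [v - tr for v, tr in zip(x, trend)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_tvpar(x, p, degree, start):
    """Least-squares fit of x[t] = sum_i a_i(t) x[t-i] + e[t], i = 1..p,
    with the assumed basis a_i(t) = sum_k c[i][k] * (t/n)**k.
    Uses rows t = start..n-1; returns (c, residual variance, row count)."""
    n = len(x)
    rows, ys = [], []
    for t in range(start, n):
        u = t / n
        rows.append([x[t - i] * u ** k
                     for i in range(1, p + 1) for k in range(degree + 1)])
        ys.append(x[t])
    q = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(q)] for i in range(q)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(q)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, ys)]
    sigma2 = sum(e * e for e in resid) / len(ys)
    c = [beta[i * (degree + 1):(i + 1) * (degree + 1)] for i in range(p)]
    return c, sigma2, len(ys)


def select_order_bic(x, p_max, degree=1):
    """Return (p, c) minimizing BIC = N ln(sigma^2) + p*(degree+1)*ln(N)."""
    best = None
    for p in range(1, p_max + 1):
        # Every order is fitted on the same rows so the BIC values are comparable.
        c, sigma2, n_rows = fit_tvpar(x, p, degree, start=p_max)
        bic = n_rows * math.log(sigma2) + p * (degree + 1) * math.log(n_rows)
        if best is None or bic < best[0]:
            best = (bic, p, c)
    return best[1], best[2]


def simulate(n, seed):
    """Hypothetical test signal: a TVAR(2) fluctuation plus a linear trend."""
    rng = random.Random(seed)
    fluct = [0.0, 0.0]
    for t in range(2, n):
        a1 = 0.2 + 0.5 * t / n  # slowly drifting first coefficient
        fluct.append(a1 * fluct[t - 1] - 0.3 * fluct[t - 2] + rng.gauss(0, 1))
    return [4.0 + 0.002 * t + v for t, v in enumerate(fluct)]


if __name__ == "__main__":
    _, resid = detrend_linear(simulate(3000, seed=1))
    order, coeffs = select_order_bic(resid, p_max=5)
    print("BIC-selected order:", order)
    print("a_1(t) ~ %.2f + %.2f u" % tuple(coeffs[0]))
```

Fitting every candidate order on the same rows, starting at `p_max`, keeps the BIC values comparable. A polynomial in time is only one possible basis for a_i(t); the thesis may use a different parameterization.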

【Abstract】 Speaker recognition is a special kind of speech recognition. With the rapid development of the technology in recent years, text-dependent speaker verification systems have been deployed in settings that require identity authentication. Some problems remain, however. A key one is which speech features describe a speaker efficiently and reliably. Speaker recognition comprises speaker verification and speaker identification, and this thesis focuses on text-dependent speaker identification. Based on the time-varying characteristics of the speech signal, time-varying characteristic frequencies (including the pitch frequency) are extracted from the average MEL cepstrum. This yields time series of cepstral values at those characteristic frequencies. The deterministic and stochastic fluctuations of the time series are separated using time-series preprocessing and statistical methods. The stochastic fluctuations form a zero-mean time series with nonstationary autocovariance. They are analyzed with a full-order TVPAR (Time-Varying Parameter Autoregressive) model, from which the speaker's characteristic parameters are extracted. Recognition is performed both on the stochastic fluctuations of the time series and on the full-order TVPAR analysis. The order of the regression model is selected by the minimum BIC (Bayesian Information Criterion) rule, and speakers are discriminated by Mahalanobis distance. The experimental results show that the recognition rate obtained with the full-order TVPAR model is higher than that obtained on the stochastic fluctuations alone. With one and two characteristic frequencies, the average recognition rate reaches 98.6% and 100%, respectively.
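The abstract states only that speakers are discriminated by Mahalanobis distance. It does not say how the covariance matrix is estimated or which decision rule is applied. The sketch below makes two assumptions:

- Identification is closed-set: a test vector is assigned to the enrolled speaker whose mean is nearest in Mahalanobis distance.
- One within-speaker covariance is pooled over all enrolled speakers.

The feature vectors, speaker labels and function names (`pooled_covariance`, `identify`, etc.) are hypothetical placeholders. They are not data or code from the thesis.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(training):
    """Within-speaker covariance pooled over all speakers (unbiased).

    Assumption: one shared covariance matrix for every speaker."""
    d = len(next(iter(training.values()))[0])
    s = [[0.0] * d for _ in range(d)]
    dof = 0
    for rows in training.values():
        mu = mean_vector(rows)
        for r in rows:
            diff = [a - b for a, b in zip(r, mu)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) via a linear solve."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def identify(x, means, cov):
    """Closed-set identification: the speaker whose mean is nearest in Mahalanobis distance."""
    return min(means, key=lambda spk: mahalanobis_sq(x, means[spk], cov))


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, e.g. two model-derived parameters per utterance.
    training = {
        "speaker_A": [[1.0, 0.1], [1.2, -0.1], [0.8, 0.0], [1.0, 0.2]],
        "speaker_B": [[3.0, 1.0], [3.2, 1.1], [2.8, 0.9], [3.0, 1.2]],
    }
    means = {spk: mean_vector(rows) for spk, rows in training.items()}
    cov = pooled_covariance(training)
    print(identify([1.1, 0.0], means, cov))  # prints "speaker_A"
```

Solving cov · y = (x − μ) avoids forming an explicit inverse. Pooling the covariance keeps the estimate usable when each speaker has only a few enrollment utterances. Per-speaker covariances are an equally plausible reading of the abstract.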

  • 【Online Publication Contributor】 Soochow University
  • 【Online Publication Issue】 2011, Issue 01