
Research on Speaker Recognition Based on EMD (基于EMD的说话人识别研究)

Speaker Recognition Based on EMD

【Author】 Liu Yali (刘亚丽)

【Advisor】 Yang Hongwu (杨鸿武)

【Author Information】 Northwest Normal University, Circuits and Systems, 2010, Master's degree

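【Abstract (translated from the Chinese original)】 In biometric authentication, speaker recognition has become an increasingly important means of identity verification in daily life and work, thanks to its distinctive advantages of convenience, economy, and accuracy. It is already widely used in security-sensitive fields such as e-commerce and forensics, and it is an active research topic. Speaker feature parameters are the foundation of a speaker recognition system. At present, most studies extract speaker features with short-time analysis (the Fourier transform method), but speech is a typical non-linear signal, so linear analysis methods inevitably lose some important information. To address this, the thesis carries out a series of studies. Its main work and contributions are as follows.

First, the thesis improves an existing feature parameter. Using perceptual weighting, it takes the interpolated signal-to-mask ratio computed from a psychoacoustic model as the weighting function and applies it in mel-cepstral analysis to obtain weighted mel-cepstral coefficients (WMCEP). In the experiments, WMCEP is combined with a GMM recognition model for speaker recognition.

Second, the thesis introduces a non-linear signal analysis method, the Hilbert-Huang Transform (HHT), which consists of Empirical Mode Decomposition (EMD) and Hilbert Spectral Analysis (HSA). EMD is combined with short-time analysis to process speech signals, and three feature extraction algorithms are proposed. The experiments pair the proposed features with an SVM recognition model, which is suited to classification problems. A GMM recognition model serves as the baseline against which the SVM's recognition performance is compared.

Third, the thesis examines, from a theoretical standpoint, the feasibility and effectiveness of EMD-based feature extraction. The analysis covers EMD feature extraction based on the HSA spectrum and the marginal spectrum, and EMD feature extraction based on the residual phase.

Introducing EMD is a new attempt. The feature extraction methods proposed on this basis have a theoretical grounding and good practical performance, and they provide a foundation for future research on speech recognition and speaker recognition.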

【Abstract】 Among biometric technologies, speaker recognition has become a prominent means of identity authentication because of its convenience, economy, and accuracy. It is now widely applied in electronic business, helpdesks, forensics, telephone banking, and similar areas. Speaker features are the foundation of a speaker recognition system. Most current studies extract speaker features by short-time analysis (the Fourier transform). However, speech is a typical non-linear signal, so extracting speaker features with linear signal analysis inevitably discards some important information. To address this, the thesis carries out a series of studies. Its main work is as follows.

First, the thesis improves an existing feature. It applies perceptual weighting in mel-cepstrum analysis, adopting the Signal-to-Mask Ratios (SMRs) obtained from a psychoacoustic model as the weighting function, to obtain weighted mel-cepstrum coefficients (WMCEP).

Second, the thesis introduces a non-linear signal analysis method, the Hilbert-Huang Transform (HHT), which is composed of Empirical Mode Decomposition (EMD) and Hilbert Spectral Analysis (HSA). EMD, together with short-time analysis, is used to analyze speech signals and extract three kinds of speaker features. In the experiments, an SVM model is applied to speaker recognition: in the training stage it builds speaker models, and in the prediction stage it compares the features with the speaker models built during training. To evaluate the SVM's classification ability, the thesis also uses a GMM as a comparison model.

Third, from the perspective of theoretical analysis, the thesis examines the feasibility and effectiveness of extracting speaker features by combining EMD with short-time analysis. The analysis rests on two bases: the HSA spectrum together with the marginal spectrum, and the residual phase.

As a new attempt, the thesis applies EMD to propose new ways of extracting speaker features, which have a theoretical basis and practical effect. Most importantly, the work benefits future studies of automatic speech recognition and speaker recognition.
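To make the idea of combining EMD with short-time analysis more concrete, the following is a minimal Python sketch. It is not the thesis's algorithm, since the abstract does not specify the three proposed features. The sketch makes several simplifying assumptions: envelopes are built by linear interpolation with `np.interp` (cubic splines are the usual choice in EMD), each intrinsic mode function (IMF) is obtained with a fixed number of sifting passes instead of a convergence criterion, and the frame-level feature (`frame_log_energy`, the log energy of each IMF per frame) is a hypothetical stand-in chosen only for illustration. All function names and parameter values are assumptions.

```python
import numpy as np


def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(x, maxima, minima):
    """Mean of the upper and lower envelopes.

    Assumption: linear interpolation stands in for the cubic splines
    normally used in EMD.
    """
    t = np.arange(len(x))
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0


def sift_imf(x, n_sift=10):
    """Extract one IMF candidate with a fixed number of sifting passes."""
    h = x.copy()
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        h = h - envelope_mean(h, maxima, minima)
    return h


def emd(x, max_imfs=5, n_sift=10):
    """Decompose x into at most max_imfs IMFs plus a residue."""
    x = np.asarray(x, dtype=float)
    residue = x.copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to build envelopes
        imf = sift_imf(residue, n_sift)
        imfs.append(imf)
        residue = residue - imf
    return np.array(imfs).reshape(len(imfs), len(x)), residue


def frame_log_energy(imfs, frame_len=256, hop=128):
    """Hypothetical short-time feature: log energy of each IMF per frame."""
    n_frames = 1 + (imfs.shape[1] - frame_len) // hop
    feats = np.empty((n_frames, imfs.shape[0]))
    for i in range(n_frames):
        frame = imfs[:, i * hop:i * hop + frame_len]
        feats[i] = np.log(np.sum(frame ** 2, axis=1) + 1e-12)
    return feats


# Usage on a synthetic two-tone signal (a stand-in for a speech frame).
fs = 8000
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
imfs, residue = emd(x)
features = frame_log_energy(imfs)
print(imfs.shape, features.shape)
```

By construction, the IMFs and the residue sum back to the input signal, which is the completeness property of EMD. Each row of `features` could then be passed to a classifier such as an SVM or scored against a GMM, in the spirit of the comparison described in the abstract.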
