Research on Speech Emotion Recognition Based on HHT

Speech Emotion Recognition Research Based on Hilbert-Huang Transformation

【Author】 Xie Shan

【Supervisor】 Zeng Yicheng

【Author Information】 Xiangtan University, Physical Electronics, 2008, Master's thesis

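【Abstract (Chinese)】 Speech emotion recognition requires extracting emotional feature parameters from speech samples and applying a pattern recognition method to identify the type of emotion the speech contains. It is an emerging research direction in speech signal processing with broad application prospects. The most critical problem in speech emotion recognition is how to extract features that effectively reflect emotional information, because these features directly determine the recognition result. This thesis processes emotional speech with the Hilbert-Huang Transform (HHT), analyzes its characteristics as a whole, extracts feature parameters on that basis, and performs text-independent and speaker-independent speech emotion recognition with satisfactory results. The specific contents are as follows. The principle of HHT is discussed in detail, revealing its essential characteristics and its advantages for signal processing. On this basis, the concept of marginal energy is proposed and used together with the marginal spectrum to analyze emotional speech. Statistical analysis of speech in four emotions (happy, angry, bored and neutral) shows that the marginal energy and the marginal spectrum reflect the energy distribution of emotional speech in the time domain and the frequency domain, respectively, and capture the intrinsic regularities of the different emotions. They are therefore used as the basis for emotion recognition: a time-domain feature, the Hilbert energy statistics (EHHT), is extracted from the marginal energy, and frequency-domain features, namely sub-band energy (SE), its first-order difference (DSE), sub-band energy cepstral coefficients (SECC) and their first-order difference (DSECC), are extracted from the marginal spectrum. Finally, vector quantization (VQ) is used with each of these features to perform speaker-independent, text-independent speech emotion recognition. The results show that time-domain or frequency-domain features used alone cannot recognize speech emotion effectively, whereas combining the two raises the recognition rate to a maximum of 98.53%, with little fluctuation as the codebook size changes, so the performance is relatively stable. Applying HHT to emotional speech processing and combining time- and frequency-domain features for speech emotion recognition not only improves the recognition rate but also greatly reduces the codebook size, which is of some practical significance.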
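The abstract rests on two HHT marginals. As a rough illustration only, the Python sketch below assumes the intrinsic mode functions (IMFs) have already been produced by empirical mode decomposition (EMD itself is not implemented here). It computes each IMF's instantaneous amplitude and frequency from an FFT-based analytic signal and accumulates the amplitudes over time into a marginal spectrum h(f). The thesis's exact definition of marginal energy is not given in this abstract; here it is assumed to be the Hilbert energy summed over frequency at each instant, the sum over i of a_i(t)^2. The EHHT statistics chosen (mean, standard deviation, maximum, minimum) and all function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + j*H{x}, built in the frequency domain
    (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def hilbert_amp_freq(imfs, fs):
    """Instantaneous amplitude a_i(t) and frequency f_i(t) in Hz for each IMF row."""
    z = np.array([analytic_signal(imf) for imf in imfs])
    phase = np.unwrap(np.angle(z), axis=1)
    freq = np.diff(phase, axis=1) * fs / (2 * np.pi)
    amp = np.abs(z)[:, 1:]  # drop the first sample so amp aligns with freq
    return amp, freq


def marginal_spectrum(imfs, fs, n_bins=64):
    """h(f): Riemann sum approximating the integral of H(f, t) over time, per bin."""
    amp, freq = hilbert_amp_freq(imfs, fs)
    edges = np.linspace(0.0, fs / 2, n_bins + 1)
    valid = (freq >= 0) & (freq < fs / 2)  # discard negative or out-of-band estimates
    h, _ = np.histogram(freq[valid], bins=edges, weights=amp[valid] / fs)
    return edges, h


def marginal_energy(imfs, fs):
    """ASSUMED definition: Hilbert energy summed over all frequencies at each instant."""
    amp, _ = hilbert_amp_freq(imfs, fs)
    return np.sum(amp ** 2, axis=0)


def ehht_stats(energy):
    """HYPOTHETICAL EHHT: simple statistics of the marginal-energy curve."""
    return np.array([energy.mean(), energy.std(), energy.max(), energy.min()])
```

With two synthetic "IMFs" (a unit cosine at 55 Hz and a half-amplitude cosine at 205 Hz, fs = 1000 Hz, one second) and 50 bins of 10 Hz, the marginal spectrum peaks in the 50 to 60 Hz bin, and the assumed marginal energy is flat at 1.25.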

【Abstract】 Speech emotion recognition requires extracting emotional features from speech signals and applying a pattern recognition method to determine which emotion the speech contains. It is a new area of speech processing with wide applications. Feature extraction, which directly determines the results, is the most important factor. In this thesis, the Hilbert-Huang Transform (HHT) is applied to the processing and analysis of emotional speech; HHT features are extracted, and text-independent, speaker-independent emotion recognition is simulated. The details are as follows. First, the theory of HHT is discussed and its essence and merits in signal processing are shown. Based on this, marginal energy is proposed and used together with the marginal spectrum in emotional speech analysis. Statistical analysis of four emotions (happy, angry, bored and neutral) demonstrates that marginal energy and marginal spectrum reflect the energy distribution characteristics in the time and frequency domains, respectively, so they can serve as a basis for emotion recognition. Then the statistical Hilbert energy (EHHT) is extracted from the marginal energy, and sub-band energy (SE) and its first-order difference (DSE), as well as sub-band energy cepstral coefficients (SECC) and their first-order difference (DSECC), are extracted from the marginal spectrum. Finally, using vector quantization (VQ) for pattern recognition, speaker-independent and text-independent emotion recognition is simulated with each of the above features. The results demonstrate that time-domain or frequency-domain features alone cannot recognize speech emotion effectively, but combining the two achieves a recognition rate of up to 98.53%. In this thesis, HHT is applied to speech processing and emotion recognition; the use of HHT time-frequency features not only enhances the recognition rate but also reduces the codebook size, so the research is both meaningful and feasible.
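The frequency-domain features and the VQ classifier can likewise be sketched in Python, again under loudly labeled assumptions that do not come from the thesis: equal-width sub-bands over the marginal-spectrum bins, SE taken as the sum of squared marginal-spectrum values per band, SECC as a DCT-II of the log sub-band energies, DSE/DSECC as plain frame-to-frame differences, and a k-means codebook per emotion standing in for LBG training. All helper names are hypothetical.

```python
import numpy as np


def subband_energy(h, n_bands=8):
    """SE (ASSUMED form): energy of the marginal spectrum in equal-width bands."""
    bands = np.array_split(np.asarray(h, dtype=float), n_bands)
    return np.array([np.sum(b ** 2) for b in bands])


def secc(se, n_coef=6, eps=1e-12):
    """SECC (ASSUMED form): DCT-II of log sub-band energies, cepstrum-style."""
    m = len(se)
    k = np.arange(n_coef)[:, None]
    j = np.arange(m)[None, :]
    basis = np.cos(np.pi * k * (j + 0.5) / m)
    return basis @ np.log(se + eps)  # eps guards against log(0)


def first_difference(frames):
    """DSE / DSECC (ASSUMED form): difference between consecutive frame vectors."""
    return np.diff(frames, axis=0)


def train_codebook(x, size, iters=20, seed=0):
    """k-means codebook, a simple stand-in for LBG training."""
    rng = np.random.default_rng(seed)
    cb = x[rng.choice(len(x), size, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - cb[None, :, :], axis=2)
        nearest = np.argmin(d, axis=1)
        for c in range(size):
            members = x[nearest == c]
            if len(members):  # keep the old codeword if a cell is empty
                cb[c] = members.mean(axis=0)
    return cb


def avg_distortion(x, cb):
    """Mean Euclidean distance from each frame to its nearest codeword."""
    d = np.linalg.norm(x[:, None, :] - cb[None, :, :], axis=2)
    return d.min(axis=1).mean()


def classify(x, codebooks):
    """Pick the emotion whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda emo: avg_distortion(x, codebooks[emo]))
```

In use, each utterance would yield a sequence of frame feature vectors (for example, time-domain and frequency-domain features concatenated with `np.hstack`, loosely mirroring the time plus frequency combination the abstract reports); one codebook is trained per emotion, and a test utterance is assigned to the emotion with the lowest average quantization distortion.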

【Key words】 marginal spectrum; marginal energy; speech emotion recognition; HHT
  • 【Online Publication Submitter】 Xiangtan University
  • 【Online Publication Year/Issue】 2009, Issue 05