
Research on Speech Emotion Feature Extraction Methods and Emotion Recognition

Research on Emotion Recognition from Speech-Features and Models

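【Author】 Guo Pengjuan

【Advisor】 Jiang Dongmei

【Author information】 Northwestern Polytechnical University, Computer Science and Technology, 2007, Master's thesis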

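【Abstract, translated from the Chinese】 In current research on speech emotion recognition, feature extraction and recognition methods vary widely, and because different studies use different emotional speech databases, their results are not comparable. It is therefore hard to judge objectively the relative merits of features and modeling methods, in particular of static models built on global features versus dynamic models built on short-time features. This thesis analyzes and selects speech features that reflect emotional variation in speech signals covering four emotions (happy, angry, sad, and neutral), and runs emotion recognition experiments on an emotional speech database recorded by the project team. The main contents are as follows:

1. An emotional speech database was recorded. The scripts were taken from the standard TIMIT English speech database. Each speaker read 25 sentences in each of the four emotions (happy, angry, sad, and neutral), giving 4600 utterances from 46 speakers. A subjective emotion perception experiment selected the 800 utterances of the 8 speakers with the best emotional expression, and these were used for the emotion analysis and recognition experiments in this thesis.

2. Based on the emotional speech database, the thesis observes and analyzes how pitch (fundamental frequency), spectral information, speaking rate, and other features of the speech signal vary across the four emotional states. It selects and defines 23 emotion-discriminative global features, including pitch statistics, formants, speaking rate, and average energy. Besides the usual global pitch features, these include features related to the rising and falling slopes at the onset of the pitch contour.

3. The parameter training and recognition algorithms of the Gaussian mixture model (GMM) were studied, and GMM-based emotion recognition experiments were run on the global emotion features. The results show that with only the 12 pitch-related features, sad and neutral are recognized with relatively high accuracy, while happy and angry are easily confused with each other. Adding formants, speaking rate, and average energy raises the recognition rate for every emotion, because speaking rate and average energy discriminate among all four emotions and formants help separate happy from angry.

4. The parameter training and recognition algorithms of the hidden Markov model (HMM) were studied. HMM-based emotion recognition experiments were run on Mel filter-bank cepstral features (MFCC) and on a set of short-time features including short-time energy, formants, and sub-band energy. The results show that MFCC alone is not suited to speech emotion recognition, while adding sub-band energy, pitch, and other features raised the average recognition rate by 29.55%.

5. The GMM-based and HMM-based recognition results were compared. The analysis shows that, for speech emotion recognition, building a static model on global features and building a dynamic model of the emotional process on short-time features give roughly the same recognition rate. What matters is the physical meaning of the features used.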
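The global pitch features described in item 2 can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything here is an assumption made for illustration: the function names `onset_slope` and `global_pitch_features`, the 10 ms frame step, the 5-frame onset window, the use of a least-squares line for the onset slope, and the convention that an F0 value of 0 marks an unvoiced frame.

```python
import statistics


def onset_slope(f0, frame_s, n_frames):
    """Least-squares slope (Hz per second) over the first n_frames F0 values.

    Hypothetical stand-in for the thesis's onset rise/fall slope features.
    Needs at least two frames, otherwise the denominator is zero.
    """
    seg = f0[:n_frames]
    t = [i * frame_s for i in range(len(seg))]
    t_mean = statistics.fmean(t)
    f_mean = statistics.fmean(seg)
    num = sum((ti - t_mean) * (fi - f_mean) for ti, fi in zip(t, seg))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def global_pitch_features(f0, frame_s=0.01, n_onset=5):
    """Utterance-level pitch statistics from a frame-level F0 contour."""
    voiced = [v for v in f0 if v > 0]  # assumption: 0 marks unvoiced frames
    return {
        "mean": statistics.fmean(voiced),
        "std": statistics.pstdev(voiced),
        "max": max(voiced),
        "min": min(voiced),
        "range": max(voiced) - min(voiced),
        "onset_slope": onset_slope(voiced, frame_s, n_onset),
    }


# Toy contour (not thesis data): F0 rises by 10 Hz every 10 ms frame.
features = global_pitch_features([0, 100, 110, 120, 130, 140, 0])
```

A positive `onset_slope` indicates a rising contour at the start of the utterance and a negative one a falling contour. Each utterance yields one fixed-length vector of this kind, which is what makes a static model such as a GMM applicable.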

【Abstract】 Emotion recognition from speech is currently a hot topic. However, emotion features and recognition models differ between studies, and experiments are run on different emotional speech databases, so results are not comparable. It is therefore difficult to judge the merits of the features and models, especially of static models with global features versus dynamic models with short-time features. Here we first analyze and select emotional speech features that reflect the variation trends of four emotions (happy, angry, sad, neutral), and then compare results of the global model and the dynamic model on the same emotional speech database.

1. An emotional speech database has been recorded. Scripts from the standard TIMIT English speech database are read by 46 individuals, each of whom repeats 25 sentences with four emotions (happy, angry, sad, and neutral). Through a subjective perception and evaluation experiment, 800 sentences from 8 speakers are selected for our experiments.

2. By observing and analyzing how the pitch, spectral information, and speaking rate curves vary with each emotion, we select and define a 23-dimensional global emotion feature set (pitch, formants, speaking rate, average energy, etc.) that is discriminative for the four emotions.

3. The training and recognition algorithms of the GMM are studied, and GMMs with global emotion features are built for the four emotions. Emotion recognition experiments show that, if only the 12-dimensional pitch-related features are used, sad and neutral are recognized more accurately than the other two emotions. After formants, speaking rate, and average energy are added, the correct recognition rates improve for all four emotions. Results also show that speaking rate and average energy are discriminative for the four emotions, while formants are useful for distinguishing happy from angry.

4. The training and recognition algorithms of the HMM are studied. Emotion HMMs are built with MFCC features (feature 1) and with dynamic features including short-time energy, formants, and sub-band energy (feature 2). Emotion recognition experiments show that feature 2 improves the average recognition rate by 29.55%.
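The GMM recognition step in item 3 amounts to scoring a feature vector under one mixture model per emotion and choosing the emotion with the highest likelihood. The sketch below shows only that decision rule, with diagonal covariances. It does not reproduce the thesis's training procedure or parameters: the two-dimensional feature vector (mean pitch in Hz, speaking rate in syllables per second), the mixture weights, means, and variances are all made-up illustrative values, and the function names are hypothetical.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(x, models):
    """Return the emotion whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda label: gmm_loglik(x, models[label]))


# Illustrative parameters only, not values from the thesis.
models = {
    "happy": [(0.5, [260.0, 5.0], [400.0, 0.25]),
              (0.5, [240.0, 4.5], [400.0, 0.25])],
    "angry": [(1.0, [250.0, 5.5], [400.0, 0.25])],
    "sad": [(1.0, [180.0, 3.0], [400.0, 0.25])],
    "neutral": [(1.0, [200.0, 4.0], [400.0, 0.25])],
}

label = classify([182.0, 3.1], models)
```

In these toy models the happy and angry components sit close together in both dimensions, which mirrors the confusion the abstract reports when only pitch-related features are used. In practice the mixture parameters would be estimated from training utterances, typically with the EM algorithm.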

  • 【Classification code】TP391.4
  • 【Citation count】8
  • 【Download count】858