
Research on Speech Feature Parameter Extraction Methods in Speaker Recognition

Study on Speech Feature Extraction Algorithm in Speaker Recognition System

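【Author】 王玥 (Wang Yue)

【Supervisors】 钱志鸿 (Qian Zhihong); 王树勋 (Wang Shuxun)

【Author information】 吉林大学 (Jilin University), Communication and Information Systems, 2009, PhD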

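【Chinese Abstract】 This dissertation studies speech feature extraction techniques for speaker recognition systems. It investigates speech enhancement and endpoint detection in additive-noise environments, pitch feature extraction, and the extraction and dimensionality reduction of auditory feature parameters. The main contributions are as follows:
1. A speech enhancement algorithm based on expanded spectral subtraction is proposed, which estimates the background noise more accurately than traditional methods. Combining the speech absence probability with a dynamic threshold, a new endpoint detection algorithm is also proposed. Experiments show that it detects speech onsets accurately even at low signal-to-noise ratios (SNR).
2. A pitch period estimation method based on autocorrelation weighted by the reciprocal of the CAMDF is proposed: the RCAF (Reverse CAMDF Autocorrelation Function) algorithm. Simulation results show that RCAF reduces the influence of spurious points caused by formants and noise on the peak search, so the pitch period is extracted accurately and with stronger noise robustness than traditional algorithms.
3. Models of human hearing are studied in depth: Gammatone and Gammachirp filters are used to model the cochlea, and digital implementations of these filters are designed. The resulting filter bank closely fits the human hearing-threshold curve and models human auditory characteristics well.
4. Two speech feature parameters based on human auditory characteristics are proposed: Gammatone filter coefficients (GTF) and Gammachirp filter coefficients (GCF). In text-independent speaker identification experiments they outperform traditional feature parameters. Because the auditory features are high-dimensional and therefore hard to apply directly, dimensionality reduction based on principal component analysis (PCA) and the discrete cosine transform (DCT) is investigated: a PCA-based speaker recognition algorithm is given, and auditory cepstral features are obtained by the DCT. Simulations on clean and noisy speech show that the reduced auditory features remain robust to noise and still achieve the best recognition rate under noisy conditions.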
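The RCAF idea in item 2 can be sketched as follows. The abstract does not give the RCAF formula, so this sketch assumes one plausible reading of "autocorrelation weighted by the reciprocal of the CAMDF": the short-time autocorrelation R(τ) divided by the circular AMDF D(τ) plus a small constant k, so that lags where the waveform both repeats (small D) and correlates strongly (large R) are emphasised together. The weighting form, k = 0.01 × mean(D), the 60 to 400 Hz search range and the function names are hypothetical choices for illustration; the spectral-subtraction front end and any post-processing of the pitch track are omitted.

```python
import numpy as np


def camdf(frame: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Circular AMDF: D(tau) = sum_n |x[(n + tau) mod N] - x[n]|."""
    # np.roll(frame, -tau)[n] == frame[(n + tau) % N]
    return np.array([np.sum(np.abs(np.roll(frame, -tau) - frame)) for tau in lags])


def autocorr(frame: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Short-time autocorrelation: R(tau) = sum_{n < N - tau} x[n] * x[n + tau]."""
    n = len(frame)
    return np.array([np.dot(frame[: n - tau], frame[tau:]) for tau in lags])


def rcaf_pitch(frame: np.ndarray, fs: float, fmin: float = 60.0,
               fmax: float = 400.0, k_ratio: float = 0.01) -> float:
    """Estimate F0 (Hz) of one voiced frame by maximizing R(tau) / (D(tau) + k).

    ASSUMPTION: the weighting form and k = k_ratio * mean(D) are illustrative;
    the dissertation's exact RCAF definition may differ.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                          # remove DC offset
    lags = np.arange(int(np.ceil(fs / fmax)), int(np.floor(fs / fmin)) + 1)
    d = camdf(frame, lags)                                # near zero at period multiples
    r = autocorr(frame, lags)                             # decays as tau grows
    k = k_ratio * d.mean() + 1e-12                        # keeps the reciprocal bounded
    rcaf = r / (d + k)                                    # reciprocal-CAMDF-weighted ACF
    best_lag = lags[np.argmax(rcaf)]
    return float(fs / best_lag)


# Usage: a clean 50 ms frame with a 200 Hz fundamental and two harmonics.
fs = 8000
n = np.arange(400)
x = (np.sin(2 * np.pi * 200 * n / fs) + 0.5 * np.sin(2 * np.pi * 400 * n / fs)
     + 0.25 * np.sin(2 * np.pi * 600 * n / fs))
print(rcaf_pitch(x, fs))  # -> 200.0
```

Dividing by D(τ) is what separates the true period from its multiples here: at lags 40, 80 and 120 the CAMDF is essentially zero, so the denominators are all about k, and the autocorrelation (which sums fewer terms as τ grows) then favours the shortest repeating lag.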

【Abstract】 As a biometric identification technology, speaker recognition identifies people from their voices, which carry physiological and behavioral characteristics specific to each individual. One important application is deciding whether a speaker has the right to access a secure or confidential system. A spoken password has advantages over a password typed on a keyboard: it cannot be forgotten and cannot easily be stolen. Speaker recognition is therefore a very promising area of research. Most speaker recognition systems are designed for ideal conditions and easily achieve high accuracy in controlled, quiet laboratory settings. In real-life use, however, there is inevitably a mismatch between training and testing conditions, and background noise causes recognition accuracy to drop sharply. This is the major obstacle to the commercial use of speaker recognition, so improving the robustness of speaker recognition systems is both significant and necessary. This dissertation improves the recognition rate and robustness of speaker recognition systems in several respects. Its main contributions are as follows.
1. Building on traditional methods, an endpoint detection algorithm that combines expanded spectral subtraction with a dynamic threshold based on the speech absence probability (SAP) is proposed. The expanded spectral subtraction uses a noise compensation structure that can estimate the noise even while speech is present, and the endpoint decision is made by an SAP-based soft decision, which improves the robustness and precision of endpoint detection. Experiments show that good performance is obtained even at an SNR of -10 dB, whereas traditional double-threshold methods cannot achieve such performance at the same SNR.
2. Pitch detection is one of the most difficult tasks in speech signal processing under noisy conditions. A new pitch detection method for noisy speech at low SNR is proposed, based on the Reverse CAMDF Autocorrelation Function (RCAF) and a search using a tentative smoothness measure. Its front end employs the expanded spectral subtraction with a noise compensation structure, so noise can be estimated while speech is present. The RCAF algorithm improves the robustness and precision of pitch detection. Experiments show that RCAF achieves higher efficiency and better detection accuracy at an SNR of -10 dB, a level of performance that the traditional AMDF, CAMDF and AWAC methods cannot reach at the same SNR.
3. Auditory filters play an important role in understanding the mechanism of hearing, in auditory modeling and in speaker recognition. Digital implementations of linear Gammatone and Gammachirp filters are a common part of auditory models and can be used for sound processing in cochlear implants. This dissertation studies the Gammatone and Gammachirp auditory filters, including their definitions, amplitude-frequency responses and performance in simulating the filtering characteristics of the basilar membrane, and compares the two filters, explaining how they are related and how they differ. Two infinite impulse response (IIR) filter designs are evaluated on how closely their digital impulse, magnitude and phase responses match those of the analog Gammatone and Gammachirp filters. The Gammachirp filter is implemented as an IIR filter with a small number of coefficients, and the results show that cascading a Gammatone filter with an IIR asymmetric compensation filter approximates the Gammachirp filter very closely.
4. An auditory feature extraction algorithm based on human auditory characteristics is developed to improve speaker identification performance. Sub-band energies are computed with Gammatone and Gammachirp filter banks instead of the commonly used triangular filter bank, and the center frequencies and bandwidths are determined by the equivalent rectangular bandwidth (ERB). The proposed features are compared with two widely used features, LPCC and MFCC, in a text-independent speaker identification system. Simulation results show that both proposed features outperform MFCC and LPCC and are more robust in noisy environments at low SNR.
5. To address the high dimensionality of the auditory features, two methods are used to extract low-dimensional speaker features and reduce computational complexity: principal component analysis (PCA), a multivariate statistical analysis technique, and the discrete cosine transform (DCT). First- and second-order delta cepstra and the shifted delta cepstrum are also derived from these auditory features. Compared with standard Mel-frequency cepstral coefficients, the auditory features yield a higher recognition rate in a speaker recognition system, and the feature set has better classification and robustness characteristics than traditional speech features.
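The feature pipeline described in items 4 and 5 (Gammatone sub-band energies on an ERB scale in place of the triangular mel filter bank, then a DCT to obtain auditory cepstra, plus delta coefficients) can be sketched as below. This is a minimal sketch under stated assumptions, not the dissertation's GTF implementation: the ERB formula follows Glasberg and Moore, and the 4th-order Gammatone impulse response with b = 1.019 is the common textbook form, while the channel count (32), frequency range (50 Hz to 0.9 × Nyquist), 25 ms frames with a 10 ms hop, 13 cepstral coefficients, 64 ms FIR truncation and all function names are hypothetical choices. Gammachirp filters, PCA and the shifted delta cepstrum are not shown.

```python
import numpy as np

# ASSUMPTION: all parameter defaults below are illustrative, not from the thesis.


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * np.asarray(f) / 1000.0 + 1.0)


def erb_rate(f):
    """Frequency (Hz) to ERB-rate (number of ERBs below f)."""
    return 21.4 * np.log10(4.37 * np.asarray(f) / 1000.0 + 1.0)


def erb_space(f_low, f_high, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale."""
    rates = np.linspace(erb_rate(f_low), erb_rate(f_high), n_channels)
    return (10.0 ** (rates / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    """Truncated, unit-energy Gammatone impulse response t^(n-1) e^(-2 pi b ERB t) cos(2 pi fc t)."""
    t = np.arange(int(round(duration * fs))) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    m = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(m)[None, :]
    basis = np.sqrt(2.0 / m) * np.cos(np.pi * k * (j + 0.5) / m)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def gammatone_cepstrum(x, fs, n_channels=32, n_ceps=13,
                       frame_len=0.025, hop=0.010, f_low=50.0):
    """Per-frame log Gammatone sub-band energies, decorrelated by a DCT."""
    fcs = erb_space(f_low, 0.9 * fs / 2, n_channels)
    flen, fhop = int(round(frame_len * fs)), int(round(hop * fs))
    n_frames = 1 + (len(x) - flen) // fhop
    energies = np.empty((n_frames, n_channels))
    for c, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]  # filter the whole signal
        for t in range(n_frames):
            seg = y[t * fhop: t * fhop + flen]
            energies[t, c] = np.sum(seg ** 2)               # sub-band energy of frame t
    return dct_ii(np.log(energies + 1e-10), n_ceps)


def deltas(feats, K=2):
    """Regression deltas over +/-K frames, with edge frames repeated as padding."""
    T = len(feats)
    p = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    num = sum(k * (p[K + k: K + k + T] - p[K - k: K - k + T]) for k in range(1, K + 1))
    return num / (2 * sum(k * k for k in range(1, K + 1)))


# Usage: random noise as a stand-in for 1 s of 8 kHz speech.
fs = 8000
x = np.random.default_rng(0).standard_normal(fs)
static = gammatone_cepstrum(x, fs)
features = np.hstack([static, deltas(static)])  # shape (98, 26)
```

Filtering the whole signal once per channel and then framing keeps the filter state continuous across frame boundaries; a production version would typically use an IIR Gammatone design, as the abstract describes, rather than a truncated FIR convolution.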

  • 【Online publication submitter】 吉林大学 (Jilin University)
  • 【Online publication year and issue】 2009, issue 08