
说话人辨认中的特征变换和鲁棒性技术研究

Research on Feature Transformation and Robust Technology with Speaker Identification

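【Author】 Xu Limin (徐利敏)

【Advisor】 Tang Zhenmin (唐振民)

【Author information】 Nanjing University of Science and Technology, Pattern Recognition and Intelligent Systems, 2008, PhD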

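【Summary】 To improve the performance of speaker identification systems and their robustness in practical applications, this dissertation studies three topics: feature transformation for Gaussian mixture models, weighted feature compensation transformation, and adaptive histogram equalization. The main contributions are as follows.
1. A multi-step clustering algorithm for diagonal-covariance Gaussian mixture models (GMMs) with an embedded transformation. To simplify computation, the covariance matrices in a GMM are usually replaced directly by diagonal ones, which degrades the computed likelihood. To compensate for this loss, the proposed method builds the model from diagonal covariance matrices with an embedded transformation and integrates a multi-step clustering algorithm so that the GMM can find its most suitable number of mixture components. Compared with the ordinary clustering expectation-maximization (EM) algorithm, the multi-step clustering algorithm needs markedly fewer EM estimation passes. Compared with a GMM estimated by clustering EM, average computation time on the same speech corpus falls by about 50% and the identification error rate falls by 1.4% on average. On both a self-recorded corpus and a public corpus, and unlike the transformation-embedded GMM estimation method, the new method directly reaches the lowest speaker identification error rate, reconciling recognition accuracy with recognition time.
2. A noise-robust algorithm based on weighted feature compensation transformation for GMMs. The algorithm addresses the limitations of feature weighting and exploits the properties of the normalized compensation transformation. On the one hand, the contribution of each feature value is weighted according to the frame signal-to-noise ratio (SNR). On the other hand, the likelihood scores output by the model are transformed according to the acoustic characteristics of speaker recognition, which compensates for the limitations of the weighting factor in certain environments. Across stationary and non-stationary noise at different SNRs on the self-recorded corpus, the average recognition rate rises by 2.74% and 2.82% relative to the feature weighting algorithm, and by 3.56% and 1.34% relative to the normalized compensation transformation. On a separate public speech data set, the corresponding gains are 3.02% and 2.56% relative to the feature weighting algorithm, and 3.9% and 1.14% relative to the normalized compensation transformation.
3. An adaptive histogram equalization method based on a statistical model. The method responds to the statistical properties of speaker features and to the shortcomings of histogram equalization as applied to speaker recognition. It first builds the cumulative histogram function with a relatively large interval length, then uses the increment in feature-value frequency within each interval to decide adaptively whether that interval should be subdivided and how finely. This lowers the computational cost and yields transformed feature values whose distribution better matches the actual feature space, which further improves the recognition rate and robustness of speaker identification in noisy environments. On the same test set with two common classic noise types (White and Babble), the average recognition rate rises by 3% and 2.9% respectively relative to ordinary histogram equalization. On a separate public comparison test set, the method shows a similar improvement.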
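The frame-SNR weighting idea in contribution 2 can be illustrated with a minimal Python sketch. The record does not give the actual weighting function or the score compensation transform, so everything specific here is an assumption: the sigmoid `snr_weight`, its `midpoint_db` and `slope` parameters, and the normalized weighted sum in `weighted_score` are hypothetical stand-ins, and the compensation transform itself is not shown.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    m = max(comp)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(c - m) for c in comp))

def snr_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical weight in (0, 1): noisy frames contribute less."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))

def weighted_score(frames, snrs_db, gmm):
    """SNR-weighted average of per-frame log-likelihoods (an assumed form)."""
    ws = [snr_weight(s) for s in snrs_db]
    total = sum(ws)
    return sum(w * diag_gmm_loglik(x, *gmm) for w, x in zip(ws, frames)) / total

def identify(frames, snrs_db, speaker_gmms):
    """Return the speaker whose GMM gives the highest weighted score."""
    return max(speaker_gmms,
               key=lambda name: weighted_score(frames, snrs_db, speaker_gmms[name]))
```

Each GMM is passed as a `(weights, means, variances)` tuple. With this weighting, a single clean frame can outvote several heavily corrupted frames, which is the behaviour the weighting is meant to encourage.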

【Abstract】 This dissertation studies the transformation-based Gaussian mixture model, weighted feature compensation transformation, and adaptive histogram equalization in order to improve the performance of speaker identification and its robustness in practical application environments. The main results are as follows.
1. A multi-step clustering algorithm for a transformation-based, diagonal-covariance Gaussian mixture model (GMM) is proposed. To simplify computation, Gaussian mixture density functions usually use diagonal covariance matrices. However, this reduces the likelihood of the data, which can in turn affect the classification decision. The multi-step clustering algorithm is proposed to compensate for this loss of likelihood. In this algorithm, an embedded linear transformation integrates the transformation and the diagonal-covariance Gaussian mixture into a unified framework. A multi-step clustering algorithm is also integrated into the GMM estimation process to search for the appropriate number of mixture components. Compared with the ordinary clustering expectation-maximization (EM) algorithm, the number of EM estimation passes is markedly reduced. Compared with a GMM estimated by clustering EM, the proposed method saves about 50% of the computation time and lowers the error rate by 1.4% on average on the same database. Compared with the transformation-embedded GMM, experiments on two databases indicate that the proposed method directly reaches the lowest error rate with the right number of mixture components.
2. A noise-robust weighted feature compensation transformation method based on GMM is presented. In this method, the contributions of the features are weighted according to the frame SNR, while the frame likelihood scores are transformed according to the acoustic characteristics of the speaker recognition system. In stationary and non-stationary noise environments with different SNRs, the proposed method raises the average recognition rate by 2.74% and 2.82% compared with the feature weighting algorithm, and by 3.56% and 1.34% compared with the normalized compensation transformation method, on the same database. On another public database, the gains are 3.02% and 2.56% compared with the feature weighting algorithm, and 3.9% and 1.14% compared with the normalized compensation transformation method.
3. Based on the statistical characteristics of speaker features and the particular requirements of histogram equalization in speaker recognition, an adaptive histogram equalization (AHEQ) method for speaker identification is presented. In this method, the cumulative histogram function is first built with wide intervals; the increment in feature-value frequency within each interval then determines whether that interval needs to be subdivided and how finely. This approach not only reduces the amount of computation but also makes the distribution of the transformed feature values agree more closely with the actual feature space, which further improves the recognition rate and robustness of the speaker identification system in noisy environments. On the same database, with two classic noise types (White and Babble), the average recognition rate of AHEQ is 3% and 2.9% higher than that of ordinary histogram equalization. On another public comparison test set, the adaptive histogram equalization method shows a similar improvement.
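The adaptive binning step described in item 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the subdivision rule (`max_fraction`, splitting a coarse interval into `ceil(fraction / max_fraction)` equal parts), the function names, the choice of a standard normal reference distribution, and the clipping constant `eps` are all hypothetical.

```python
import bisect
import math
from statistics import NormalDist

def adaptive_edges(values, n_coarse=8, max_fraction=0.2):
    """Start from coarse equal-width intervals, then subdivide crowded ones."""
    lo, hi = min(values), max(values)
    if hi <= lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / n_coarse
    n = len(values)
    edges = [lo]
    for b in range(n_coarse):
        last = b == n_coarse - 1
        left = lo + b * width
        right = hi if last else left + width
        count = sum(1 for v in values
                    if left <= v and (v <= right if last else v < right))
        # Assumed rule: the more mass an interval holds, the finer the split.
        parts = max(1, math.ceil((count / n) / max_fraction))
        for k in range(1, parts + 1):
            edges.append(left + (right - left) * k / parts)
    edges[-1] = hi  # guard against floating-point drift at the top edge
    return edges

def build_cdf(values, edges):
    """Empirical cumulative frequency evaluated at each edge."""
    s = sorted(values)
    return [bisect.bisect_right(s, e) / len(s) for e in edges]

def equalize(x, edges, cdf, eps=1e-3):
    """Map x through the piecewise-linear CDF onto a standard normal."""
    if x <= edges[0]:
        p = cdf[0]
    elif x >= edges[-1]:
        p = cdf[-1]
    else:
        i = bisect.bisect_right(edges, x) - 1
        t = (x - edges[i]) / (edges[i + 1] - edges[i])
        p = cdf[i] + t * (cdf[i + 1] - cdf[i])
    p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf finite
    return NormalDist().inv_cdf(p)
```

In use, `adaptive_edges` and `build_cdf` would be run once per feature dimension on training or utterance data, and `equalize` applied to every value of that dimension. Sparse regions keep their coarse intervals, so the cumulative function needs fewer points than a uniformly fine histogram.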

  • 【Classification number】TN912.34
  • 【Downloads】243