基于音频和视频特征融合的身份识别

Personal Identification Based on Video and Audio Feature Fusion

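【Author】 Wu Di (吴迪)

【Advisor】 Cao Jie (曹洁)

【Author Information】 Lanzhou University of Technology (兰州理工大学), Signal and Information Processing, 2010, Master's thesis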

【摘要】 针对单模态的说话人识别和人脸识别在准确率,应用的限制性和局限性等方面的缺点,本文从信息融合的角度出发,在特征层将两种单模态信息进行融合,实现音频信息和视频信息双模态特征融合的身份识别。本文首先就单模态的说话人识别和人脸识别进行了分析。结合VQ和SVM识别模型各自的优点,实现了一种基于VQ和SVM混合说话人识别模型。对于特征脸人脸识别算法,本文用L1-范数,欧氏距离,MIN距离和混合马氏距离四种度量距离对算法进行了比较。然后将脉冲耦合神经网络应用到人脸识别中,并在此基础上建立了人脸识别系统。其次本文重点对双模态的音视频特征融合识别进行了研究,由于特征层融合可用的信息量大,可以用于实时处理,故本文实现了基于归一化和SVM,基于PCNN两种融合识别算法在特征层对音频和视频特征进行融合识别。前者本文是利用特征相连法将语音特征和人脸特征相连在一起,后者是将两种特征的熵序列融合在一起。实验表明,融合系统的识别率都要比单模态的识别率要高,特别是将噪音加入到语音信号后,单个说话人识别系统识别率下降很快,但是融合识别系统的识别率却能保持在一个良好的水平上。

【Abstract】 Single-mode speaker recognition and face recognition are limited both in accuracy and in the range of applications they can serve. To address these shortcomings, this thesis takes an information-fusion approach and fuses the two single-mode sources at the feature level, realizing personal identification from bimodal audio and video features. First, we analyze single-mode speaker recognition and face recognition. Combining the respective advantages of VQ and SVM, we implement a hybrid speaker recognition model based on VQ and SVM. For the eigenface face recognition algorithm, we compare four distance measures: L1-norm, Euclidean distance, MIN distance, and hybrid Mahalanobis distance. We then apply the Pulse Coupled Neural Network (PCNN) to face recognition and build a face recognition system on this basis. Second, the thesis focuses on bimodal recognition based on audio-video feature fusion. Because feature-level fusion retains a large amount of usable information and is suitable for real-time processing, we implement two feature-level fusion recognition algorithms, one based on normalization and SVM and the other based on PCNN. The former concatenates the speech features and the face features; the latter fuses the entropy sequences of the two kinds of features. Experimental results show that the fusion systems achieve higher recognition rates than the single-mode systems. In particular, when noise is added to the speech signal, the recognition rate of the standalone speaker recognition system drops rapidly, whereas the recognition rate of the fusion systems remains at a good level.
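
The feature-concatenation step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which normalization is used, so min-max scaling is an assumption here, and the function names, feature dimensions, and toy data are all hypothetical.

```python
import numpy as np

def min_max_normalize(features, eps=1e-12):
    """Scale each feature column to [0, 1] across samples (assumed scheme)."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    return (features - lo) / np.maximum(hi - lo, eps)

def fuse_features(audio_features, face_features):
    """Feature-level fusion: normalize each modality, then concatenate per sample."""
    audio = min_max_normalize(audio_features)
    face = min_max_normalize(face_features)
    if audio.shape[0] != face.shape[0]:
        raise ValueError("both modalities need one row per sample")
    return np.hstack([audio, face])

# Hypothetical toy data: 4 samples, 3 audio dimensions, 2 face dimensions,
# deliberately on very different scales.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 3)) * 100.0
face = rng.normal(size=(4, 2)) * 0.01
fused = fuse_features(audio, face)
```

Normalizing before concatenation keeps one modality from dominating the fused vector purely because of its numeric range. The fused rows could then be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, though the abstract does not name the SVM implementation used.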