

Person Recognition Based on Audio-Visual Information with Multi-Level Fusion under Smart Room

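【Author】 Wu Di (吴迪)

【Supervisor】 Cao Jie (曹洁)

【Author Information】 Lanzhou University of Technology (兰州理工大学), Control Theory and Control Engineering, 2014, PhD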

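【Abstract (translated from the Chinese)】 In recent years, as security requirements have risen and remote video conferencing systems have developed rapidly, biometric person recognition in intelligent environments has become a research focus in pattern recognition, with wide application in the intelligent visual Internet of Things, public security, financial services, video conferencing, and many other fields. Because of data noise and the limitations of the recognition system itself, the accuracy attainable by a system based on a single biometric trait is limited. Researchers have therefore proposed fusing audio and visual information to improve recognition accuracy, an approach that has attracted wide attention. However, current audio-visual person recognition is largely confined to single-modality recognition under ideal conditions and to simple fusion of audio and video features with existing fusion methods. Little consideration has been given to the effective extraction of single-modality biometric features in complex environments, the construction of highly accurate and widely applicable recognition algorithms, or the determination of optimal fusion algorithms for audio and video features at different fusion levels. Starting from the mechanisms of human audio-visual cognition, this thesis studies audio-visual person recognition from three aspects (feature extraction, recognition algorithms, and fusion rules) in order to provide a feasible solution for audio-visual recognition in intelligent environments. The main work and contributions are as follows:

1. Accurate, highly representative extraction of face and speech features against complex backgrounds. For the problem of optimally extracting DCT feature coefficients from face images, the thesis proposes a DCT coefficient extraction method based on discriminant power analysis: the discriminant power of each DCT coefficient is analyzed, and the coefficients with the larger discriminant power values are extracted as features. Based on an analysis of the geometric and color properties of hair, the thesis applies the hair feature to face recognition, extending the diversity of face features. The traditional speech parameter MFCC is strongly affected by noise and reflects only the static characteristics of speech. To address this, the thesis extracts Gammatone filter cepstral coefficients using the Gammatone filter, which effectively reflects the characteristics of human hearing, and, based on shifted delta cepstra, extracts Gammatone shifted delta cepstral coefficients that reflect the dynamic characteristics of speech.

2. Face recognition algorithms that effectively solve the "high-dimensional small sample size problem". Subspace analysis methods have become the mainstream approach to face recognition thanks to their strong descriptive power, good separability, and computational simplicity, but they often face the high-dimensional small sample size problem, which leads to poor generalization. Combining subspace analysis with kernel methods, the thesis proposes a kernel relevance weighted discriminant analysis algorithm and a kernel discriminant locality preserving projection algorithm. These solve the small sample size problem and also overcome the weakness of traditional subspace methods, whose linear nature makes them perform poorly on data that are highly linearly non-separable.

3. A solution to the GMM modeling problem in speaker recognition. The GMM is currently the mainstream algorithm for speaker recognition, and a series of speaker recognition algorithms has been derived from it. Short training utterances leave GMM parameters insufficiently trained and degrade recognition performance. To address this, the thesis introduces factor analysis to realize a GMM with adaptive means. The i-vector speaker recognition system, built on the GMM and factor analysis, is the current mainstream system at the frontier of speaker recognition research in China and abroad. By improving the locality preserving projection algorithm, the thesis achieves effective dimensionality reduction of the i-vectors in such a system.

4. Optimal fusion rules for audio and video features at different levels. Guided by information entropy theory, probability density methods, and decision science, the thesis establishes optimal matching-level fusion rules and resolves the evidence conflict problem of D-S evidence theory. First, after analyzing existing approaches to evidence conflict, it proposes an evidence combination method based on group decision-making and multi-criteria selection fusion, which effectively resolves evidence conflict. Second, to avoid estimating matching score densities, it introduces the total error probability (TER) into matching-level fusion, using TER to characterize the distribution of matching scores, and introduces the uncertainty measure fusion method into multi-feature fusion recognition. Third, it uses Gaussian densities to solve for the optimal weights in additive fusion and applies them to logistic regression rank-level fusion. Finally, to obtain the density function in matching score density fusion, it introduces FAR and FRR to compute belief functions and fuses the belief functions with triangular norm operators, which avoids solving for the weights in additive fusion.

In summary, the research in this thesis improves the computer's ability to understand complex perceptual information and to process heterogeneous information, extends the applicable conditions and scope of multi-biometric fusion person recognition, and improves the robustness and recognition rate of person recognition based on audio-visual multi-feature fusion in intelligent environments. It is of significance for advancing human-computer interaction technology in China.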
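The Gammatone shifted delta cepstral coefficients named in point 1 build on the general shifted delta cepstra (SDC) construction, in which several delta vectors taken at shifted positions are stacked into one feature vector per frame. As a rough illustration only, the sketch below applies that construction to toy numbers. The function name, the parameter names `d`, `p`, and `k`, and the choice to drop frames that lack a full context window are assumptions made for this example, not details taken from the thesis; in the thesis's setting the input frames would be Gammatone filter cepstral vectors.

```python
from typing import List, Sequence


def shifted_delta_cepstra(
    frames: Sequence[Sequence[float]], d: int = 1, p: int = 3, k: int = 7
) -> List[List[float]]:
    """Stack k delta vectors, spaced p frames apart, for every usable frame.

    Block i at frame t is c[t + i*p + d] - c[t + i*p - d].
    Frames lacking a full context window are skipped (an illustrative choice).
    """
    total = len(frames)
    last_offset = (k - 1) * p + d  # furthest look-ahead needed from frame t
    features = []
    for t in range(d, total - last_offset):
        row = []
        for i in range(k):
            ahead = frames[t + i * p + d]
            behind = frames[t + i * p - d]
            row.extend(a - b for a, b in zip(ahead, behind))
        features.append(row)
    return features


# Toy cepstra: two coefficients per frame that grow linearly with time.
toy = [[float(t), 2.0 * t] for t in range(10)]
sdc = shifted_delta_cepstra(toy, d=1, p=2, k=3)
print(len(sdc), len(sdc[0]))
```

Each output row has `k` times as many entries as an input frame, so the dynamic context covered by one feature vector grows with `k` and `p` while the static cepstra stay unchanged.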

【Abstract】 In recent years, with rising security requirements and the rapid development of remote video conferencing systems, person recognition based on biometrics has become a research focus in pattern recognition. It is widely used in the smart visual Internet of Things, public security, financial services, video conferencing systems, and many other fields. The accuracy of person recognition based on a single biometric is limited by data noise and by the limitations of the recognition system itself. To address this problem, researchers have fused visual and audio information using information fusion techniques, an approach known as audio-visual multi-biometric person recognition, which has received intensive attention. However, current research on audio-visual multi-biometric person recognition is mainly confined to single-biometric recognition under ideal conditions and to simple fusion with existing fusion methods. Little consideration has been given to the effective extraction of single biometric features, the construction of highly accurate and widely applicable recognition algorithms, or optimal fusion methods. Starting from the human audio-visual cognitive mechanism, this thesis studies audio-visual multi-biometric person recognition from three aspects: feature extraction, recognition algorithms, and fusion methods. The aim is to provide a workable solution for audio-visual multi-biometric recognition in a smart room. The main work and contributions of the thesis are as follows:

1. Effective extraction of face and voice features in complex environments. First, extracting the most effective DCT coefficients as recognition features is the key step in face feature extraction. From the perspective of selecting the most effective features, this thesis presents a DCT coefficient selection method based on Discriminant Power Analysis, which extracts the DCT coefficients with the larger discriminant power values. Second, the hair feature is used in face recognition on the basis of its geometric and color characteristics, which extends the diversity of face features. Finally, emulating human hearing, Gammatone Filter Cepstral Coefficients are derived from a Gammatone filter bank model. Because these coefficients reflect only static properties, Gammatone Filter Shifted Delta Cepstral Coefficients are extracted based on shifted delta cepstra.

2. Two face recognition algorithms that can solve the small sample size problem. To address the low recognition accuracy and poor robustness of face recognition in smart environments, two new algorithms that use the kernel trick are proposed: Kernel Relevance Weighted Discriminant Analysis (KRWDA), based on relevance weighted discriminant analysis, and Kernel Discriminant Locality Preserving Projection (KDLPP), based on the discriminant locality preserving projection algorithm.

3. A solution to the Gaussian Mixture Model modeling problem in speaker recognition. The performance of the Gaussian Mixture Model (GMM) declines rapidly when the training data become shorter under unexpected noise conditions, so an adaptive GMM is proposed in this thesis. Using Factor Analysis, the adaptation of each GMM trained with sufficient data is expressed as a shift factor. When the training data are insufficient, the coordinates of the shift factor are learned from the GMM mixture components that are insensitive to the amount of training data and are then used to compensate the other mixture components. In addition, to enhance the performance of the i-vector speaker recognition system under unpredictable noise, an improved locality preserving projection algorithm is proposed for reducing the dimensionality of i-vectors.

4. Optimal fusion rules for audio and visual features at different levels. Establishing an optimal fusion rule is the main difficulty in fusion recognition, and to date no universal fusion strategy applies to every practical situation. Drawing on information entropy, probability density methods, and decision science, this thesis establishes optimal matching-level fusion rules and resolves conflicts between bodies of evidence. First, after analyzing existing methods for handling conflicting evidence, the thesis proposes an evidence combination rule based on group decision-making and multi-criteria selection fusion, which effectively resolves evidence conflict. Second, to avoid estimating the matching score density directly, the thesis introduces the total error probability into matching-level fusion to characterize the matching score distribution, and introduces uncertainty measure fusion into multi-feature fusion recognition. Third, the optimal weights of the weighted sum rule are obtained from Gaussian densities and applied to logistic regression rank-level fusion. Finally, to obtain the density function in matching score density fusion, FAR and FRR are introduced to compute confidence functions, which are fused with a triangular norm operator, avoiding the need to calculate the weights of the sum rule.

In summary, this research improves the computer's ability to understand complex information and to process heterogeneous information, extends the applicable conditions and scope of multi-biometric fusion recognition, and improves the robustness and recognition rate of multi-feature fusion recognition based on audio and video features in smart environments. It is of significance for promoting the development of human-computer interaction technology.
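The evidence conflict mentioned in point 4 can be made concrete with the classical Dempster rule of combination from D-S evidence theory. The sketch below implements that classical rule only, not the group-decision and multi-criteria combination method proposed in the thesis. The function name, the identity labels, and the mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Combine two mass functions with Dempster's rule; return (masses, conflict K)."""
    agreeing: Mass = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            agreeing[common] = agreeing.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    support = sum(agreeing.values())  # equals 1 - K when both inputs sum to 1
    if support == 0.0:
        raise ValueError("total conflict: the sources share no hypothesis")
    return {h: v / support for h, v in agreeing.items()}, conflict


# Hypothetical scores: the face matcher is almost sure of "alice",
# the voice matcher is almost sure of "carol"; both give "bob" only 1%.
face = {frozenset({"alice"}): 0.99, frozenset({"bob"}): 0.01}
voice = {frozenset({"carol"}): 0.99, frozenset({"bob"}): 0.01}
fused, k = dempster_combine(face, voice)
print(fused, round(k, 4))
```

Both sources consider "bob" very unlikely, yet after normalization the fused mass on "bob" is 1 while the conflict coefficient K is close to 1. Counterintuitive results of this kind are what motivate modified evidence combination rules for highly conflicting audio and visual evidence.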
