代价敏感降维及其人脸识别应用研究

Cost-Sensitive Dimensionality Reduction and Its Application in Face Recognition

【Author】 Wan Jianwu (万建武)

【Advisor】 Yang Ming (杨明)

【Author information】 Nanjing Normal University, Applied Mathematics, 2013, Ph.D.

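【Abstract (Chinese version)】 Traditional dimensionality reduction methods pursue the lowest recognition error rate and assume that every kind of misclassification incurs the same loss. In some practical applications this assumption may not hold. For example, a face-recognition-based access control system has an impostor class and gallery (authorized) classes: misclassifying an impostor as a gallery person usually costs more than misclassifying a gallery person as an impostor, which in turn costs more than misclassifying a gallery person as another gallery person. This thesis therefore studies cost-sensitive dimensionality reduction algorithms. The main work is as follows:

1. A Weighted Cost-Sensitive Local Preserving Projection (WCSLPP) is proposed. The traditional Local Preserving Projection (LPP) pursues the minimum recognition error rate, and its projection direction is affected by class imbalance. This thesis embeds misclassification costs in the LPP model and defines a WCSLPP model that satisfies the minimum misclassification loss criterion. To address class imbalance, WCSLPP also adopts a weighting strategy that balances the contribution of each class to the projection direction. Experimental results on face datasets show the effectiveness of WCSLPP.

2. Pairwise Costs in Linear Discriminant Analysis (PCLDA) is proposed. By introducing a weighting function into Linear Discriminant Analysis (LDA), the PCLDA model approximates the pairwise Bayesian risk criterion and effectively suppresses the influence of outlier classes on the projection direction. In addition, to account for differences in class distribution density within a dataset, PCLDA defines an importance function that balances the contribution of each class to the projection direction. Experimental results on face datasets show the effectiveness of PCLDA.

3. Pairwise Costs in SubClass Discriminant Analysis (PCSCDA) is proposed. By analyzing the face-recognition-based access control system, this thesis casts it as a cost-sensitive subclass learning problem, then injects misclassification costs and clustering information into the discriminant analysis framework at the same time, yielding a PCSCDA algorithm that approximates the pairwise Bayesian risk criterion. Experimental results on face datasets show the effectiveness of PCSCDA.

4. Pairwise Costs in Semi-Supervised Discriminant Analysis (PCSDA) is proposed. In practical face recognition applications, unlabeled data are plentiful while labeled data are hard to obtain. To make effective use of unlabeled face images, PCSDA predicts their labels with an l2 approach which, compared with existing label propagation strategies, has both higher prediction accuracy and lower time complexity. On this basis, a weighting function is introduced to give an objective function that satisfies the pairwise Bayesian risk criterion, improving the discriminative ability of the projection direction. Experimental results on face datasets verify the effectiveness of PCSDA.

5. A cost-sensitive semi-supervised Laplacian support vector machine (Sample-Dependent Cost-Sensitive Semi-Supervised Support Vector Machine, SCS-LapSVM) is proposed. Practical problems may be cost-sensitive, and their datasets may contain class imbalance, large numbers of unlabeled samples, and noisy samples. For this setting, SCS-LapSVM builds on a label propagation strategy for the unlabeled samples and embeds misclassification costs that account for data imbalance into both the empirical loss and the Laplacian regularization term of the Laplacian support vector machine. Furthermore, considering the effect of noisy samples on the decision hyperplane, SCS-LapSVM defines a sample-dependent cost that assigns lower weights to noisy samples. Experimental results on UCI datasets and NASA software datasets show the effectiveness of SCS-LapSVM.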

【Abstract】 Conventional dimensionality reduction algorithms aim for the lowest recognition error and assume that all misclassifications incur the same loss. In some real-world applications this assumption may not hold. For example, a face-recognition-based access control system has impostors and gallery persons. Misclassifying an impostor as a gallery person costs more than misclassifying a gallery person as an impostor, which in turn costs more than misclassifying a gallery person as another gallery person. This thesis therefore studies cost-sensitive dimensionality reduction. Its main contributions are as follows:

1. A method called Weighted Cost-Sensitive Local Preserving Projection (WCSLPP) is introduced. Traditional Local Preserving Projection (LPP) aims for the minimal misclassification error rate, and its projection direction is influenced by imbalanced data. This thesis embeds misclassification costs in the LPP model and defines the WCSLPP model, which satisfies the minimal misclassification loss criterion. In addition, to deal with class imbalance, WCSLPP defines a weighting function that balances the contribution of different classes to the projection direction. Experimental results on face datasets show the superiority of WCSLPP.

2. An algorithm named Pairwise Costs in Linear Discriminant Analysis (PCLDA) is proposed. By embedding a weighting function in Linear Discriminant Analysis (LDA), PCLDA approximates the pairwise Bayesian risk criterion and effectively restrains the influence of outlier classes on the projection direction. In addition, to account for differences in class distribution density within a dataset, PCLDA defines an importance function that balances the contribution of different classes to the projection direction. Experimental results on face datasets demonstrate the effectiveness of PCLDA.

3. An approach called Pairwise Costs in SubClass Discriminant Analysis (PCSCDA) is proposed. By analyzing the face-recognition-based access control system, this thesis treats it as a cost-sensitive subclass learning problem, then embeds subclass information and misclassification costs in the discriminant analysis framework at the same time, yielding the PCSCDA algorithm, which approximates the pairwise Bayesian risk criterion. Experimental results on face datasets show the validity of PCSCDA.

4. A method named Pairwise Costs in Semi-Supervised Discriminant Analysis (PCSDA) is proposed. In real-world applications, unlabeled data are plentiful and labeled data are difficult to obtain. To use the information in unlabeled data effectively, PCSDA predicts the labels of unlabeled data with an l2 approach. Compared with other label propagation strategies, the l2 approach has higher prediction accuracy and lower time complexity. Then, by embedding a weighting function in the LDA model, PCSDA approximates the pairwise Bayesian risk criterion and improves the discriminative ability of the projection direction. Experimental results on face datasets demonstrate the superiority of PCSDA.

5. An algorithm called Sample-Dependent Cost-Sensitive Semi-Supervised Support Vector Machine (SCS-LapSVM) is developed. Real-world problems may be cost-sensitive, and their datasets may exhibit class imbalance, a large number of unlabeled samples, and noisy samples. To handle these situations, SCS-LapSVM builds on label propagation and embeds misclassification costs that account for class imbalance in both the hinge loss and the Laplacian regularization term of the Laplacian support vector machine. Considering the effect of noisy samples on the decision hyperplane, SCS-LapSVM also defines a sample-dependent cost that gives noisy samples lower weights. Experimental results on UCI and NASA datasets show the effectiveness of SCS-LapSVM.
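The three-level cost ordering in the access control example can be illustrated with a small Python sketch. It is not one of the thesis's algorithms: it only shows how unequal misclassification costs change a minimum-expected-loss decision compared with picking the most probable class. The cost values, the function names `build_cost_matrix` and `min_risk_decision`, and the example posterior are all assumptions made for illustration.

```python
# Illustrative sketch only: cost values and names below are assumed, not from the thesis.
C_GG = 1.0   # gallery person mistaken for another gallery person (lowest cost)
C_GI = 2.0   # gallery person rejected as an impostor
C_IG = 20.0  # impostor accepted as a gallery person (highest cost)


def build_cost_matrix(n_gallery, c_gg=C_GG, c_gi=C_GI, c_ig=C_IG):
    """Return cost[i][j], the loss of predicting class j when the true class is i.

    Classes 0..n_gallery-1 are gallery persons; class n_gallery is the impostor.
    """
    n = n_gallery + 1
    impostor = n_gallery
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # correct decisions cost nothing
            if i == impostor:
                cost[i][j] = c_ig
            elif j == impostor:
                cost[i][j] = c_gi
            else:
                cost[i][j] = c_gg
    return cost


def min_risk_decision(posterior, cost):
    """Pick the class with the smallest expected loss under the given posterior."""
    n = len(posterior)
    risks = [sum(posterior[i] * cost[i][j] for i in range(n)) for j in range(n)]
    decision = min(range(n), key=lambda j: risks[j])
    return decision, risks


# Two gallery persons plus one impostor class, with a made-up posterior.
cost = build_cost_matrix(2)
posterior = [0.6, 0.1, 0.3]
most_probable = max(range(len(posterior)), key=lambda j: posterior[j])
decision, risks = min_risk_decision(posterior, cost)
print("most probable class:", most_probable)
print("minimum-risk class:", decision)
```

With these assumed numbers the most probable class is gallery person 0, but the minimum-risk decision is the impostor class (index 2), because wrongly admitting an impostor is weighted far more heavily than wrongly rejecting a gallery person.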
