
Research on Enhanced Canonical Correlation Analysis with Applications (增强型典型相关分析研究与应用)

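【Author】 Sun Tingkai (孙廷凯)

【Advisor】 Chen Songcan (陈松灿)

【Author Information】 Nanjing University of Aeronautics and Astronautics, Computer Application Technology, 2006, PhD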

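【Abstract (translated from the Chinese)】 Machine learning generalizes a model of a particular problem domain from a finite set of observed samples, and it relies on data analysis tools to uncover the relationships hidden in the observed data. Canonical correlation analysis (CCA) is a powerful tool for studying the correlation between two sets of variables. As a multivariate data analysis method, CCA has been widely applied since its introduction in 1936 in regression modeling, image analysis and processing, computer vision, pattern recognition, and bioinformatics, and it has drawn increasing attention from researchers in these fields; the rise of multimodal recognition techniques offers a new opportunity for research on CCA-based pattern recognition methods. This dissertation takes the CCA mathematical model as its object of study and extends it in depth, aiming to use enhanced CCA models to solve two principal learning problems in machine learning: pattern recognition and regression modeling. The novel contributions are summarized as follows: (1) A nonlinear CCA model is proposed that decomposes a nonlinear problem into a combination of a series of linear sub-problems, in order to handle the nonlinear correlation problems that abound in practice; data visualization and pose estimation experiments verify the effectiveness of the algorithm. (2) A unified framework for CCA-based unimodal recognition is established, revealing the underlying mechanism behind the equivalence between CCA in the "sample-class label" form and linear discriminant analysis; on this basis, a soft-label CCA based on the sample distribution is proposed, which breaks the limitation imposed by this equivalence and improves recognition performance. (3) A new supervised learning method, discriminative CCA, is proposed; it incorporates the class information of the samples and takes full account of the correlations among samples and their effect on classification. Using the kernel trick, a kernelized discriminative CCA is further proposed for more complex, linearly inseparable problems; experiments show that both methods achieve high recognition performance. (4) Building on discriminative CCA, a discriminative CCA with missing samples is proposed to cope with the sample loss that arises in practice for various reasons; the method inherits the advantages of discriminative CCA and additionally offers good recognition performance, savings in time and memory, and relative insensitivity to the number of missing samples. (5) CCA uses correlation as the similarity measure between samples. Extending this idea to principal component analysis (PCA) yields a correlation-based pseudo-PCA. On this basis, the idea is further extended to the recently proposed family of PCA algorithms based on two-dimensional patterns, turning them into supervised learning methods. In addition, a PCA that incorporates class information is proposed without changing the original PCA algorithmic framework. Experiments show that these two supervised PCA methods achieve good classification results.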

【Abstract】 Machine learning builds a model of the problem at hand from finite observational data, relying on data analysis tools to reveal the underlying relationships among the observations. Canonical correlation analysis (CCA) is a powerful tool for analyzing the dependency between two sets of variables. First proposed in 1936 as a multivariate analysis method, CCA has been widely used in regression modeling, image analysis and processing, computer vision, pattern recognition, bioinformatics, and other fields. As a result, CCA has attracted growing attention from researchers in these fields; in addition, emerging multimodal recognition techniques offer new opportunities for CCA-based recognition algorithms. In this dissertation, we address the two principal problems in machine learning, regression and pattern recognition, using the proposed enhanced CCA models. The main contributions are summarized as follows: (1) Locality preserving CCA (LPCCA) is proposed as a nonlinear extension of CCA to handle the nonlinear correlation found in real applications. The globally nonlinear problem is decomposed into a series of locally linear sub-problems. The proposed method is validated through data visualization and pose estimation experiments. (2) A simple unified framework is constructed for unimodal recognition using CCA, revealing the underlying mechanism of the equivalence between class-label-based CCA and linear discriminant analysis (LDA). Moreover, we propose a CCA based on an independent soft label for each sample, rather than one label per class, to break the limit on recognition performance imposed by this equivalence. (3) A novel supervised learning method, termed discriminative CCA (DCCA), is proposed; it accounts for the impact of both within-class correlation and between-class correlation on classification. With the help of the kernel trick, kernelized discriminative CCA (KDCCA) is further proposed to tackle linearly inseparable cases. The experiments show that both DCCA and KDCCA outperform other related methods in recognition performance. (4) Based on DCCA, discriminative CCA with missing samples (DCCAM) is further proposed to overcome the difficulties caused by the loss of samples in real applications. Besides the advantages inherited from DCCA, DCCAM offers better recognition performance, savings in time and memory, and relative insensitivity to the number of missing samples. (5) The idea of using correlation as the similarity metric between samples, as in CCA, is generalized to principal component analysis (PCA), and the correlation-based pseudo-principal component analysis (p-PCA) is proposed. Moreover, this idea is generalized to the recently developed family of matrix-pattern-based PCA algorithms, and better recognition performance can be achieved. In addition, another supervised learning method, class-information-incorporated PCA, is proposed without changing the original PCA framework, and the experiments validate the proposed method.
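For readers unfamiliar with the baseline that the dissertation extends, the following is a minimal sketch of classical linear CCA only, not of LPCCA, DCCA, or any other method proposed by the author. The function name `linear_cca`, the small ridge term `reg`, and the synthetic data are illustrative assumptions. The sketch whitens both covariance matrices with a Cholesky factor and reads the leading canonical correlation off a singular value decomposition.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-8):
    """Classical linear CCA (illustrative sketch, not the dissertation's method).

    X has shape (n, p) and Y has shape (n, q); rows are paired samples.
    Returns the leading projection vectors wx, wy and their canonical correlation.
    `reg` is an assumed small ridge term that keeps the covariances invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each view
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations
    U, s, Vt = np.linalg.svd(T)
    wx = np.linalg.solve(Lx.T, U[:, 0])   # map back from the whitened space
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]

# Synthetic two-view data sharing one latent variable z (assumed for the demo)
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([rng.normal(size=500), -z + 0.1 * rng.normal(size=500)])

wx, wy, rho = linear_cca(X, Y)
print("leading canonical correlation:", rho)
```

Because the two views share the latent variable `z` up to sign and small noise, the leading canonical correlation should come out close to 1, and the projections `X @ wx` and `Y @ wy` should be nearly perfectly correlated.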
