
数据降维算法研究及其应用

Research on Dimensionality Reduction Algorithms and Its Applications

【Author】 张田昊 (Zhang Tianhao)

【Supervisor】 杨杰 (Yang Jie)

【Author Information】 Shanghai Jiao Tong University, Pattern Recognition and Intelligent Systems, 2008, Ph.D.

【摘要 (Abstract)】 In recent years, high-dimensional data have appeared frequently in fields across science and industry, such as computer vision, pattern recognition, bioinformatics, and aerospace. When we process such data, their high dimensionality often becomes an obstacle to processing and application: the associated computational complexity is high and the results are not optimal. Dimensionality reduction is the process of reducing data from a high-dimensional to a low-dimensional representation in order to reveal the intrinsic low-dimensional structure of the data, and as a way of overcoming the "curse of dimensionality" it plays an important role in these fields. Over the past few decades a large number of dimensionality reduction methods have been proposed and studied in depth; the commonly used ones include traditional algorithms such as PCA and LDA, and manifold learning algorithms such as LLE, ISOMAP, LE, and LTSA. However, most existing methods still suffer from a variety of problems, for example the small sample size problem, the out-of-sample problem, nonlinearly distributed samples, and the classification problem. To address these problems, this thesis proposes a series of new algorithms, improved algorithms, a new dimensionality reduction framework, and new algorithms developed on top of that framework. The main contributions of this thesis are:
1. A new linear dimensionality reduction algorithm, Linear Local Tangent Space Alignment (LLTSA). It uses tangent information as the local representation of the data and then aligns this local information in a low-dimensional space obtained by a linear mapping. LLTSA can be regarded as a linear approximation of LTSA.
2. Inspired by the locality-preserving idea of manifold learning, a new dimensionality reduction algorithm, Maximum Variance Projections (MVP), for face recognition. It represents and preserves local information by capturing the local geometry of the manifold; beyond this manifold-learning character, it gains discriminative power by exploiting class label information.
3. A new multimodal biometric recognition system. Within this system we propose a dimensionality reduction algorithm designed specifically for the multimodal problem, Geometry Preserving Projections (GPP). GPP is a discriminative algorithm that effectively preserves geometric information by capturing the intra-modal geometric structure.
4. Inspired by LTSA, a new manifold learning algorithm, Local Coordinates Alignment (LCA). LCA preserves the neighborhood relations on each patch to obtain local coordinates as the representation of the local neighborhood, and then aligns these extracted local coordinates globally using the alignment trick to obtain the final embedding coordinates. In addition, to solve the out-of-sample problem, we apply a linear approximation to LCA, yielding Linear LCA (LLCA).
5. A framework, named "patch alignment", that unifies the spectral-analysis-based dimensionality reduction algorithms. The framework consists of two stages: part optimization and whole alignment. In part optimization, each algorithm has its own optimization objective on the constructed patches, where a patch is built from a given point and some associated points. In whole alignment, all the per-patch part optimizations are integrated together by the alignment trick to build the final global coordinates. As an application of this framework, we develop a new dimensionality reduction algorithm, Discriminative Locality Alignment (DLA), by imposing discriminative information in the part optimization stage. The thesis discusses the various algorithms under the unified framework and validates DLA.
6. To improve the classification performance of Orthogonal Neighborhood Preserving Projections (ONPP), an improved algorithm under the patch alignment framework, named Discriminative Orthogonal Neighborhood Preserving Projections (DONPP). In addition, by incorporating additional unlabeled samples, DONPP is extended to the semi-supervised case, i.e., Semi-Supervised DONPP (SDONPP). Experiments on image classification and face recognition verify the effectiveness of DONPP and SDONPP, respectively.
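The linearization idea behind LLTSA and LLCA (contributions 1 and 4 above) can be summarized in a short numerical sketch. The Python code below is not from the thesis; it is a minimal illustration, under common LTSA conventions, of (a) building an LTSA-style alignment matrix from local tangent coordinates and (b) replacing the free nonlinear embedding with an explicit linear projection by solving a generalized eigenproblem, which is what makes out-of-sample embedding immediate. The function names, the regularization term, and the toy data are illustrative choices, not the thesis's notation.

```python
import numpy as np
from scipy.linalg import eigh

def ltsa_alignment_matrix(X, k, d):
    """Build an LTSA-style alignment matrix B (n x n) for data X (D x n):
    each sample's k-neighborhood is centered, its top-d right singular
    vectors give the local tangent coordinates, and the per-patch
    alignment errors are summed into B."""
    D, n = X.shape
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # pairwise distances
    H = np.eye(k) - np.ones((k, k)) / k                           # centering matrix
    B = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(dist[i])[:k]          # k nearest neighbors (incl. the point itself)
        Xi = X[:, idx] @ H                     # centered patch, D x k
        _, _, Vt = np.linalg.svd(Xi, full_matrices=False)
        V = Vt[:d].T                           # k x d basis of local tangent coordinates
        G = np.hstack([np.ones((k, 1)) / np.sqrt(k), V])
        B[np.ix_(idx, idx)] += np.eye(k) - G @ G.T   # per-patch alignment term
    return B

def linear_projection(X, B, d, reg=1e-6):
    """LLTSA-style linearization: constrain the embedding to Y = A^T X and
    solve X B X^T a = lambda X X^T a, keeping the d smallest eigenvalues.
    A new sample x is then embedded directly as A.T @ (x - mean)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ B @ Xc.T
    C = Xc @ Xc.T + reg * np.eye(X.shape[0])   # regularized for the small-sample case
    _, vecs = eigh(S, C)                       # generalized symmetric eigenproblem
    return vecs[:, :d]

# toy usage: a noisy 3-D point cloud with low intrinsic dimensionality
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
X = np.vstack([np.cos(t), np.sin(t), 0.1 * rng.standard_normal(300)])  # 3 x 300
A = linear_projection(X, ltsa_alignment_matrix(X, k=10, d=2), d=2)
Y = A.T @ (X - X.mean(axis=1, keepdims=True))                          # 2 x 300 embedding
print(A.shape, Y.shape)
```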

【Abstract】 Recently, datasets of high dimensionality have been emerging in many domains of science and industry, such as computer vision, pattern recognition, bioinformatics, and astronomy. When dealing with such datasets, the high dimensionality is often an obstacle to efficient processing: operations on the data are computationally expensive and the results may not be optimal. Dimensionality reduction is the process of transforming data from a high-dimensional space to a low-dimensional space so as to reveal the intrinsic structure of the data distribution. It plays a crucial role as a way of dealing with the "curse of dimensionality". In past decades, a large number of dimensionality reduction algorithms have been proposed and studied. Popular algorithms include conventional ones, e.g., PCA and LDA, and recently proposed manifold learning algorithms, e.g., LLE, ISOMAP, LE, and LTSA. However, most existing dimensionality reduction algorithms still suffer from various open problems, e.g., the small sample size problem, the out-of-sample problem, the nonlinearity of the sample distribution, and the classification problem. To overcome these problems, in this thesis we propose a collection of new algorithms, enhanced algorithms, a unifying framework for existing algorithms, and new algorithms based on the proposed framework. The main contributions are:
1. We propose a new algorithm, called Linear Local Tangent Space Alignment (LLTSA). It uses the tangent space in the neighborhood of a data point to represent the local geometry, and then aligns those local tangent spaces in a low-dimensional space that is linearly mapped from the raw high-dimensional space. LLTSA can be viewed as a linear approximation of LTSA.
2. Inspired by the idea of locality preserving, we propose a novel subspace learning algorithm, termed Maximum Variance Projections (MVP), for face recognition. It is a linear discriminant algorithm that preserves local information by capturing the local geometry of the manifold, thereby combining the abilities of manifold learning and classification in one algorithm.
3. We propose a new system for multimodal biometric recognition. Within this system, we develop Geometry Preserving Projections (GPP), a new subspace selection approach designed especially for the multimodal problem. GPP is a linear discriminant algorithm that effectively preserves local information by capturing the intra-modal geometry.
4. Inspired by LTSA, we propose a new manifold learning algorithm, Local Coordinates Alignment (LCA). LCA obtains local coordinates as representations of a local neighborhood by preserving the proximity relations on each patch. The extracted local coordinates are then aligned by the alignment trick to yield the global embedding. In LCA, the local representation and the global alignment are explicitly implemented as two steps of intrinsic structure discovery. In addition, to solve the out-of-sample problem, a linear approximation is applied to LCA, yielding Linear LCA (LLCA).
5. We propose a framework, termed "patch alignment", that unifies spectral-analysis-based dimensionality reduction algorithms. It consists of two stages: part optimization and whole alignment. In part optimization, different algorithms have different optimization criteria over the patches, each of which is built from one sample and its associated samples. In whole alignment, all part optimizations are integrated together by the alignment trick to form the final global coordinates over all patches. As an application of this framework, we develop a new dimensionality reduction algorithm, Discriminative Locality Alignment (DLA), by imposing discriminative information in the part optimization stage.
6. To improve the classification performance of Orthogonal Neighborhood Preserving Projections (ONPP), we propose a new algorithm, termed Discriminative Orthogonal Neighborhood Preserving Projections (DONPP), based on the patch alignment framework. Moreover, we extend DONPP to semi-supervised DONPP (SDONPP), which is more powerful since it can make use of unlabeled as well as labeled samples.
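To make the two-stage structure of the patch alignment framework concrete, here is a small Python sketch, again not the thesis's code: the part optimization follows the DLA description (pull a sample's nearest same-class neighbors toward it and push its nearest different-class neighbors away, with a trade-off coefficient here called beta), and the whole alignment sums every per-patch matrix into a global alignment matrix at the patch's indices. The weighting scheme and the choice of an orthogonal linear projection are assumptions made for the sketch.

```python
import numpy as np

def dla_alignment_matrix(X, labels, k1, k2, beta=1.0):
    """Part optimization + whole alignment, DLA-flavored: for each sample,
    build a patch from its k1 nearest same-class neighbors and k2 nearest
    different-class neighbors, encode the local objective (keep the former
    close, the latter far) in a small matrix Li, and add Li into the global
    alignment matrix L at the patch's indices (the selection/alignment trick)."""
    D, n = X.shape
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    L = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        same = [j for j in order if labels[j] == labels[i] and j != i][:k1]
        diff = [j for j in order if labels[j] != labels[i]][:k2]
        idx = [i] + same + diff
        # +1 weights pull same-class neighbors in, -beta weights push others away
        w = np.array([1.0] * len(same) + [-beta] * len(diff))
        Li = np.zeros((len(idx), len(idx)))
        Li[0, 0] = w.sum()
        Li[0, 1:] = -w
        Li[1:, 0] = -w
        Li[1:, 1:] = np.diag(w)
        L[np.ix_(idx, idx)] += Li
    return L

def dla_projection(X, L, d):
    """Orthogonal linear map: columns of A are the eigenvectors of
    Xc L Xc^T associated with the d smallest eigenvalues."""
    Xc = X - X.mean(axis=1, keepdims=True)
    _, vecs = np.linalg.eigh(Xc @ L @ Xc.T)     # eigenvalues in ascending order
    return vecs[:, :d]

# toy usage: two Gaussian classes in 5-D, projected to 2-D
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(0, 1, (5, 40)), rng.normal(2, 1, (5, 40))])  # 5 x 80
y = np.array([0] * 40 + [1] * 40)
A = dla_projection(X, dla_alignment_matrix(X, y, k1=5, k2=5, beta=0.5), d=2)
Y = A.T @ (X - X.mean(axis=1, keepdims=True))   # 2 x 80 discriminative embedding
print(Y.shape)
```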

  • 【CLC Number】 TP391.41
  • 【Cited By】 20
  • 【Downloads】 1793