节点文献

流形学习的谱方法相关问题研究

Study on Several Issues of Spectral Method for Manifold Learning

【作者】 曾宪华

【导师】 罗四维;

【作者基本信息】 北京交通大学 , 计算机软件与理论, 2009, 博士

【摘要】 在当今这个信息时代,可以方便地获得大量的数据。许多实际应用中,获得的数据是高维的、庞大的、繁杂的、无序的,并且还在不断的增加,有价值的信息淹没在大规模的海量高维数据集之中,需要发现数据的内在规律以及预测未来发展趋势。流形学习就是假定这些观测数据位于或近似位于一个嵌入在高维欧氏空间中的内在低维流形上,主要目标是发现高维观测数据集的内在低维流形结构和嵌入映射关系。目前,流形学习已经成为机器学习、模式识别、数据挖掘以及其它相关研究领域的研究热点。本文通过分析流形学习的内涵与外延,立足于解决流形学习的谱方法中的重要问题,在算法设计层面和图像流形应用层面上展开了一系列研究。首先对流形学习的典型谱方法做了详细对比分析,然后针对流形的增殖学习、构造近邻关系的合理度量、提高内在低维空间的可分性、基于集成的流形学习、局部保持的算法和全局保持的算法两者优势融合等几方面进行了重点研究,提出了五个以谱方法为基础的流形学习算法,并和相关研究成果做了理论上与实验上的比较,表明了我们提出算法的有效性。本文主要创新成果有以下几方面:(1)定义了增殖流形学习的概念,这有利于指导符合人脑增殖学习机理的流形学习算法的研究。以此为指导原则,提出了一种基于LLE的动态增殖流形学习算法(DKI-LLE)。实验结果表明:DKI-LLE算法比LLE的几个增量式算法在处理新数据集时有更好的效果;DKI-LLE算法发现的整体低维结构更接近批处理的方式获得的低维结构,使得新到来的数据子集所包含的低维结构知识被整合到原有的低维结构中去;而LLE的增量式算法处理新的观测数据时更依赖于原有数据的低维坐标。(2)提出了一种基于测地线距离的广义高斯型拉普拉斯特征映射算法(GGLE)。该算法将测地线距离和广义高斯函数融合到传统的拉普拉斯特征映射算法中,可以调整近邻图结点间的相似度,通过选择超高斯、高斯或者次高斯函数来实现不同程度的近邻局部特性的保持;而且当需要保持更多的近邻关系使得数据点邻域增大时,采用测地线距离可以避免欧氏距离度量不合理的缺陷;实验结果表明在用不同的广义高斯函数度量高维数据点间的相似度时,局部近邻结构保持的程度是不同的,GGLE获得的全局低维坐标也呈现出不同的聚类特性。(3)提出了一种基于GGLE的集成判别算法(EGGLE),该算法的主要优点是:近邻参数k固定,邻接矩阵和测地线距离矩阵都只构造一次,只需要多次选择广义高斯型函数构造多个拉普拉斯矩阵,获取多个独立的低维空间坐标集合,独立学习分类器,集成分类识别。时间复杂度上EGGLE算法与Ensemble-Isomap和En-ULLELDA算法相比较通常更具有优越性。在半监督学习框架下做了LE与EGGLE算法的对比实验,识别结果表明了EGGLE算法的有效性。另外,本文也提出了一种监督的集成流形学习算法(EGGLE-LDA),该算法将线性监督算法LDA和EGGLE相结合,加强集成流形学习在监督学习中的判别能力,使得EGGLE-LDA算法既考虑了数据的类别信息又考虑了几何分布特性。实验结果表明了EGGLE-LDA算法和En-ULLELDA算法的集成识别性能的差异。(4)提出了一种全局拉普拉斯展开算法(GLU),该算法综合了局部保持的拉普拉斯特征映射算法(LE)和全局保持的最大化方差展开算法(MVU)的优点。主要思想是使得局部近邻的点尽可能的接近,同时也要使得相互远离点尽可能远。实现方法是构造局部尽可能近邻和全局展开的双目标函数,引入低维坐标的Gram内积矩阵,通过半定规划(SDP)的方法优化双目标函数,从而学习这样一个内积矩阵,最后对这个内积矩阵进行特征分解求内在低维嵌入。在月亮形人造数据集、真实的USPS手写体数字数据集和雕塑头像数据集上的可视化实验验证了GLU算法的有效性;并且比较了LE、MVU、UDP和GLU等4种流形学习算法的低维可分性和可视化效果,实验结果表明了GLU算法的优越性。

【Abstract】 In the current information age,a large quantity of data can be obtained easily.The obtained data are high-dimensional,enormous,multifarious,disordered,and continuously increasing in many practical applications.The valuable information is submerged into large scale dataset.It is necessary to find the intrinsic laws of the dataset and predict the future development trend.Manifold learning assumes that these observed data lie on or close to intrinsic low-dimensional manifolds embedded in the high-dimensional Euclidean space.The main goal of manifold learning is to find intrinsic low-dimensional manifold structures of high-dimensional observed dataset and the embedding map.At present,manifold learning has become a hot issue in the fields of machine learning,pattern recognition,data mining and other related research.By analyzing the intension and extension of manifold learning,this dissertation is devoted to solving several important problems of spectral methods for manifold learning,and carries out a series of research on algorithm design and image manifold application.Firstly,traditional spectral methods are analyzed and contrasted in detail. Secondly,five problems are mainly investigated,which include knowledge-increasable learning of manifold,constructing a reasonable measure of neighborhood relation, enhancing the separability of the low-dimensional space,manifold learning based on ensemble,combining the advantages of local preserving algorithms and global preserving algorithms in manifold learning.Finally,this dissertation proposes five manifold learning algorithms based on spectral method.Our proposed algorithms are compared with the related research in theories and experiments.And the results show the effectiveness of our proposed algorithms.The main contributions of this dissertation are summarized as follows:(1) The concept of knowledge-increasable manifold learning is defined.It is advantageous to guide the research of manifold learning algorithms which fit to knowledge-increasable learning mechanism in human brain.According to the guiding principle of knowledge-increasable manifold learning,we propose a dynamically knowledge-increasable manifold learning algorithm based on locally linear embedding (DKI-LLE).Experimental results show that DKI-LLE algorithm has better results than several LLE-based incremental algorithms in dealing with the new dataset.DKI-LLE is nearer to the original(batch) LLE for discovering low-dimensional structure,and can integrate the low-dimensional structure knowledge of the new data subset into the previous low-dimensional structure.On the contrary,incremental LLE algorithms depend more on those previous low-dimensional coordinates for dealing with the new observed data.(2) A generalized Gaussian Laplacian eigenmap algorithm based on geodesic distance (GGLE) is proposed,which incorporates geodesic distance and generalized Gaussian function into the original Laplacian eigenmap algorithm.GGLE algorithm can adjust the similarities between nodes of neighborhood graph,and can preserve the different degrees of local properties by using super-Gaussian function,Gaussian function or sub-Gaussian function.Moreover,GGLE can avoid the deficiency of Euclidean distance by using geodesic distance when neighborhoods of data points are enlarged for preserving more neighborhood relations.Experimental results show that the global low-dimensional coordinates obtained by GGLE have different clustering properties and different degrees of preserving local neighborhood structures when different generalized Gaussian functions are used to measuring the similarities between high-dimensional data points.(3) An ensemble-based discriminant algorithm based on GGLE(EGGLE) is proposed. The main advantages of EGGLE include:the fixed neighborhood parameter k,only one time for constructing neighborhood graph and geodesic distance matrix,only needing to choose generalized Gaussian functions many times for constructing many Laplacian matrixes,obtaining multiple independent low-dimensional coordinate sets, independently training multiple classifiers,and combining classification results of these component classifiers to produce the final identification.EGGLE algorithm generally outperforms Ensemble-ISOMAP algorithm and En-ULLELDA algorithm on the time complexity.Under the framework for semi-supervised learning,the comparative experiments of EGGLE and LE are completed.Experimental results show that EGGLE algorithm is effective.In addition,a supervised manifold learning algorithm based on ensemble(EGGLE-LDA) is proposed in this dissertation.For enhancing the classification ability of ensemble-based manifold learning in supervised learning, EGGLE is combined with LDA,so that EGGLE-LDA algorithm takes into account the label information of data as well as the characterization of the geometric distribution. Experimental results show the difference of the ensemble-based recognition performance between EGGLE-LDA algorithm and En-ULLELDA algorithm.(4) A global Laplacian unfolding algorithm(GLU) is proposed so that LE algorithm with preserving local properties is integrated into MVU algorithm with preserving global properties.The main idea of GLU algorithm is that nearby points are pulled as near as possible while distant points are pulled as far apart as possible.Its implementation method is to construct the double object functions for remaining as near as possible in locality and unfolding in globality,to reformulate the double object functions in terms of the Gram inner product matrix of low-dimensional coordinates,to optimize the object function for learning the inner product matrix by semi-definite programming,and to compute a intrinsically low-dimensional embedding via the eigen-decomposition of the inner product matrix.Visualization experiments on synthetic "Two Moons" dataset,USPS handwritten digits dataset and the sculpture head portrait dataset demonstrate the effectiveness of GLU algorithm.In addition,the classification ability and visualization performance of four manifold learning algorithms that include LE,MVU,UDP and GLU are compared.Experimental results show the superiority of GLU algorithm.

  • 【分类号】TP391.41
  • 【被引频次】15
  • 【下载频次】1118
  • 攻读期成果
节点文献中: 

本文链接的文献网络图示:

本文的引文网络