Several Problems in Dimensionality Reduction

【Author】 何力

【Supervisors】 陆汝钤; 张军平

【Author Information】 Fudan University, Computer Software and Theory, 2010, Ph.D.

【Abstract】 Dimensionality reduction (DR) is one of the most important problems in machine learning. This thesis makes four main contributions to the field (each is illustrated by a short sketch after this abstract):

(1) Manifold learning is a family of unsupervised, nonlinear DR methods that has attracted much attention in recent years, yet the diversity of its algorithms makes a proper taxonomy and evaluation difficult. We propose a taxonomy based on algorithm design that divides the common algorithms into three categories: distance-preserving mappings, graph embeddings, and statistical methods, and we discuss the shared pros and cons of each category. We then evaluate the algorithms along several axes: their computational complexity; the relationship between spectra and dimensionality; the impact of noise on each category; the consequences of "holes" in the parameter space; neighborhood preserving rates, which detect whether a mapping destroys the topological structure of the manifold; and magnification factors, principal spreading directions, and related quantitative criteria, which expose finer properties of the mapping. As an application of this analysis, we choose a suitable manifold learning algorithm for face recognition and obtain a higher recognition rate than traditional linear DR methods.

(2) As an important branch of manifold learning, graph embedding algorithms (cf. Chapter 2) can be parametrized (via linearization and kernelization) into a complete DR framework. However, optimizing the kernelized models costs O(N³), which blocks their use on large-scale data. We propose an approximation algorithm based on AP-initialized k-means: because it controls the quantization error better, it approximates the Gram matrix more accurately than other methods using the same number of exemplars. We analyze how well different parts of the spectrum are approximated, and our experiments confirm that different applications demand different spectra. We also derive an error bound for approximating the mapping itself and prove that it is likewise controlled by the quantization error; compared with approximating the Gram matrix, this strategy has a more intuitive interpretation for PKLR and graph embedding, and in our graph embedding experiments it attains better solutions with fewer parameters. Our earlier work compared several linearized graph embedding algorithms; with the approximation algorithm we can now compare their kernelized versions on large-scale problems, which yields some interesting conclusions, e.g. graph embeddings that seek minimal eigenvalues suffer under kernel functions with fast-decreasing spectra, and local models can achieve similar effects through local kernel functions.

(3) We propose a supervised DR algorithm built on recently developed kernel independence criteria. Our analysis shows that it can serve as an approximation to sufficient dimension reduction (SDR) algorithms such as KDR, but where each KDR iteration takes O(N³) time, ours needs only O(N²) plus one multiplication of N × N matrices. We discuss the potential weaknesses of our method on simulated data, yet on most real datasets it gives results comparable to KDR. We also describe how the HSIC statistic can bound from above the dimension of the SDR projection subspace, a question for which most of the literature offers no satisfactory solution. A further analysis reveals the connection between these algorithms and graph embedding: graph embedding supplies good initial values and thus reduces the amount of random search. To exploit unsupervised information, we add a Laplacian smoother to the model; our experiments show that for low-dimensional projections it outperforms the purely supervised model. Finally, we outline how these algorithms might handle unsupervised DR and CCA problems as a direction for future work.

(4) In many practical problems the ordinal relationships in the data are crucial because they reveal how the data are distributed on the underlying manifold, and our experiments show that preserving them improves the generalization ability of classifiers. We are the first to separate such problems from traditional classification, naming them tendency learning. We compare tendency learning with other well-known learning problems, e.g. classification models the separating surface, whereas tendency learning models the transition between states. A careful comparison of two traditional linear models, SVM and PKLR, shows that the latter is more convenient for modeling tendency learning, which gives us a DAG-regularized PKLR model; since its constraints are non-convex, we solve it with CCCP. Experiments on two simulated and two real datasets show that, especially when few samples are labeled, the DAG-regularized tendency learning model generalizes better than supervised and semi-supervised models.
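For contribution (1), the abstract relies on the neighborhood preserving rate but does not define it. Below is a minimal sketch of one standard formulation, the average k-NN overlap between the input space and the embedding; the function name, the choice of k, and the use of scikit-learn are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_preserving_rate(X, Y, k=10):
    """Average fraction of each point's k nearest neighbors in the
    input space X that remain among its k nearest neighbors in the
    embedding Y; 1.0 means the local topology is fully preserved."""
    idx_x = (NearestNeighbors(n_neighbors=k + 1).fit(X)
             .kneighbors(X, return_distance=False)[:, 1:])  # drop the point itself
    idx_y = (NearestNeighbors(n_neighbors=k + 1).fit(Y)
             .kneighbors(Y, return_distance=False)[:, 1:])
    overlap = [len(set(a) & set(b)) for a, b in zip(idx_x, idx_y)]
    return float(np.mean(overlap)) / k
```

A rate near 1 indicates the mapping keeps neighborhoods intact; a sharp drop signals that it tears or folds the manifold.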
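For contribution (2), the approximation the abstract describes is Nyström-style: exemplars obtained by (AP-initialized) k-means stand in for the full sample when building the Gram matrix. A minimal sketch under that reading; plain k-means++ substitutes for the AP initialization, and the RBF kernel and parameter values are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def approx_gram(X, m=100, gamma=0.1, jitter=1e-8):
    """Rank-m approximation K ~= C W^+ C^T of the N x N Gram matrix,
    with k-means centers as exemplars; a lower quantization error of
    the centers gives a better approximation, as the thesis argues."""
    Z = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_
    C = rbf_kernel(X, Z, gamma=gamma)      # N x m cross-kernel
    W = rbf_kernel(Z, Z, gamma=gamma)      # m x m exemplar kernel
    W += jitter * np.eye(m)                # guard against a singular W
    return C @ np.linalg.pinv(W) @ C.T
```

In practice one works with C and W directly rather than forming the N × N product, which brings the O(N³) eigenproblem down to roughly O(Nm²) + O(m³) for m ≪ N exemplars.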
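For contribution (3), the independence criterion in question is HSIC. A minimal sketch of the standard biased empirical estimator tr(KHLH)/(N−1)², assuming precomputed kernel matrices K and L; the O(N²) centering trick below is what makes the per-step cost quoted above plausible:

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC = tr(K H L H) / (N - 1)^2 with the
    centering matrix H = I - (1/N) 1 1^T, computed in O(N^2) by
    centering K directly instead of multiplying four N x N matrices."""
    n = K.shape[0]
    Kc = (K - K.mean(axis=0, keepdims=True)
            - K.mean(axis=1, keepdims=True) + K.mean())   # Kc = H K H
    return float(np.sum(Kc * L)) / (n - 1) ** 2           # tr(Kc L), L symmetric
```

Supervised DR in this style then searches for a projection W that maximizes the HSIC between the projected data XW and the labels Y.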
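For contribution (4), the abstract names CCCP but does not reproduce the subproblem. The generic CCCP step for a non-convex constraint written as a difference of convex functions g(x) = g₁(x) − g₂(x) (the DAG constraints are assumed here to admit such a decomposition) replaces the concave part by its linearization, yielding a convex program at every iteration:

```latex
% One CCCP iteration: keep the convex part g_1, linearize the concave
% part -g_2 at the current iterate x^{(t)}, and solve the convex surrogate.
x^{(t+1)} = \arg\min_{x} f_0(x)
\quad \text{s.t.} \quad
g_1(x) - \Big[ g_2\big(x^{(t)}\big)
  + \nabla g_2\big(x^{(t)}\big)^{\top}\big(x - x^{(t)}\big) \Big] \le 0 .
```

Because the tangent underestimates the convex g₂, each surrogate's feasible set sits inside the original one, so the iterates remain feasible and, for a convex objective f₀, the objective value is non-increasing.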

  • 【Online Publication Submitter】 Fudan University
  • 【Online Publication Year/Issue】 2010, Issue 12