
Learning from Data: Hermite Learning and Normal Estimation on Riemannian Manifolds

【Author】 Shi Lei (石磊)

【Advisors】 Zhou Ding-Xuan (周定轩); Deng Jian-Song (邓建松)

【Author Information】 University of Science and Technology of China, Computational Mathematics, PhD, 2010

【Abstract】 In this thesis, we investigate some algorithms in learning theory for the purposes of regression, manifold learning and data analysis. Their design and asymptotic performance are discussed in detail from the viewpoint of approximation theory.

In the first part, the problem of learning from data involving both function values and gradients is studied in the framework of least-squares regularized regression in reproducing kernel Hilbert spaces. With the help of the representer theorem, the algorithm is implemented by solving a linear system whose coefficient matrix involves the Gramian matrix of kernel values and the Hessian matrix of kernel partial derivatives. The additional sample data for function gradients improve the learning performance of the algorithm. Error analysis is done by means of sampling operators for the sample error and integral operators in Sobolev spaces for the approximation error.
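
As an illustration of the linear system just described, here is a minimal one-dimensional sketch, assuming a Gaussian kernel; the helper names (`gaussian_blocks`, `hermite_fit`), the bandwidth `sigma`, and the plain ridge-style stabilizer `lam` are illustrative assumptions, not the thesis' exact regularization scheme.

```python
import numpy as np

def gaussian_blocks(x, sigma):
    """Kernel blocks: values, first derivatives, mixed second derivative."""
    d = x[:, None] - x[None, :]          # pairwise differences x_i - x_j
    K = np.exp(-d**2 / (2 * sigma**2))   # K(x_i, x_j): the Gramian block
    K_u = (d / sigma**2) * K             # d/du K(x, u) at (x_i, x_j)
    K_x = -K_u                           # d/dx K(x, u) at (x_i, x_j)
    K_xu = (1 / sigma**2 - d**2 / sigma**4) * K  # mixed "Hessian" block
    return K, K_u, K_x, K_xu

def hermite_fit(x, y, g, sigma=0.5, lam=1e-3):
    """Fit f from values y_i and derivatives g_i (toy Hermite learning)."""
    m = len(x)
    K, K_u, K_x, K_xu = gaussian_blocks(x, sigma)
    # Block coefficient matrix: top rows match values, bottom rows gradients.
    M = np.block([[K, K_u], [K_x, K_xu]])
    rhs = np.concatenate([y, g])
    # Ridge-style stabilizer for illustration only; the thesis regularizes
    # the RKHS norm, which leads to a slightly different system.
    coef = np.linalg.solve(M + lam * m * np.eye(2 * m), rhs)
    a, b = coef[:m], coef[m:]
    def f(t):
        dt = t[:, None] - x[None, :]
        Kt = np.exp(-dt**2 / (2 * sigma**2))
        return Kt @ a + ((dt / sigma**2) * Kt) @ b
    return f

# Example: recover sin from noisy values plus exact derivatives.
rng = np.random.default_rng(0)
x = np.linspace(0, np.pi, 20)
f = hermite_fit(x, np.sin(x) + 0.01 * rng.standard_normal(20), np.cos(x))
t = np.array([1.0])
print(f(t), np.sin(t))   # the two values should be close
```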

Normal estimation is an important topic in processing point cloud data and in surface reconstruction in computer graphics. In the second part of the thesis, we consider the problem of estimating normals for an (unknown) submanifold of a Euclidean space of codimension 1 from random points sampled on the manifold. We propose a kernel-based learning algorithm in an unsupervised form of gradient learning. The algorithm can be implemented by solving an eigenvector problem in linear algebra; a simplified sketch is given after this abstract. Error analysis is conducted under conditions on the true normals of the manifold and the sampling distribution.

In the last part of this thesis, we consider the regression problem of learning with a regularization scheme in a data-dependent hypothesis space. For a given set of samples, functions in this hypothesis space are defined as linear combinations of basis functions generated by a kernel function and the sample data, and are thus entirely determined by the combination coefficients. The data-dependent nature of the kernel-based hypothesis space provides flexibility and adaptivity for the learning algorithms. The regularization scheme is essentially different from the standard one in a reproducing kernel Hilbert space: the kernel function is not necessarily symmetric or positive semi-definite, and the regularizer, as a functional acting on functions in such hypothesis spaces, is taken to be the p-th power of the ℓ^p-norm of the corresponding combination coefficients. These differences lead to additional difficulty in the error analysis; the scheme and sketches of both cases are given below.

To be more specific, we mainly study two cases in this thesis: p = 1 and p = 2. When p = 1, the ℓ^1-regularizer often leads to sparsity of the solution vector, which can greatly improve the efficiency of computations. When p = 2, the corresponding algorithm is linear and easy to implement by solving a linear system. Both algorithms have been studied in the literature. In this thesis, we apply concentration techniques with ℓ^2-empirical covering numbers to obtain the best learning rates known so far for these algorithms. Since our aim is a capacity-dependent analysis, we also show that the function spaces involved in the error analysis, induced by the non-symmetric kernel function, have well-behaved ℓ^2-empirical covering numbers of their unit balls.
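
The eigenvector problem of the second part can be illustrated by the following simplified sketch in the spirit of kernel-weighted local PCA; the function name `estimate_normal`, the Gaussian weighting, and the bandwidth `sigma` are illustrative assumptions, not the thesis' exact estimator.

```python
import numpy as np

def estimate_normal(points, x, sigma=0.3):
    """Estimate the unit normal at x from random samples on the manifold."""
    diffs = points - x                                       # x_j - x
    w = np.exp(-np.sum(diffs**2, axis=1) / (2 * sigma**2))   # kernel weights
    # Kernel-weighted second-moment (scatter) matrix of the differences.
    G = (w[:, None] * diffs).T @ diffs
    eigvals, eigvecs = np.linalg.eigh(G)
    # Tangent directions dominate the scatter; the normal direction is
    # the eigenvector belonging to the smallest eigenvalue.
    return eigvecs[:, 0]

# Example: samples from the unit circle, whose true normal at p is p itself.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 2000)
samples = np.stack([np.cos(theta), np.sin(theta)], axis=1)
n = estimate_normal(samples, np.array([1.0, 0.0]))
print(n)   # approximately (+1, 0) or (-1, 0): the normal up to sign
```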

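In standard notation, the regularization scheme of the last part can be written as follows; the normalization of the empirical risk and of λ is an assumption, as the abstract does not fix it.

```latex
% Regularization in the data-dependent hypothesis space
% H_{K,z} = { sum_i c_i K(x, x_i) : c in R^m }, with K possibly
% non-symmetric and not positive semi-definite.
\[
  f_z(x) = \sum_{i=1}^{m} c_i^{z}\, K(x, x_i), \qquad
  c^{z} = \operatorname*{arg\,min}_{c \in \mathbb{R}^m}
    \frac{1}{m} \sum_{i=1}^{m}
      \Big( \sum_{j=1}^{m} c_j K(x_i, x_j) - y_i \Big)^{2}
    + \lambda\, \|c\|_{p}^{p}, \qquad p \in \{1, 2\}.
\]
```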
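
A minimal sketch of both cases under the formulation above: p = 2 reduces to a closed-form linear solve, while p = 1 is shown here with iterative soft-thresholding (ISTA), one standard way to obtain the sparse solution; the thesis analyzes the schemes rather than prescribing a particular solver, so `solve_l1` and `solve_l2` are hypothetical helpers.

```python
import numpy as np

# Design matrix A[i, j] = K(x_i, x_j) need not be symmetric, so both
# solvers treat A as a general matrix.

def solve_l2(A, y, lam):
    """p = 2: the coefficients solve a linear system (normal equations)."""
    m = len(y)
    return np.linalg.solve(A.T @ A / m + lam * np.eye(m), A.T @ y / m)

def solve_l1(A, y, lam, n_iter=5000):
    """p = 1: ISTA for (1/m)||Ac - y||^2 + lam*||c||_1; promotes sparsity."""
    m = len(y)
    c = np.zeros(m)
    L = 2.0 * np.linalg.norm(A, 2) ** 2 / m   # Lipschitz constant of the gradient
    step = 1.0 / L
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ c - y) / m    # gradient of the empirical risk
        z = c - step * grad
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return c

# Example on a toy kernel design matrix.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 30)
A = np.exp(-np.abs(x[:, None] - x[None, :]))
y = np.tanh(2 * x) + 0.05 * rng.standard_normal(30)
c1, c2 = solve_l1(A, y, 0.01), solve_l2(A, y, 0.01)
print(np.sum(np.abs(c1) > 1e-8), "nonzero coefficients for p = 1")
```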