多视度量和回归学习方法及应用研究

Research on the Methods and Applications of Multiview Metric and Regression Learning

【Author】 Zhai Deming (翟德明)

【Advisors】 Gao Wen (高文); Chang Hong (常虹)

【Author information】 Harbin Institute of Technology, Computer Application Technology, 2014, Ph.D.

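【Abstract (Chinese version, translated)】 Distance metric learning and regression learning play a vital role in machine learning, pattern recognition, and computer vision. In many practical tasks, such as image clustering, classification, and content-based image annotation and retrieval, performance depends critically on choosing an appropriate distance metric function. Regression learning, in turn, provides highly effective tools and means for solving metric learning and image processing problems. Research on metric and regression learning is therefore of significant importance and broad value. However, the vast majority of metric and regression learning algorithms target a single dataset. With the rapid development of the Internet and the growing popularity of digital imaging devices, data are often composed of multiple information sources or multiple feature representations and thus exhibit multimodal characteristics. To analyze and process multimodal data effectively, this thesis investigates multiview metric and regression learning. Work on multiview metric and regression learning has only just begun, and all existing work relies on global modeling of the data. In recent years, however, researchers have found that, compared with global methods, analyzing data and constructing prediction functions locally usually yields lower error and hence better robustness and flexibility. In addition, local learning can substantially improve an algorithm's capacity to handle complex problems. On this basis, this thesis studies multiview metric and regression learning methods that combine local and global modeling and applies them to practical problems. Specifically, the research consists of four parts:

1. A multiview metric learning algorithm with global consistency and local smoothness is proposed. The algorithm learns a shared latent feature space and thereby indirectly establishes connections between multiview observations. Learning is decomposed into two stages: learning a globally consistent shared latent feature space, and locally smooth multiview metric learning. In the first stage, based on spectral graph theory, a low-dimensional representation is obtained for all labeled sample pairs, and this low-dimensional space is taken as the shared latent feature space. In the second stage, regularized local linear regression is used to learn, for unlabeled and test samples, local mapping functions from the input space to the shared latent feature space. Graph Laplacian regularization is introduced so that the learned local metric functions vary smoothly over the whole data space. Both stages are formulated as convex optimization problems with closed-form solutions that are simple to compute. Experiments on pose and expression alignment demonstrate the effectiveness of the proposed method.

2. Instance-specific canonical correlation analysis is proposed. The method builds on a classical statistical learning technique, canonical correlation analysis (CCA), and extends it to an instance-specific form that is both local and nonlinear. Unlike the previous work, it does not require two-stage learning and thus establishes a unified framework for multiview metric learning that combines local and global modeling. First, a solution of CCA based on least squares regression is investigated. Then, within the least squares regression framework, instance-specific local mapping functions are computed along the smooth curve of the data manifold, approximating the nonlinear distribution of the whole data space. To better exploit the information in unlabeled samples, an extension to the semi-supervised setting is also discussed. Finally, the resulting objective is solved by alternating optimization, and the global optimum is attained under the theoretical guarantee of joint convexity.

3. To further address big data problems, a parametric local multiview Hamming distance metric learning algorithm is proposed. First, discrete local multimodal hash mapping functions are defined that map data from the original input space to a binary discrete space, and the Hamming distance in that space serves as the final distance metric. Second, to balance locality against computational efficiency, the local hash mapping function is approximated and parameterized as a linearly weighted combination of the mapping functions associated with a set of anchor points. An upper bound on the error of the approximate local hash mapping is given theoretically. Next, an objective combining local and global terms is established and solved efficiently using the conjugate gradient method and a sequential learning procedure. Experimental results on cross-media retrieval show that the proposed method better models the complex structure of large-scale data and achieves higher query accuracy.

4. Beyond learning from multiple datasets, this thesis further explores building multiview models that combine local and global structure on a single dataset. For image denoising, a progressive image denoising method based on multiview kernel regression is proposed. A multiscale representation of the target image is first constructed, and the image is then denoised progressively from coarse to fine. On the one hand, within each scale, a graph Laplacian least squares regression model based on an implicit kernel is used, which minimizes the least squares error on the measured samples while preserving the manifold structure (global structure) of the whole image data space. On the other hand, between two successive scales, graph Laplacian least squares regression based on an explicit kernel is used, so that local structural regularity is learned and propagated gradually from coarse to fine scales. The intra-scale and inter-scale correlations share a unified objective function but are optimized in two different ways, so that the global structural information and local regularities of the image are better exploited and combined. Experimental results show that the proposed method achieves performance comparable to or better than mainstream algorithms on image denoising.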
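The graph Laplacian regularized least squares idea that appears in parts 1 and 4 of the abstract can be illustrated with a minimal sketch. This is not the thesis's model: it assumes a plain chain graph over a 1-D signal instead of an image neighbourhood graph, it uses no kernel, and the names `laplacian_regularized_fit`, `measured`, and `lam` are hypothetical. It only shows how a data-fit term on the measured samples and a smoothness term f^T L f combine into one linear system with a closed-form solution.

```python
def laplacian_regularized_fit(y, measured, lam=1.0):
    """Minimize  sum_{i measured} (f_i - y_i)^2 + lam * sum_i (f_i - f_{i+1})^2.

    Assumption: the graph is a simple chain, so the normal equations
    (M + lam * L) f = M y are tridiagonal (M is the 0/1 diagonal mask of
    measured samples, L the chain-graph Laplacian). They are solved exactly
    with the Thomas algorithm.
    """
    n = len(y)
    if n == 0 or not any(measured):
        raise ValueError("need at least one measured sample")
    if lam <= 0:
        raise ValueError("lam must be positive")

    diag, rhs = [], []
    for i in range(n):
        # Degree of node i in the chain: 1 at the two ends, 2 in the interior.
        degree = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        m = 1.0 if measured[i] else 0.0
        diag.append(m + lam * degree)
        rhs.append(m * y[i])
    off = -lam  # every off-diagonal entry of lam * L on a chain

    # Forward elimination.
    for i in range(1, n):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]

    # Back substitution.
    f = [0.0] * n
    f[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        f[i] = (rhs[i] - off * f[i + 1]) / diag[i]
    return f


# The middle sample is unmeasured; smoothness fills it in from its neighbours.
print(laplacian_regularized_fit([0.0, 0.0, 4.0], [True, False, True], lam=1.0))
# approximately [1.0, 2.0, 3.0]
```

A larger `lam` pulls the measured values further toward their neighbours, while a small `lam` keeps them close to the observations and still interpolates the unmeasured ones.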

【Abstract】 Distance metric learning and regression learning play an important role in machine learning, pattern recognition, and computer vision. Many tasks, such as image classification, clustering, and content-based image annotation and retrieval, depend critically on the choice of an appropriate distance metric. Regression learning provides an effective tool for distance metric learning and image processing problems. Distance metric and regression learning are therefore important in both theory and application. However, most traditional metric and regression learning algorithms address only a single dataset. With the rapid development of the Internet and the rising popularity of digital cameras, data often come with several different observations or descriptions and thus exhibit a multimodal property. To analyze and process multimodal data effectively, this thesis focuses on multiview metric and regression learning. Work on multiview metric and regression learning has only just started, and all existing approaches are global methods. Researchers have recently observed that, compared with global methods, localizing the prediction function also localizes the estimation error, so local methods tend to be more robust and flexible. In addition, learning in a local manner can substantially increase model capacity. In this thesis, we study multiview metric and regression learning that combines local and global modeling, and we explore its applications. The thesis is divided into four parts, detailed as follows:

1. We propose a two-stage multiview metric learning method with global consistency and local smoothness. By learning a shared latent space for the multiview observations, the method implicitly establishes connections between data from different views. The learning process is decomposed into two stages. In the first stage, based on spectral graph theory, the method obtains common low-dimensional embeddings for all labeled correspondence pairs. In the second stage, based on regularized local linear regression, the method learns the relationship between the input space of each observation and the shared latent space for unlabeled and test data. A graph Laplacian regularization term is incorporated so that the learned metric varies smoothly. The proposed method formulates global and local metric learning as two convex optimization problems, which can be solved efficiently in closed form. Experimental results on pose and expression alignment demonstrate the effectiveness of the proposed method.

2. We propose a unified framework for multiview metric learning via instance-specific canonical correlation analysis. Building on canonical correlation analysis (CCA), we propose instance-specific CCA, which achieves locality and nonlinearity at the same time. Unlike the work above, the proposed method does not need a two-stage learning process and thus establishes a unified framework. First, we present a least squares solution for CCA, which sets the stage for the proposed method. Second, within the least squares regression framework, CCA is extended to approximate nonlinear data by computing instance-specific projections along the smooth curve of the manifold. The method can also be extended to the semi-supervised setting, exploiting unlabeled data to further improve performance. The optimization problem is proved to be jointly convex and can be solved efficiently by alternating optimization, so a globally optimal solution is attained with a theoretical guarantee.

3. To cope with big data, we propose parametric local multiview Hamming distance metric learning. First, discrete local multimodal hashing functions are defined to project data from input features to binary codes, and the Hamming distance is computed in the discrete space. To balance locality and computational efficiency, we approximate the local hashing function for each point as a linearly weighted combination of a small set of projection bases associated with a set of anchor points, and we establish an error bound for the approximated local hashing projection. An objective function combining local and global terms is then formulated, and the conjugate gradient method and a sequential learning process are used for efficient optimization. Experimental results on a cross-media retrieval task demonstrate that local hash functions better model the complex structure of large-scale datasets and achieve higher empirical query accuracy than global ones.

4. Besides the study of multiple datasets, we further discuss multiview models on a single dataset that combine local and global modeling. We propose a unified framework for progressive image denoising via multiview kernel regression. We first construct a multiscale representation of the target image and then progressively recover the degraded image in scale space from coarse to fine. On the one hand, within each scale, a graph Laplacian regularization model represented by an implicit kernel is learned, which simultaneously minimizes the least squares error on the measured samples and preserves the global manifold structure of the image data space. On the other hand, between two successive scales, the model is learned in a projected high-dimensional feature space through an explicit kernel mapping that describes the inter-scale correlation, in which local structural regularity is learned and propagated from coarser to finer scales. The objective functions for intra-scale and inter-scale processing take the same form but are solved differently in different feature spaces, so that the local and global correlations in the image can be better exploited and combined. Experimental results demonstrate that the proposed method achieves results comparable to, and in some cases better than, mainstream algorithms on image denoising problems.
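The anchor-based approximation described in part 3 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: the Gaussian anchor weights, the `bandwidth` parameter, the hand-set projection vectors, and the function names `anchor_weights`, `local_hash`, and `hamming` are all hypothetical, and the learning step (conjugate gradient with sequential learning) is not shown. The sketch only demonstrates how a per-point hash function can be formed as a weighted combination of per-anchor linear projections and how the resulting binary codes are compared by Hamming distance.

```python
import math


def anchor_weights(x, anchors, bandwidth=1.0):
    """Normalized Gaussian weights of point x with respect to each anchor (assumed form)."""
    sq = [sum((a - b) ** 2 for a, b in zip(x, anchor)) for anchor in anchors]
    nearest = min(sq)  # subtract the smallest distance for numerical stability
    raw = [math.exp(-(d - nearest) / (2.0 * bandwidth ** 2)) for d in sq]
    total = sum(raw)
    return [r / total for r in raw]


def local_hash(x, anchors, anchor_projections, bandwidth=1.0):
    """Binary code of x from a weighted combination of per-anchor projections.

    anchor_projections[k][b] is the projection vector of anchor k for bit b.
    Each bit is 1 when the combined projection score is non-negative, else 0.
    """
    weights = anchor_weights(x, anchors, bandwidth)
    n_bits = len(anchor_projections[0])
    code = []
    for bit in range(n_bits):
        score = sum(
            w * sum(p * v for p, v in zip(projections[bit], x))
            for w, projections in zip(weights, anchor_projections)
        )
        code.append(1 if score >= 0.0 else 0)
    return code


def hamming(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return sum(u != v for u, v in zip(code_a, code_b))


# Toy setup: two anchors, two bits, hand-set (not learned) projections.
anchors = [[0.0, 0.0], [10.0, 10.0]]
projections = [
    [[1.0, 0.0], [0.0, 1.0]],    # anchor 0: bit 0 and bit 1
    [[-1.0, 0.0], [0.0, -1.0]],  # anchor 1: bit 0 and bit 1
]
code_a = local_hash([1.0, -1.0], anchors, projections)
code_b = local_hash([9.0, 11.0], anchors, projections)
print(code_a, code_b, hamming(code_a, code_b))
```

In a multiview setting, each view would have its own anchors and projections, and a cross-media query would compare the code of an item from one view with codes of items from another view using `hamming`.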
