融合全局和局部信息的度量学习方法研究
Globality and Locality Incorporation in Distance Metric Learning
【Author】 王微 (Wang Wei);
【Author Information】 University of Science and Technology of China, Pattern Recognition and Intelligent Systems, 2014, PhD
【摘要/Abstract (translated)】 Metric learning is a very important and fundamental problem in machine learning. A distance function measures the similarity between sample points, so it significantly influences the performance of most machine learning algorithms, such as k-nearest neighbor classification, radial basis function network classification, support vector machine classification, and k-means clustering. Because linear metric learning is efficient and scalable (it can be extended to nonlinear metrics through kernel methods), current research focuses on the linear metric (Mahalanobis distance) learning problem. Incorporating global and local information into Mahalanobis distance learning, so as to improve classification performance and adapt to multimodal data distributions, is a valuable and challenging topic. At the same time, with the rapid development of the Internet and the information industry, massive data sets must be mined and exploited, so efficiency is also an urgent issue for metric learning. This thesis systematically studies two problems in metric learning: 1) incorporating global and local information without introducing balancing weights; 2) reducing the computational complexity. The following three stages of results were obtained.

Stage 1: Discriminating classes collapsing for globality- and locality-preserving projections. Maximally Collapsing Metric Learning (MCML) [5] is a widely used Mahalanobis metric learning algorithm that aims to collapse all points with the same label together under the learned metric matrix. To address the loss of local information in MCML, this part proposes a metric learning algorithm that unifies the ideas of maximal collapsing, locality preservation, and discriminative power, thereby effectively incorporating global and local information into the learned Mahalanobis distance without introducing balancing weights. More importantly, the proposed algorithm is a convex problem that can be solved by a first-order gradient descent method without getting trapped in local minima. To further reduce the running time, some computationally intensive steps of the algorithm are mapped onto graphics processing units (GPUs). Classification and visualization results on benchmark data sets verify the reliability and effectiveness of the proposed algorithm.

Stage 2: Dependence maximization based metric learning. Although the algorithm proposed in the first stage effectively incorporates global and local information, its objective function is complex and its derivative is expensive to compute. In the second stage we therefore propose a statistically motivated Mahalanobis learning framework called "Dependence Maximization based Metric Learning". The contributions of this part include:
· Global and local information are effectively incorporated into the Mahalanobis distance without introducing balancing weights.
· Unlike classical dependence measures such as mutual information and Pearson's chi-squared test, this part adopts criteria computed in reproducing kernel Hilbert spaces (RKHSs), so no estimation of, or assumption about, the data distribution is required.
· Within this framework, two concrete learning algorithms are derived by employing different kernel-based criteria. Both are convex optimization problems whose objective derivatives are cheap to compute, so they can be solved efficiently by a first-order gradient descent method. Classification, visualization, and retrieval experiments on benchmark data sets demonstrate the effectiveness and the different application ranges of the two algorithms.

Stage 3: Information geometry based metric learning. Although the algorithms proposed in the first two stages are convex optimization problems, both must be solved iteratively by gradient descent. Unlike most existing metric learning algorithms, Information Geometry Metric Learning (IGML) [24] finds a closed-form solution without solving a semidefinite program. In the third stage, based on information geometry theory, we propose two algorithms that respectively address the limitations of IGML. (1) The time complexity of IGML is O(d^3 + nd^2), where n is the number of training samples and d is the data dimension. Under a low-rank assumption, this part proposes a metric learning algorithm, EIGML, that reduces the complexity of IGML to O(nd), greatly improving its performance on high-dimensional data sets. (2) IGML is not applicable to singular kernel matrices and discards the local information of the data. This part proposes a metric learning algorithm, SIGML, that extends IGML to handle singular kernel matrices while incorporating both local and global information. We emphasize that both proposed algorithms find closed-form solutions and can be optimized efficiently. Experimental results verify the effectiveness of the two algorithms.

Summary: Of the three stages above, the information-geometry-based algorithm SIGML proposed last subsumes the globality-locality incorporation studied in the first two stages, and it finds a closed-form solution, avoiding the parameter and step-size tuning required by iterative solvers. Among globality-preserving methods, the proposed EIGML greatly reduces the computational complexity, making metric learning applicable to large-scale, high-dimensional data.
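For reference, the Mahalanobis distance that all three stages learn can be written in the standard form below; this is the generic parameterization implied by both abstracts, not the specific objective of any one stage. Learning the positive semidefinite matrix M, rather than a projection L directly, is what keeps the Stage 1 and Stage 2 problems convex, and factoring M = L^T L recovers the linear map used for dimensionality reduction and visualization.

\[ d_M(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^{\top} M \,(\mathbf{x}_i - \mathbf{x}_j)}, \qquad M \succeq 0, \]
\[ M = L^{\top} L \;\Longrightarrow\; d_M(\mathbf{x}_i, \mathbf{x}_j) = \lVert L\mathbf{x}_i - L\mathbf{x}_j \rVert_2 . \]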
【Abstract】 Metric learning is important and fundamental in machine learning. A distance metric provides a measurement of dissimilarity between different points and significantly influences the performance of many machine learning algorithms, such as k-nearest neighbor classification, support vector machines, radial basis function networks, and k-means clustering. Due to the efficiency and scalability of linear metric learning, most effort has been spent on learning a Mahalanobis distance from labeled training data. To improve the classification performance and adapt to multimodal data distributions, incorporating the geometric information (i.e., locality) with the label information (i.e., globality) is particularly valuable and challenging. Therefore, in this thesis our specific concerns are: 1) incorporating globality and locality in the Mahalanobis distance without optimizing balancing weight(s); 2) reducing the computational complexity. The following three stages of research results were obtained.

The First Stage: Discriminating Classes Collapsing for Globality and Locality Preserving Projections. As a widely used metric learning method, Metric Learning by Collapsing Classes (MCML) [5] aims to find a distance metric that collapses all the points in the same class while maintaining the separation between different classes. This part combines the ideas behind locality preserving, discriminating power, and MCML in a unified method. The proposed algorithm is convex and incorporates globality and locality information without balancing weight(s). To further decrease the running time, some computationally intensive steps of the proposed method are mapped to a GPU architecture. Experimental results demonstrate the effectiveness of the proposed method.

The Second Stage: Dependence Maximization based Metric Learning. The method proposed in the first stage has a complex objective function whose derivative is expensive to compute. Therefore, this part proposes a general Mahalanobis distance learning framework in a statistical setting, referred to as "Dependence Maximization based Metric Learning" (DMML). The main contributions of this part include:
· DMML effectively incorporates two sources of information (i.e., globality and locality) in the Mahalanobis distance without optimizing balancing weight(s).
· Distinguished from classical dependence measuring criteria (e.g., mutual information and Pearson's chi-squared test), DMML uses criteria computed in RKHSs to avoid estimating or assuming the data distributions. Many existing kernel-based criteria can be incorporated into DMML to tackle the dependence measurement problem.
· Under the DMML framework, two methods are proposed by employing the Hilbert-Schmidt Independence Criterion (HSIC) [8] and generalized distance covariance [28], respectively. They are formulated as convex programs and can be efficiently optimized by a first-order gradient procedure.

The Third Stage: Efficient and Scalable Information Geometry Metric Learning. Although the methods proposed in the first two stages are convex problems, they are optimized by a gradient descent method. In contrast to most existing metric learning methods, Information Geometry Metric Learning (IGML) [24] can find a closed-form solution. This part proposes two novel distance metric learning algorithms that respectively alleviate its limitations. (1) The proposed method EIGML reduces the computational complexity of IGML from O(d^3 + nd^2) to O(nd). (2) IGML is undefined (infinite) for singular kernel matrices, and the geometric information of the data is lost in IGML; the proposed method SIGML handles singular kernel matrices and preserves both locality and globality. We emphasize that these two methods find closed-form solutions, leading to efficient optimization.

Summary: The method SIGML proposed in the third stage subsumes the incorporation of globality and locality from the first two stages; it finds a closed-form solution and avoids the parameter tuning of an iterative solution. As a globality-preserving metric learning method, EIGML greatly reduces the computational complexity and can be applied to large-scale, high-dimensional data.
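As a concrete illustration of the RKHS-based dependence criteria named in the second stage, the following is a minimal sketch of the standard biased empirical HSIC estimate between a data kernel and a label kernel. It only illustrates the criterion itself; how DMML couples HSIC to the learned Mahalanobis matrix is not reproduced here, and all names in the sketch (empirical_hsic, the stand-in factor T, the toy data) are illustrative assumptions rather than code from the thesis.

import numpy as np

def empirical_hsic(K, L):
    """Biased empirical HSIC between two n x n kernel matrices.

    HSIC(K, L) = trace(K H L H) / (n - 1)^2, with H = I - (1/n) 11^T.
    Larger values indicate stronger statistical dependence between the
    two variables from which K and L were computed.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustrative use: dependence between data (linear kernel after a linear
# projection) and class labels (delta kernel). T is a stand-in for a learned
# Mahalanobis factor, i.e. M = T.T @ T.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy data: 100 samples, 5 features
y = rng.integers(0, 2, size=100)                   # toy binary labels
T = rng.normal(size=(5, 5))                        # stand-in learned factor
Z = X @ T.T                                        # projected data
K = Z @ Z.T                                        # linear kernel on projected data
L = (y[:, None] == y[None, :]).astype(float)       # label (delta) kernel
print(empirical_hsic(K, L))                        # larger => stronger dependence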
【Key words】 metric learning; Mahalanobis distance; convex optimization; closed-form solution; classification; dimensionality reduction;