基于矩阵分解的图像表示理论及其应用研究

Research on Matrix Decomposition Based Image Representation Theory and Its Application

【Author】 肖延辉 (Xiao Yanhui)

【Supervisor】 赵耀 (Zhao Yao)

【Author Information】 Beijing Jiaotong University, Signal and Information Processing, 2014, PhD

【Abstract (translated from the Chinese)】 With the widespread adoption of image-acquisition devices, especially smartphones, and the rapid growth of networked multimedia services such as microblogs, WeChat, cloud computing, and social networking sites, images are not only growing explosively in number but also playing an enormous role in the dissemination and acquisition of information. Although this vast sea of images is intuitive, vivid, and information-rich, it creates considerable difficulties for storage, transmission, and further processing. How to design efficient image representations based on the intrinsic structure of images and the characteristics of human vision is therefore one of the core research problems in computer vision and machine learning. An effective image representation not only helps reveal the latent structure of the data, but also reduces the cost of storage and transmission. Non-negativity, invariance, sparsity, nonlinearity, and discriminability are the central issues of image representation theory. Starting from the perspective of matrix factorization, this dissertation focuses on two important directions in image representation, non-negative matrix factorization and sparse decomposition, and makes the following contributions:

1. To address the problem that the non-negativity constraint alone is insufficient to yield invariant image representations, this dissertation proposes a topographic non-negative matrix factorization method, which learns feature invariance by imposing a topographic constraint on the encoding factor during factorization. The topographic constraint is a two-layer bottom-up network with a squaring nonlinearity in the first layer and a square-root nonlinearity in the second; by pooling together structure-correlated features belonging to the same hidden topic, it groups related features in a structure-correlated low-dimensional subspace, yielding an invariant image representation. The convergence of the proposed alternating iterative update rules is proved theoretically. Experimental results show that, compared with conventional non-negative matrix factorization, the method achieves higher clustering performance by learning invariant representations.

2. To exploit a small amount of labeled data when learning the latent structure of unlabeled data, this dissertation proposes a semi-supervised non-negative matrix factorization method. By introducing a class-driven constraint as a penalty term, it makes further use of label information during factorization. The constraint assigns a specific class label to each learned basis vector, so that the basis vectors of each class represent images of that class well but images of other classes poorly. To better measure the reconstruction error, both the Euclidean distance and the KL divergence are used as error measures, and the computational complexity of the algorithms under the two measures is analyzed theoretically. Clustering results on real datasets such as AT&T ORL, Yale, and Caltech101 show that, because class-driven non-negative matrix factorization incorporates label information effectively, the resulting low-dimensional representations are more discriminative.

3. To address the problem that unsupervised reconstruction independent component analysis cannot exploit the label information of training data, this dissertation proposes a supervised reconstruction independent component analysis method. By introducing a discriminative constraint, it simultaneously maximizes the same-class energy of the sparse representation and minimizes the other-class energy, so that the sparse representation is partitioned by class and each image can be sparsely reconstructed only by basis vectors of its own class. In essence, this optimization is equivalent to maximizing the between-class scatter and minimizing the within-class scatter of the sparse representation. It is proved theoretically that the discriminative constraint is a convex function, so after it is incorporated into the independent component analysis framework, the optimization problem remains an unconstrained convex problem with a global optimum.

4. Since linear reconstruction independent component analysis cannot effectively represent the nonlinearly separable data structures that are common in the original data space, this dissertation further proposes a supervised kernel reconstruction independent component analysis method. Using a kernel function, it learns nonlinear sparse representations of images in a high-dimensional feature space, where structures that are nonlinearly separable in the original space become linearly separable. Experimental results show that, compared with conventional reconstruction independent component analysis, the proposed method further improves performance on real classification tasks by learning supervised, nonlinear sparse representations.
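The non-negative matrix factorization that the first two contributions build on is the classical Lee–Seung multiplicative-update algorithm under the Euclidean loss. The sketch below is a minimal illustrative baseline in numpy, not the author's implementation; the function name `nmf`, the random initialization, and the iteration count are assumptions.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-10, seed=0):
    """Baseline NMF via Lee-Seung multiplicative updates (Euclidean loss).

    Factors a non-negative matrix V (m x n) into non-negative W (m x r)
    and H (r x n) so that V is approximated by W @ H. The topographic and
    class-driven variants in the dissertation add extra penalty terms to
    this objective; this sketch shows only the unconstrained baseline.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because both updates only multiply by non-negative ratios, no projection step is needed to keep the factors non-negative, which is one reason this scheme is a convenient starting point for adding constraints.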

【Abstract】 Data representation is a fundamental problem in image processing and pattern recognition tasks. A good representation can typically reveal the latent structure of data, and further facilitate these tasks in terms of learnability and computational complexity. However, in many real applications, the input data matrix is generally of very high dimension, which brings the curse of dimensionality for further data processing. To solve this problem, matrix factorization approaches are used to explore two or more lower-dimensional matrices whose product provides a good approximation to the original data matrix. In addition, sparsity is an attribute characterizing a mass of natural and man-made signals, and has played a vital role in the success of many sparse-decomposition-based image representation techniques such as sparse coding, dictionary learning, sparse auto-encoders, and independent component analysis. Decomposing data into a sparse and discriminative linear combination of features is an important and well-studied problem. The major contributions of this dissertation are:

1. We propose a topographic non-negative matrix factorization algorithm, called TNMF. Specifically, TNMF incorporates a topographic constraint to promote the sparseness of the encoding factor. Meanwhile, this constraint forces features to be organized in a topographic map by pooling together structure-correlated features belonging to the same hidden topic, which is beneficial for learning complex invariances (such as scale and rotational invariance). Experiments carried out on three standard datasets validate the effectiveness of our method in comparison to state-of-the-art approaches.

2. We propose a semi-supervised class-driven non-negative matrix factorization method that associates a class label with each basis vector by introducing an inhomogeneous representation cost constraint. This constraint forces the learned basis vectors to represent their own classes well but other classes poorly. Therefore, data samples in the same class have similar representations, and the discriminability of the new representations is boosted. Experiments carried out on several standard databases validate the effectiveness of our method in comparison to state-of-the-art approaches.

3. To exploit class information, we extend the unsupervised reconstruction independent component analysis method (RICA) to a supervised one, namely d-RICA, by introducing a class-driven discrimination constraint. This constraint minimizes the inhomogeneous representation energy and maximizes the homogeneous representation energy simultaneously, which makes a data sample uniquely represented by the over-complete basis vectors from its own class. In essence, this is equivalent to maximizing the between-class scatter and minimizing the within-class scatter in an implicit way.

4. Since nonlinearly separable data structures generally exist in the original data space, a linear discriminant may not be expressive enough for most real-world data. To enhance the expressiveness of the discriminant, the kernel trick can be used to map a nonlinearly separable data structure into a linearly separable one in a high-dimensional feature space. We therefore develop the kernel extensions of RICA and d-RICA, i.e., kRICA and d-kRICA, to represent data in the feature space. Experimental results validate the effectiveness of kRICA and d-kRICA for classification tasks.
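The RICA model that contributions 3 and 4 extend combines a smooth L1 sparsity penalty on the codes with an unconstrained reconstruction term (Le et al.'s reconstruction ICA). The numpy sketch below evaluates that unsupervised cost only, without the dissertation's class-driven or kernel terms; the name `rica_objective` and the shapes are illustrative assumptions, not the author's code.

```python
import numpy as np

def rica_objective(W, X, lam=0.1, eps=1e-8):
    """Unsupervised reconstruction-ICA cost (the model d-RICA extends).

    W : (k, d) filter/basis matrix; k > d gives an over-complete basis.
    X : (d, n) data matrix with samples as columns.
    Cost = lam * smooth-L1 sparsity of the codes Z = W @ X
         + average reconstruction error || W.T @ Z - X ||_F^2.
    Replacing the orthonormality constraint of plain ICA with this
    reconstruction penalty keeps the problem unconstrained, which is
    what lets a convex discriminative term be added on top in d-RICA.
    """
    n = X.shape[1]
    Z = W @ X                                   # sparse codes
    sparsity = np.sum(np.sqrt(Z ** 2 + eps)) / n  # smooth L1 penalty
    recon = np.sum((W.T @ Z - X) ** 2) / n        # reconstruction error
    return lam * sparsity + recon
```

Since the cost is a smooth unconstrained function of W, it can be minimized with any off-the-shelf gradient-based optimizer (e.g. L-BFGS), which is the practical appeal of the reconstruction formulation.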
