节点文献

非负块配准模型与优化算法研究

Non-negative Patch Alignment Framework and Its Optimization

【作者】 管乃洋

【导师】 骆志刚; 陶大程;

【作者基本信息】 国防科学技术大学 , 计算机科学与技术, 2011, 博士

【摘要】 非负矩阵分解是二十世纪九十年代末期新兴的数据降维方法,经过十多年的发展,已广泛应用于模式识别、数据挖掘和信息检索等领域。非负矩阵分解不同于传统的SVD分解、QR分解、Cholesky分解、LU分解和特征值分解,它将非负数据矩阵近似成两个非负子矩阵的乘积。由于非负矩阵分解把数据表示成特征的纯加性叠加,这种数据表示方法天然地隐含稀疏特性,与人脑对信号的响应机制相一致,能够有效地抑制信号噪音。同时,由于特征中不含任何负元素,符合真实世界的物理假设。因此,非负矩阵分解已经广泛应用于文本聚类、邮件监控、盲源信号分离、音频信号分析、人脸识别、图像标注、图像分割、光谱图像分析、基因微阵列数据分析等领域。近年来,非负矩阵分解引起越来越多的关注,美国贝尔实验室、田纳西大学、威克森林大学、佐治亚理工学院、芬兰的赫尔辛基大学、日本的RIKEN脑科学研究所、台湾清华大学、浙江大学等研究机构都开展了非负矩阵分解的研究工作。它们的研究成果极大地推动了非负矩阵分解技术的发展,把非负矩阵分解的应用拓展到互联网、信息安全、遥感图像和数学模型求解等领域。因此,开展非负矩阵分解的研究具有重要的现实意义。本文从框架模型入手,在数学上建立非负块配准模型,从统一的角度分析非负数据降维算法,并用以开发新的非负矩阵分解模型。根据非负块配准模型的分析结果,本文提出非负判别局部块配准模型,克服了传统非负矩阵分解模型的缺点,提高了非负矩阵分解的分类效果。为了克服传统非负矩阵分解优化算法收敛速度慢的缺点,本文利用牛顿法快速搜索最优步长,提出非负块配准快速梯度下降算法。为了克服传统非负最小二乘问题优化算法的缺点,本文利用最优梯度法在无需线搜索的情况下以二阶收敛速度求解非负最小二乘问题。在此基础上提出非负矩阵分解的高效求解算法,并开发非负块配准最优梯度法。为了克服传统优化算法应用于流数据处理时计算开销过大的缺点,本文提出非负矩阵分解在线优化算法,利用鲁棒随机近似算法更新基矩阵,提高在线优化算法的鲁棒性。本文的主要创新点包括:1.非负块配准模型提出基于块配准的非负矩阵分解框架模型——非负块配准模型。自非负矩阵分解提出以后,提出多种改进模型。利用局部视觉特征近似正交的特点,提出局部非负矩阵分解模型。利用流形学习技术,提出图罚分非负矩阵分解模型保持数据几何结构。利用Fisher判别技术,提出判别非负矩阵分解模型引入标签信息。然而,这些非负矩阵分解模型是由研究人员根据各自的需求和经验而设计的,内在差异巨大,难以理解其共同特性,实际应用中给工程人员选择模型带来困难。本文基于块配准框架,建立非负块配准模型,从统一的角度理解已有非负矩阵分解模型,揭示其内在差异和共同特性,指导工程人员选择模型,帮助研究人员开发新的非负矩阵分解模型。利用拉格朗日乘子法,提出乘法法则算法优化非负块配准模型,并用辅助函数技术证明算法的收敛性。该算法可用于求解非负块配准模型的大多数派生模型,包括标准的非负矩阵分解模型。2.非负判别局部块配准模型根据非负块配准模型的分析结果,提出非负判别局部块配准模型。从非负块配准模型的角度看,局部非负矩阵分解为样本和基向量分别建立由自身组成的样本块,在局部优化过程保持数据的能量;图罚分非负矩阵分解为样本建立由自身和有限最近邻组成的样本块,在局部优化过程保持数据几何结构,但是忽略样本判别信息;判别非负矩阵分解为样本和各类中心点建立样本块且样本块分别由所有同类样本和中心点组成,在局部优化过程保持数据判别信息,但是由全部样本组成的样本块要求数据服从高斯分布。非负判别局部块配准模型克服了已有非负矩阵分解模型的缺点,为每个样本建立两类样本块:类内块由同类样本中有限最近邻组成,局部优化过程保持数据局部几何结构,放宽数据高斯分布假设;类间块由不同类样本中有限最近邻组成,局部优化过程最大化类间边界,从而保持数据判别信息。因此,非负判别局部块配准模型的分类效果较好、鲁棒性较强。本文利用全局配准技术把两种局部优化过程映射到全局坐标系进行并把二者结合,套用非负块配准的乘法法则算法优化所提非负判别局部块配准模型。3.非负块配准快速梯度下降算法从梯度下降的角度改进非负块配准乘法法则算法,利用牛顿法实现快速线搜索,提出非负块配准快速梯度下降算法。快速线搜索沿着调整负梯度方向搜索最优步长,在不超出第一象限边界的情况下更新矩阵因子,大大提高乘法法则的收敛速度。利用凸函数的Jesen不等式,证明快速梯度法的收敛性。为了克服矩阵因子整体的最优步长可能为1的缺点,即蜕化成乘法法则,本文为矩阵因子的每列(或行)设置步长,用多变量牛顿法搜索步长向量,提出多步长快速线搜索方法。为了降低多变量牛顿法计算复杂度高,本文改进步长设置策略,提出平衡多步长快速线搜索。利用多步长和平衡多步长快速线搜索,本文提出多步长和平衡多步长快速梯度下降算法,并利用凸函数的Jesen不等式证明它们的收敛性。4.非负块配准最优梯度法通过分析非负块配准优化子问题的性质,利用最优梯度法交替更新矩阵因子,提出非负块配准最优梯度法。非负矩阵分解优化算法是目前的热点问题,继乘法法则之后出现了非负最小二乘法、投影梯度法、伪牛顿法和Active Set方法等一系列方法。然而,乘法法则收敛速度慢且存在零元素问题;非负最小二乘法无法从理论上保证收敛性;投影梯度法的线搜索过程计算开销过高;伪牛顿法在求解过程中计算Hessian矩阵的逆,计算开销大且存在数值不稳定问题;Active Set在矩阵不满秩时会出现数值问题,且难以用于优化非负块配准模型。因此,非负矩阵分解的高效优化算法仍然是个开放性问题。本文将非负矩阵分解优化问题看成两个子问题,数学上证明了两个子问题都是凸问题且其梯度是Lipschitz连续的,从而利用最优梯度法以O(1/k2)的收敛速度求解每个子问题,从而提出非负矩阵分解高效优化算法,克服了传统非负矩阵分解优化算法的缺点。通过分析非负块配准问题的子问题的性质,本文提出非负块配准最优梯度法。5.非负矩阵分解在线优化算法提出非负矩阵分解在线优化算法,利用鲁棒随机近似算法以在线的方式更新基矩阵。非负矩阵分解优化算法的空间复杂度与样本维数和样本规模成正比,由于计算机存储器容量的限制,难以满足流数据处理的需求。此外,新样本到达时,传统非负矩阵分解算法需要重新启动以更新分解结果,带来不断增加的巨大时间开销。因此,研究人员提出在线非负矩阵分解算法,利用新到达的样本更新分解结果,克服传统非负矩阵分解算法在时间复杂度和空间复杂度两方面的缺点。然而,已有的在线非负矩阵分解算法的收敛速度受噪音、矩阵不满秩等因素影响,存在数值不稳定问题。本文提出在新样本到达时利用鲁棒随机近似算法以O(1/√k)的收敛速度更新基矩阵,提高在线非负矩阵分解的鲁棒性。利用准鞅理论,本文证明了所提算法的收敛性。为了克服空间复杂度过高的缺点,本文提出用缓冲池技术存储有限量的历史样本,用新样本替换缓冲池中的旧样本以保证基矩阵引入最新的样本统计信息。

【Abstract】 Non-negative matrix factorization (NMF) is a novel dimension reduction methodwhich has been widely used in pattern recognition, data mining and information retrievalsince the end of the twentieth century. Distinguished from other matrix decompositionmethods such as singular value decomposition, QR decomposition, Cholesky decompo-sition, LU decomposition, and Eigen-value decomposition, NMF approximates a non-negative matrix by the product of two low-rank non-negative matrices. Since NMF rep-resents sample by an additive combination of features, it yields natural sparse and parts-based representation. Such data representation is consistent with the physiological andpsychological evidence in human brain, and thus NMF usually suppresses the noises. Inaddition, NMF based data representation is consistent with the physical hypothesis in thereal-world because it allow neither negative entries in the features. Therefore, NMF iswidely used in text mining, email surveillance, blind source separation, video analysis,face recognition, image annotation, image segmentation, spectral analysis, gene micro-array analysis etc. Recently, NMF has received more and more attentions from severalinstitutes such as the Bell Laboratory, University of Tennessee, Wake Forest University,Georgia Institute of Technology, University of Helsinki, RIKEN Brain Science Institute,National Tsing Hua University, and Zhejiang University. Their significant works extendthe applications of NMF to many fields such as Internet, Information Security, RemoteSensing and Mathematical Modeling. Thus studies on NMF are practically important.The thesis first builds up a non-negative patch alignment framework (NPAF) whichunderstands existing non-negative dimension reduction algorithms in a unified viewpointand helps developing new NMF extensions. Then we proposed non-negative discrimi-native locality alignment (NDLA) by using NPAF. NDLA overcomes several drawback-s of NMF and improves its classification performance. Thirdly, to overcome the slowconvergence deficiency of traditional NMF optimization algorithms, we proposed a fastgradient descent (FGD) method which searches the optimal stepsize along the rescalednegative gradient direction by using Newton method. Fourthly, to overcome the defi-ciencies of traditional optimization algorithms for non-negative least squares (NLS), weapplied the optimal gradient method (OGM) to optimizing NLS which converges at therate of O(1/k2) without line search. Furthermore, we proposed an efficient NMF solver as well as an efficient NPAF solver based on OGM. Finally, we proposed an online NM-F optimization algorithm based on robust stochastic approximation (RSA) to reduce thehigh computational overhead of traditional NMF optimization algorithms when runningon streaming data. Benefit from the RSA algorithm, the proposed online NMF algorithmperforms robustly in practice. There are following main contributions:1. Non-negative patch alignment framework Since NMF was proposed, there weremany works on NMF models. In particular, Li et al. proposed local NMF (L-NMF) which yields parts-based features in most case based on the fact that localvisual features are approximately orthogonal. To incorporate the geometry struc-ture embedded in the data, Cai et al. proposed graph regularized NMF (GNMF).To incorporate the label information, Zaferiou et al. proposed discriminative NMF(DNMF) by using Fisher’s discriminant criteria. However, all these methods aredesigned by researcher’s intuition and experience, and thus it is difficult to under-stand their intrinsic difference. The thesis proposed a non-negative patch alignmentframework which interprets NMF and its extensions from a unified viewpoint. N-PAF not only helps engineers to select suitable models in practice, but also helpsresearchers to develop new NMF models under the NPAF. By using the Lagrangemultipliermethod,wedevelopedamultiplicativeupdaterule(MUR)foroptimizingNPAF. We proved the convergence of MUR by using auxiliary function technique.Note that MUR could optimize most NPAF based models including the traditionalNMF model.2. Non-negative discriminative locality alignment From the viewpoint of NPAF, L-NMF builds local patch for each sample and base vector by themselves and retainsas much energy as possible in their part optimizations. GNMF builds local patchfor each sample by itself and few limited nearest neighbors and preserves the localgeometric structure in the part optimization, but it completely ignores the label in-formation. Although DNMF incorporates the discriminative information, the builtpatches are global and thus DNMF performs not well especially when data does notobey the Gaussian distribution. We proposed non-negative discriminative localityalignment to overcome the aforementioned drawbacks. NDLA builds two types ofpatchesforeachsample,i.e.,inner-classpatchandbetween-classpatch,whereintheinner-class patch composes of itself and its limited nearest neighbors form the same class as the given sample and the between-class patch composes of itself and itslimited nearest neighbors from the different classes against the given sample. Thepartoptimizationontheinner-classpatchminimumsthedistancesbetweensamplesin identical class to preserve the local geometric structure while the part optimiza-tion on the between-class patch maximums the margins between different classes toincorporate the discriminative information. We alignment both part optimizationsby using the alignment strategy and arrived at the objective of NDLA by combiningthem together. Note that NDLA can be optimized by using the proposed MUR forNPAF.3. Fast gradient descent for NPAF To overcome the slow convergence deficiency ofMUR, we revisited the proposed MUR from the viewpoint of gradient descent andproposed a fast gradient descent method for NPAF. FGD searches the optimal stepsize along the rescaled negative gradient by using the Newton method and updatesthe matrix factor without exceeding the positive orthant. As FGD uses the optimalstep size, it greatly accelerates the convergence of MUR. By using the Jesen’s in-equality, we proved the convergence of FGD. However, the optimal stepsize mayreduce to1in some cases which makes FGD degenerate to MUR. To overcomessuch deficiency, we set one stepsize for each column (or row) of matrix factor andproposed a multiple stepsize FGD (MFGD). In MFGD, the optimal step size vectoris obtained by using the multivariate Newton method. To reduce the computationalcomplexity of MFGD, we advanced the step size setting strategy and proposed bal-anced multiple stepsize FGD (BFGD). Note that the convergences of both MFGDand BFGD can be proved by using the Jesen’s inequality.4. OptimalgradientmethodforNPAFByprovingtheconvexityofeachsub-problemof NPAF, we applied the optimal gradient method to alternatively optimizing thematrix factors. The NMF optimization algorithms have received much attention.There exist many methods such as multiplicative update rule, projected gradien-t, quasi-Newton, and active set method. However, they suffer from one or morefollowing drawbacks:(1) slow convergence;(2) high computational complexity;and (3) numerical instability. Therefore, how to efficiently optimize NMF is stil-l an open problem. The thesis proves that each sub-problem of NMF is convexand the gradient is Lipschitz continuous, and thus the optimal gradient method can be applied to optimize each matrix factor at the optimal convergence rate O(1/k2)without line search. Similarly, by proving the convexities of both sub-problems ofNPAF, we proposed an efficient NPAF optimization algorithm based on OGM.5. OnlineRSAalgorithmforNMFSincethespacecomplexityofthetraditionalNM-F optimization algorithms is proportional to the sample dimensionality and samplenumbers. It prohibits NMF on the streaming dataset. In addition, the traditional N-MFalgorithmsshouldberestartedonthearrivalofnewsampleorchunkofsampleswhich brings in increasing huge time overheads. Therefore, someone proposed on-line NMF algorithms which update both matrix factors on the arrival of new sampleor chunk of samples. However, the existing online NMF algorithms suffer from nu-merical instability problem caused by noise of data and rank-deficiencies of matrixfactors. Thethesisproposedanewonlineoptimizationalgorithmwhichupdatesthebasis matrix by using robust stochastic approximation on the arrival of new sam-ple or chunk of samples. The RSA algorithm converges at the rate of O(1/√k) inmost cases and greatly enhances the robustness of online NMF. By using the theoryof quasi-martingale, we proved the convergence of the proposed online optimiza-tion algorithms. To overcome the high space complexity problem, we introduced abuffer to store limited historical samples and advanced the RSA based algorithm.It replaces the oldest sample to incorporate enough statistical information into thebasis matrix on the arrival of new sample.

  • 【分类号】TP301.6;O151.21
  • 【被引频次】1
  • 【下载频次】80
  • 攻读期成果
节点文献中: 

本文链接的文献网络图示:

本文的引文网络