

t-mixture Models and Extended Locality Preserving Projections for Clustering and Dimensionality Reduction

【Author】 Chen Sibao (陈思宝)

【Supervisor】 Luo Bin (罗斌)

【Author information】 Anhui University, Computer Application Technology, 2006, Ph.D.

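【Abstract (Chinese version, translated)】 Pattern recognition is an interdisciplinary field that has developed vigorously over recent decades. It is actively studied by many researchers and has also drawn attention from governments and organizations at all levels. Defense and public-security agencies and industry in many countries and regions invest heavily in pattern recognition research. Its development will have a profound impact on scientific and technological progress, national defense, public safety, industrial manufacturing, and people's daily lives.

Building on statistical theory and graph spectral methods, this dissertation focuses on two topics in pattern recognition, clustering and dimensionality reduction: (i) for clustering, with image segmentation as the application, it studies parameter estimation methods for finite mixture models in depth; (ii) for dimensionality reduction, with image recognition as the application, it studies locality preserving projections (LPP) and their two-dimensional and linear-mixture extensions. The main contents and contributions are as follows.

We study parameter estimation for finite mixtures of multivariate t-densities and construct an SMEM algorithm for multivariate t-mixture models. The t-density has heavier tails and better noise robustness, and is the standard alternative to the Gaussian density. The EM algorithm is the usual method for estimating the parameters of mixture densities, but standard EM often converges to a local rather than the global optimum. Following the idea of splitting and merging components so that the parameters escape local optima in search of the global one, we construct the SMEM algorithm for multivariate t-mixture models and propose a split-and-merge criterion based on the sample mean and variance. Experiments show that the algorithm performs well.

Based on the local Kullback divergence, we construct a greedy EM framework for multivariate t-mixture models. Starting from a single mixture component, the algorithm repeatedly splits the worst-fitting component and refines the parameters of all components with EM. Compared with SMEM, greedy EM has several advantages: parameters are easy to initialize, it is faster with comparable performance, and the sequence of mixture models it produces facilitates model selection. Experiments confirm that greedy EM is fast and performs on par with SMEM.

We study LPP and how to solve it when the sample size is small, and propose the LPP/QR algorithm. LPP preserves the local information of the data and can discover the intrinsic structure of the data manifold. In the small-sample-size case, however, the matrices involved are singular and LPP cannot be applied directly. We propose a QR-decomposition-based locality preserving projection (LPP/QR)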
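The LPP step discussed above can be illustrated with a minimal NumPy sketch of the standard formulation: build a k-nearest-neighbour graph with heat-kernel weights, form the degree matrix D and Laplacian L = D - W, and solve the generalized eigenproblem X^T L X a = λ X^T D X a for the smallest eigenvalues. The function name `lpp` and the parameters `k` and `t` are hypothetical choices for illustration. This is plain LPP, not the dissertation's LPP/QR method. The sketch deliberately raises an error when X^T D X is singular, which is exactly the small-sample-size situation the abstract describes.

```python
import numpy as np


def lpp(X, n_components=2, k=5, t=1.0):
    """Sketch of locality preserving projections (LPP).

    X has one sample per row, shape (n_samples, n_features).
    Returns (P, eigvals): P has shape (n_features, n_components) and
    eigvals are the smallest generalized eigenvalues, in ascending order.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances (clipped at 0 for round-off).
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Symmetric k-nearest-neighbour graph (a point is not its own neighbour).
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj = adj | adj.T

    # Heat-kernel weights on graph edges, degree matrix D, Laplacian L = D - W.
    W = np.where(adj, np.exp(-dist2 / t), 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Generalized eigenproblem  X^T L X a = lambda X^T D X a.
    A = X.T @ L @ X
    B = X.T @ D @ X
    if np.linalg.matrix_rank(B) < d:
        # Small-sample-size case: B is singular and plain LPP cannot be solved.
        raise np.linalg.LinAlgError("X^T D X is singular (too few samples)")

    # Reduce to a symmetric standard problem via the Cholesky factor B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2.0)  # ascending order
    P = C_inv.T @ eigvecs[:, :n_components]
    return P, eigvals[:n_components]
```

The Cholesky reduction keeps the problem symmetric, so `eigh` returns real eigenvalues in ascending order. The columns of P are B-orthonormal (P^T X^T D X P = I). Calling `lpp` with fewer samples than features, for example a 3 × 10 matrix, triggers the singular-matrix error, which is the failure mode that a method such as LPP/QR is meant to avoid.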

【Abstract】 Pattern recognition, as an interdisciplinary subject, has developed rapidly over the past few decades. It attracts not only researchers but also the interest of governments and organizations. National defense agencies, public-safety departments, and industrial communities in many countries and regions have invested heavily in pattern recognition research. The development of pattern recognition will greatly influence the progress of science and technology, national defense, public safety, industrial manufacturing, and people's lives.

Based on statistical theory and graph spectral methods, this dissertation mainly investigates two aspects of pattern recognition, clustering and dimensionality reduction: (i) an in-depth study of parameter estimation techniques for t-mixture models in clustering, with image segmentation as the application; and (ii) research on locality preserving projections and their two-dimensional and linear-mixture extensions in dimensionality reduction, with image recognition as the application. The main contributions of this dissertation are outlined as follows.

First, we investigate parameter estimation techniques for mixtures of multivariate t-densities and construct an SMEM algorithm for them. The t-density has heavier tails and is therefore more robust to noise; mixtures of multivariate t-densities are usually adopted as a standard, robust alternative to Gaussian mixtures. The expectation-maximization (EM) algorithm is the standard algorithm for estimating the parameters of mixture models. However, EM often converges to a local optimum rather than the global one. We take the idea of allowing the parameters to jump out of local optima and search for the global one by means of splitting
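The plain EM iteration for a t-mixture, which SMEM and greedy EM build on, can be sketched in NumPy as follows. This sketch assumes the degrees of freedom `nu` are fixed, which is a simplification because updating `nu` needs an extra one-dimensional root-finding step that is omitted here. The function names, the farthest-point initialisation of the means, and the small covariance ridge are all illustrative assumptions. Neither the dissertation's split-and-merge moves nor its greedy component insertion are implemented.

```python
import math
import numpy as np


def log_t_density(X, mu, Sigma, nu):
    """Log multivariate t-density and squared Mahalanobis distances, per row of X."""
    p = X.shape[1]
    C = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(C, (X - mu).T)           # whitened residuals, shape (p, n)
    delta = np.sum(z ** 2, axis=0)               # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(C)))
    const = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det)
    return const - 0.5 * (nu + p) * np.log1p(delta / nu), delta


def em_t_mixture(X, g, nu=4.0, max_iter=200, tol=1e-6, seed=0):
    """Plain EM for a g-component multivariate t-mixture with fixed nu.

    Returns (pi, mu, Sigma, loglik); loglik is evaluated at the parameters
    entering the last iteration.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)

    # Farthest-point initialisation of the means (an assumption, not the thesis's scheme).
    centers = [X[rng.integers(n)]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(1, g):
        centers.append(X[int(np.argmax(d2))])
        d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
    mu = np.array(centers)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    Sigma = np.array([cov.copy() for _ in range(g)])
    pi = np.full(g, 1.0 / g)
    ridge = 1e-6 * np.eye(p)  # keeps the scale matrices positive definite

    prev = -np.inf
    for _ in range(max_iter):
        # E-step: memberships tau and t-weights u = (nu + p) / (nu + delta).
        log_comp = np.empty((n, g))
        delta = np.empty((n, g))
        for j in range(g):
            log_comp[:, j], delta[:, j] = log_t_density(X, mu[j], Sigma[j], nu)
        log_comp += np.log(pi)
        m = log_comp.max(axis=1, keepdims=True)
        log_mix = m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))
        tau = np.exp(log_comp - log_mix)
        u = (nu + p) / (nu + delta)
        loglik = float(log_mix.sum())

        # M-step: closed-form updates of mixing weights, means and scale matrices.
        pi = tau.mean(axis=0)
        for j in range(g):
            w = tau[:, j] * u[:, j]
            mu[j] = w @ X / w.sum()
            diff = X - mu[j]
            Sigma[j] = (w[:, None] * diff).T @ diff / tau[:, j].sum() + ridge

        if loglik - prev < tol:
            break
        prev = loglik
    return pi, mu, Sigma, loglik
```

The weights u = (nu + p)/(nu + delta) become small for points with a large Mahalanobis distance delta, and this down-weighting is what makes t-mixtures robust to outliers. Because this plain EM only climbs to a local optimum from its starting point, the result depends on initialisation. That dependence is the weakness the split-and-merge and greedy strategies are designed to address.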

  • 【Online publication submitter】 Anhui University
  • 【Online publication issue】 2006, No. 12
  • 【Classification numbers】 TP391.4; TP18
  • 【Times cited】 7
  • 【Downloads】 348

