Visual Pattern Recognition Based on Subspace Coordinates Graphical Representation of Multivariate Data

【Author】 Xu Yonghong

【Supervisor】 Hong Wenxue

【Author information】 Yanshan University, Measuring and Testing Technologies and Instruments, 2010, PhD

【Abstract】 Pattern recognition is one of the basic forms of intelligence on which humans and some other higher animals depend for survival. Humans usually recognize patterns very well and take this ability for granted, yet a machine given the same pattern recognition problem often runs into far greater difficulty. Despite several decades of research, how humans recognize patterns is still not well understood. Although the theory and methods of automatic pattern recognition by computer have been studied extensively and have advanced greatly, some deep-rooted open problems remain, such as the small-sample problem, the curse of dimensionality and the black-box problem. Full automation has long been a design goal of pattern recognition systems, so human participation in the recognition process is kept to a minimum. Some exploratory data analysis and visualization techniques are used when designing classifiers, but they are not truly built into the pattern recognition workflow; usually they merely visualize the raw data or the classification results. Multivariate data visualization, an important data analysis approach, has been widely applied in many fields. However, the relationships among the various multivariate visualization techniques have received little study, and the graphical representation methods still lack a unified theoretical basis. Several fundamental problems must be solved before multivariate graphical representation methods can be integrated with machine algorithms to achieve visual pattern recognition. This thesis addresses three basic questions: How can a unified descriptive model be built for several popular multivariate graphical representation methods? How can these methods be optimized to better suit pattern recognition applications? How can machine algorithms be integrated with multivariate graphical representation methods to achieve visual classification?

First, the representation principles and characteristics of several popular multivariate graphical representation methods are investigated, and on this basis a general model for the subspace coordinates graphical representation of multivariate data is presented. The model places the scatter plot, scatter plot matrix, nomogram, parallel coordinates, Andrews' plot (trigonometric polynomials) and star glyph (radar chart) in a single representation framework. This helps both in studying the differences and relationships among these methods and in developing new graphical representation methods.

Second, 2D dual coordinates are defined, their representation characteristics are studied and several related theorems are proved. On this basis a new multivariate visualization method, the multivariate parallel dual plot, is developed. It integrates multiple scatter plots with parallel coordinates in a single view, and the dual coordinates representation and the parallel coordinates representation of the same sample have a fixed geometric relationship. The view can be switched between the two forms as needed, combining the strengths of both methods while offsetting their weaknesses. The three-dimensional display of 2D dual coordinates and the 3D dual coordinates representation of multivariate data are also investigated, and representation examples are given.

Finally, the optimization of graphical features in multivariate representations is studied. Three optimizations are proposed: convex-hull-based optimization of parallel coordinates, weight optimization of the constellation graph by complex linear discriminant analysis, and a fast optimization method for Radviz. Machine learning algorithms are also combined with parallel coordinates, yielding three visual classifiers based on optimized parallel coordinates: the visual BP neural network, the parallel filter visual classifier and the Bayes visual classifier.
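The abstract does not give the equations of the general subspace coordinates model, so the following is orientation only. The sketch implements the standard textbook definitions of several representations the model unifies (scatter plot matrix cells, parallel coordinates, star glyph/radar chart, Andrews' plot and Radviz). Each is written as a map from one d-dimensional sample to 2D geometry. It also includes the classical point-line duality of parallel coordinates, a well-known geometric link between a 2D scatter plot and a pair of adjacent parallel axes. It is not the thesis's model, its 2D dual coordinate mapping, or its optimization methods. All function names are hypothetical, and Radviz inputs are assumed to be already scaled to non-negative values (typically [0, 1]).

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def scatter_pairs(x: Sequence[float]) -> Dict[Tuple[int, int], Point]:
    """Scatter plot matrix: cell (i, j) shows the sample projected onto the 2D
    subspace of features i and j. Only i < j is returned (the lower triangle
    is the mirror image)."""
    d = len(x)
    return {(i, j): (float(x[i]), float(x[j])) for i in range(d) for j in range(i + 1, d)}


def parallel_coordinates(x: Sequence[float], spacing: float = 1.0) -> List[Point]:
    """Parallel coordinates: feature i is a vertical axis at horizontal position
    i * spacing; the sample is the polyline through (i * spacing, x[i])."""
    return [(i * spacing, float(v)) for i, v in enumerate(x)]


def star_glyph(x: Sequence[float]) -> List[Point]:
    """Star glyph / radar chart: d axes radiate from the origin at angles
    2*pi*i/d; x[i] is the distance along axis i (the polygon's vertices)."""
    d = len(x)
    return [(v * math.cos(2 * math.pi * i / d), v * math.sin(2 * math.pi * i / d))
            for i, v in enumerate(x)]


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Andrews' plot: f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ...,
    evaluated at one t in [-pi, pi]; the sample is the whole curve."""
    total = x[0] / math.sqrt(2)
    for k, v in enumerate(x[1:], start=1):
        h = (k + 1) // 2  # harmonic order: 1, 1, 2, 2, 3, 3, ...
        total += v * (math.sin(h * t) if k % 2 == 1 else math.cos(h * t))
    return total


def radviz(x: Sequence[float]) -> Point:
    """Radviz: anchor S_i on the unit circle at angle 2*pi*i/d; the sample sits
    at sum(x_i * S_i) / sum(x_i). Assumes non-negative, pre-scaled values."""
    d = len(x)
    s = sum(x)
    if s == 0:
        return (0.0, 0.0)  # convention chosen here for an all-zero sample
    px = sum(v * math.cos(2 * math.pi * i / d) for i, v in enumerate(x)) / s
    py = sum(v * math.sin(2 * math.pi * i / d) for i, v in enumerate(x)) / s
    return (px, py)


def line_to_parallel_point(m: float, b: float, spacing: float = 1.0) -> Point:
    """Point-line duality of parallel coordinates: each point of the line
    x_j = m * x_i + b becomes a segment from (0, x_i) to (spacing, x_j); for
    m != 1 all such segments meet at (spacing / (1 - m), b / (1 - m))."""
    if m == 1:
        raise ValueError("m == 1: the segments are parallel (point at infinity)")
    return (spacing / (1 - m), b / (1 - m))


if __name__ == "__main__":
    sample = [0.2, 0.8, 0.5, 0.1]
    print(parallel_coordinates(sample))  # [(0.0, 0.2), (1.0, 0.8), (2.0, 0.5), (3.0, 0.1)]
    print(star_glyph(sample))
    print(andrews_curve(sample, 0.5))
    print(radviz(sample))
    print(line_to_parallel_point(-1.0, 2.0))  # (0.5, 1.0)
```

Viewed side by side, the functions share a common pattern. Each one takes low-dimensional pieces of the sample (single coordinates, coordinate pairs, or weighted sums over all coordinates) and places them in the plane according to a fixed layout of axes. That is one informal way to read the idea of putting these methods in a single framework.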

Experiments were carried out on problems in several domains, including vegetable oil classification, fault diagnosis and disease diagnosis. The results indicate that the proposed visual pattern recognition methods make patterns visible (making the invisible visible), simplify the representation of complex systems, and facilitate both the use and the generation of expert knowledge. The methods are expected to be developed and refined further and applied to complex pattern recognition problems in specific domains.

  • 【Online publication submitter】 Yanshan University
  • 【Online publication issue】 2010, No. 08