
Research on Methods of Complex Simulation Data Dimension Reduction and Visualization Clustering

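【Author】 Li Huijun (李惠君)

【Supervisors】 Wang Zicai (王子才); Li Zhiquan (李志全)

【Author information】 Yanshan University (燕山大学), Control Science and Engineering, 2013, PhD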

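【Abstract (Chinese original, translated)】 As science and technology advance, simulation systems grow ever more complex. Simulation data have accordingly become high-dimensional, rapidly growing in volume, and subject to uncertainties such as randomness and human factors, and classical statistical theory exposes a series of problems when applied to such data. With progress in computer hardware and the rise of data mining theory, the analysis of complex simulation data with data mining techniques has gradually attracted researchers' attention. Building on visual data mining, this thesis studies visual clustering and related problems for large-scale, high-dimensional simulation data with complex interrelationships, which has theoretical and engineering significance.

Feature selection by expert estimation before complex simulation data are visualized may ignore both the differences among individual experts and the characteristics of the data themselves. To address this, a subjective-objective estimation method based on a fuzzy comprehensive evaluation model is proposed. First, an expert fuzzy judgment matrix is constructed and expert weights are set according to each expert's influence in the field, giving a subjective fuzzy comprehensive evaluation. Then the information entropy of each attribute is computed from the data, giving an objective evaluation. Finally, the subjective and objective evaluations are combined in different proportions to determine the importance of each attribute.

For dimension reduction before visualization, the commonly used manifold learning methods are analyzed; the local tangent space alignment algorithm (LTSA) and kernel principal component analysis (KPCA) are proved to be essentially consistent; and a kernel-based improvement of LTSA for reducing incremental simulation data is proposed. Experiments show that the improved algorithm achieves the same reduction quality as LTSA with higher running efficiency.

Because dimension reduction requires the target dimension in advance, an improved maximum likelihood method is used to estimate the intrinsic dimension. The shortcomings of the maximum likelihood method are analyzed first. Geodesic distance then replaces Euclidean distance to solve the problem of selecting wrong neighbors, and a density correction replaces the plain average of the local intrinsic dimension estimates so that the result is less affected by outlying values.

Two visual clustering methods are proposed for complex simulation data. In the method based on an improved radar chart, the traditional radar chart is first modified to highlight data features: attribute weights determine the polar angles and attribute values determine the polar radii. Because k-means picks its initial centers randomly and may therefore miss the optimal solution, an algorithm for optimizing the initial centers is proposed. Because the number of clusters must be given in advance, which is hard to do in practice, an improved method using iteration with expert-supervised intervention is proposed. In the method based on the self-organizing map (SOM), the neuron mapping onto the traditional rectangular or hexagonal grid is replaced by a radar chart mapping, which solves the problem that the traditional SOM map cannot reflect the differences between data points. Adding a lateral contraction force and reconstructing the weight vectors speeds up the convergence of the mapped points. An adaptive learning rate is also proposed whose correction term is a function that decreases monotonically with the distance from the winning neuron to its neighborhood neurons, improving the stability and convergence time of the algorithm. Experiments show that the algorithm is more efficient and more robust.

The thesis enriches the methods available for high-dimensional data reduction and visual data mining, and provides new technical support for the analysis of complex simulation data.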
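The abstract does not give the formulas behind the subjective-objective weighting, so the following is only a minimal sketch of the general idea, assuming the standard entropy weight method for the objective part and a simple linear blend for the combination. The names `entropy_weights`, `combine_weights` and the mixing parameter `alpha` are hypothetical and are not taken from the thesis; the subjective weights are assumed to come from a separate fuzzy evaluation step that is not shown.

```python
import numpy as np

def entropy_weights(data):
    """Objective attribute weights via the standard entropy weight method.

    Assumes `data` is (n_samples, n_attributes), non-negative, with n_samples > 1
    and a positive sum in every column.
    """
    x = np.asarray(data, dtype=float)
    n = x.shape[0]
    p = x / x.sum(axis=0)                       # column-wise proportions
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # treat 0*log(0) as 0
    entropy = -plogp.sum(axis=0) / np.log(n)    # normalized to [0, 1]
    divergence = np.clip(1.0 - entropy, 0.0, None)
    if divergence.sum() == 0:                   # every attribute is uninformative
        return np.full(x.shape[1], 1.0 / x.shape[1])
    return divergence / divergence.sum()

def combine_weights(subjective, objective, alpha=0.5):
    """Blend subjective and objective weights; alpha is a hypothetical mixing ratio."""
    s = np.asarray(subjective, dtype=float)
    o = np.asarray(objective, dtype=float)
    mixed = alpha * s / s.sum() + (1.0 - alpha) * o / o.sum()
    return mixed / mixed.sum()

# Usage: the first attribute is constant, so it carries no objective information.
samples = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]])
objective = entropy_weights(samples)
final = combine_weights([0.5, 0.5], objective, alpha=0.5)
```

An attribute whose values barely vary has entropy close to 1 and receives an objective weight close to 0, which is the property that lets the data counterbalance expert opinion.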

【Abstract】 Simulation systems are becoming more and more complicated with the development of science and technology. Simulation data are likewise high-volume and high-dimensional, with many uncertain characteristics such as randomness and artificiality. Analyzing such data with classical statistical theory reveals a series of problems. With the development of computer hardware technology and data mining, complex simulation data analysis based on data mining technology has gradually come to the attention of researchers. Based on visual data mining technology, this thesis studies visual clustering and related issues for large-scale, high-dimensional simulation data with complex relationships. The research has theoretical and engineering value.

Feature selection is often required before complex simulation data can be visualized. Because the traditional expert estimation method may ignore personal differences among experts and the characteristics of the data, a subjective and objective estimation method based on a fuzzy comprehensive evaluation model is proposed. First, the expert fuzzy evaluation matrix is constructed and the expert weights are determined according to each expert's influence in the industry; this gives the subjective fuzzy comprehensive evaluation. Then the information entropy of each attribute is calculated from the characteristics of the data. Finally, the subjective evaluation and the objective information entropy are integrated in different proportions to determine the degree of importance of the attributes.

Dimension reduction is one of the important issues in visual data mining of complex simulation data. First, the main manifold learning dimension reduction methods are analyzed in detail. Then the local tangent space alignment algorithm (LTSA) and kernel principal component analysis (KPCA) are derived and proved mathematically to be essentially consistent. Finally, an improved kernel-based LTSA algorithm for incremental simulation data is proposed. The experiments confirm that the improved LTSA algorithm achieves the same dimension reduction effect as the LTSA algorithm with higher efficiency.

Dimension reduction of complex simulation data requires the intrinsic dimension to be given in advance. To address this problem, an improved maximum likelihood estimator of the intrinsic dimension is suggested. The shortcomings of the maximum likelihood method are analyzed. Geodesic distance is used instead of Euclidean distance to correct nearest neighbor selection errors. To keep singular values from influencing the estimate too much, the average of the local intrinsic dimension estimates is replaced by a density-corrected value.

Two novel methods are proposed for the visual clustering of complex simulation data. In the visual clustering method based on the improved radar chart, the traditional radar chart is improved to highlight the characteristics of the data: attribute weights determine the polar angles and attribute values determine the polar radii. Because the k-means algorithm selects its initial centers randomly and may fail to reach the optimal solution, a method for optimizing the initial centers is given. Because the algorithm needs the number of clusters in advance, which is very difficult in practice, an improved method using iteration and expert supervision is put forward. In the visual clustering method based on the self-organizing map (SOM), the neuron mapping on the traditional rectangular or hexagonal grid is changed into a radar chart mapping, which solves the problem that the traditional SOM cannot reflect the real disparities between data points. The algorithm accelerates convergence by increasing the lateral contraction force and reconstructing the weight vector. A function that decreases monotonically with the distance between the winning neuron and its neighborhood neurons is introduced to adaptively correct the learning speed, which increases stability and accelerates convergence. The experiments show that the algorithm has higher efficiency and robustness.

The thesis enriches the methods for dimension reduction of high-dimensional data and for visual data mining, and provides technical support for complex simulation data analysis methods.
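The intrinsic dimension step in the abstract starts from a maximum likelihood estimator and then modifies it. As a point of reference only, here is a minimal sketch of the unmodified baseline, assuming it is the widely used Levina-Bickel form with Euclidean neighbors and plain averaging of the local estimates. It does not implement the thesis's geodesic distance or density correction, and the function name and the default `k` are hypothetical.

```python
import numpy as np

def mle_intrinsic_dimension(points, k=10):
    """Baseline maximum likelihood intrinsic dimension estimate.

    Uses Euclidean k-nearest neighbors and averages the local estimates.
    Assumes no duplicate points and more than k samples.
    """
    x = np.asarray(points, dtype=float)
    # Full pairwise distance matrix; acceptable for a few thousand points.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 after sorting is the point itself, so skip it: T_1 .. T_k.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # Local estimate: inverse of the mean of log(T_k / T_j), j = 1 .. k-1.
    log_ratio = np.log(knn[:, -1][:, None] / knn[:, :-1])
    local = (k - 1) / log_ratio.sum(axis=1)
    return float(local.mean())

# Usage: a flat 2-D sheet embedded in 5-D space.
rng = np.random.default_rng(0)
sheet = np.zeros((500, 5))
sheet[:, :2] = rng.random((500, 2))
estimate = mle_intrinsic_dimension(sheet, k=10)
```

On a curved manifold, two points that are close in Euclidean distance can be far apart along the surface, which is the wrong-neighbor problem the abstract says geodesic distance is meant to fix. The plain mean in the last line is the step the abstract replaces with a density correction.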

  • 【Online publication contributor】 Yanshan University
  • 【Online publication issue】 2014, Issue 08