
Research on Cluster Analysis Based on Computational Intelligence, with Applications (基于计算智能技术的聚类分析研究与应用)

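【Author】 Zhang Jianping (张建萍)

【Supervisor】 Liu Xiyu (刘希玉)

【Author information】 Shandong Normal University (山东师范大学), Information Management and E-Commerce, 2014, PhD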

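【Abstract, translated from the Chinese】 Clustering is a form of unsupervised learning: it partitions the objects of a data set into clusters (classes) so that objects in the same cluster are highly similar and objects in different clusters are dissimilar. Clustering spatial data objects can therefore be treated as an optimization problem over a clustering objective function. Following this idea, computational intelligence (CI) techniques, which are highly adaptive and robust, have been applied to cluster analysis, yielding many CI-based cluster analysis models. CI-based cluster analysis has successfully solved data clustering problems, adapts well to the characteristics of the data being processed, compensates for shortcomings of traditional clustering methods, and has achieved good results.

CI methods mainly include neural networks, fuzzy control, evolutionary computation, chaos science, immune computation, DNA computing, and swarm intelligence. In recent years, research has concentrated on three directions: neural networks, fuzzy logic, and evolutionary computation. The self-organizing map (SOM) is the most representative neural network clustering method. Evolutionary computation methods such as genetic algorithms, evolution strategies, immune programming, clonal theory, ant colony systems, particle swarm optimization, and cultural algorithms have been successfully applied to cluster analysis. Introducing the concept of fuzzy sets into traditional cluster analysis produced fuzzy clustering algorithms. Finally, depending on the strengths and weaknesses of the individual techniques, several CI methods have been combined for cluster analysis to improve clustering capability.

This dissertation applies CI techniques such as neural networks and genetic algorithms (GA) to cluster analysis, constructs cluster analysis models, studies the characteristics and shortcomings of the models' definitions and optimization methods, and improves existing solutions or proposes new ones. In addition, drawing on the application of these models to cluster analysis and on the theory and methods of discrete Morse theory, it investigates the key techniques for realizing discrete Morse theory in cluster analysis and proposes a cluster analysis model based on Morse theory to meet the requirements of specific applications. Experiments verify the effectiveness and feasibility of the models.

The main research contents are as follows:

1. A traditional SOM network used for cluster analysis requires the number of competitive-layer neurons to be specified in advance. To address this, a scheme is given for determining the network structure and the number of units dynamically during training, and a new dynamic self-organizing feature map model is proposed together with its training algorithm. The algorithm starts with a single root node and generates new nodes during training; new nodes can be created automatically at any position as needed. When training ends, the number of clusters is determined from the resulting tree structure. A spread factor controls the growth of the network, which enables clustering at different hierarchical levels. Training proceeds in two phases: once the growth phase is complete, the coarse clustering it produced is refined using the idea of fuzzy C-means clustering, which improves the accuracy of the final clustering and the convergence speed of the algorithm. The model's effectiveness and superiority are verified on UCI data sets, with a comparative analysis of clustering validity.

2. Spectral clustering and related concepts are introduced, spectral clustering algorithms are studied and analyzed, and a spectral clustering algorithm that automatically determines the number of clusters is proposed. CLARANS tends to converge to local optima and is inefficient on large data sets. Because genetic algorithms find global optima easily, GA and CLARANS are combined into a GA-based cluster analysis model, which performs clustering through a suitably chosen fitness function. Experiments demonstrate the superiority of the new algorithm.

3. The basic principles and concepts of discrete Morse theory are introduced. An algorithm that constructs a discrete Morse function to find optimal solutions is proposed, and the constructed function is proved to be an optimal discrete Morse function. An optimization model based on discrete Morse theory is also built, and experimental results demonstrate its effectiveness. This is an entirely new attempt.

4. The optimization model based on discrete Morse theory is applied to cluster analysis, yielding a density clustering algorithm based on the discrete Morse optimization model. The clustering result is then optimized using the idea of hierarchical clustering, and the number of clusters can be controlled by adjusting parameters. Experiments demonstrate the feasibility and effectiveness of the new algorithm.

The innovations of this dissertation are summarized as follows:

1. A new dynamic SOM model is proposed. It uses a new growth threshold function and a two-phase training algorithm. Experiments on UCI data sets, with comparisons against the SOM model, the FCM algorithm, and TreeGNG, verify its effectiveness and superiority.

2. A GA-based automatic spectral clustering algorithm, GA-ISC, is proposed. The improved spectral clustering algorithm ISC-CLARANS produces clustering results automatically, and GA is introduced to improve the execution efficiency of CLARANS. Experiments on synthetic and UCI data sets show that ISC-CLARANS is correct and effective, and a comparison of the clustering results of GA-ISC and ISC-CLARANS verifies the efficiency of GA-ISC.

3. An optimization model based on discrete Morse theory is proposed, realized by constructing a discrete Morse function on a simplicial complex. Experimental results demonstrate its correctness and effectiveness.

4. A new clustering model based on discrete Morse optimization is proposed. The model operates on a discrete surface, and the clustering result is optimized using the idea of hierarchical clustering. Experiments on synthetic and UCI data sets, with a comparison against the clustering results of DBSCAN, verify the efficiency and superiority of the new model.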

【Abstract】 Data clustering is used to find natural clusters in data and belongs to the unsupervised learning methods. A cluster is a group of objects that are more similar to one another than to members of other clusters. A clustering problem can be transformed into an optimization problem based on an objective function, and many algorithms based on Computational Intelligence (CI) have been proposed. Cluster analysis methods based on CI can effectively solve the data clustering problem, make up for the deficiencies of traditional clustering methods, and have achieved good results.

CI techniques include artificial neural networks, fuzzy control, evolutionary computation, immune algorithms, DNA computation, swarm intelligence, etc. Artificial neural networks, fuzzy computation, and evolutionary computation have come to the fore in the computational intelligence field in recent years. SOM is the representative neural network clustering method. Genetic algorithms, cultural algorithms, evolution strategies, immune programming, clonal theory, and particle swarm optimization have been successfully applied to cluster analysis. Fuzzy clustering, based on the concept of fuzzy sets, is an effective method for data clustering. Moreover, the CI methods mentioned above have been combined to enhance clustering performance.

In this dissertation, neural networks and GA are chosen as tools for clustering, and cluster analysis models are presented. Improvements are put forward according to the deficiencies of the optimization methods and the definition of each model. Furthermore, discrete Morse theory is combined with cluster analysis by studying the key techniques and methods, and a new clustering algorithm based on discrete Morse theory is proposed. The experimental results show the feasibility and effectiveness of the new clustering algorithm.

The main research contents of this dissertation are summarized as follows:

(1) To overcome the shortcoming that the number of output-layer neurons must be specified in advance when a classical SOM network is used for cluster analysis, a new dynamic SOM clustering model is proposed and its training algorithm is given. The model dynamically determines the network structure and the number of neurons during training. The algorithm starts with one root node, and new nodes can then be produced dynamically anywhere. The number of clusters is determined automatically once the tree structure is established. The algorithm uses a spread factor to measure and control the growth and spread of the network; the spread factor also provides a way to obtain a hierarchical clustering of a data set with the new model. The algorithm adopts two-phase training: when network growth finishes, the fuzzy C-means algorithm refines the coarse clustering result of phase 1. In this way the clusters are produced automatically, and the convergence speed of the algorithm and the accuracy of the final clustering results are greatly improved. Experimental results on UCI data sets demonstrate that the new algorithm is feasible and effective, and a comparative analysis is used to confirm the effectiveness of the new SOM clustering network and the improved algorithm.

(2) The theoretical basis of spectral clustering is introduced, including its definitions, techniques, and algorithms. A new spectral clustering algorithm that automatically determines the number of clusters in a data set is then proposed. To address the convergence of the CLARANS algorithm to local optima and its inefficiency on large data sets, a spectral clustering model based on GA is presented, because a genetic algorithm finds the global optimum easily. The experimental results show the effectiveness of the new clustering algorithm.

(3) The fundamentals and concepts of discrete Morse theory are introduced. A new algorithm that constructs a discrete Morse function for optimization is then proposed, and the constructed function is proved to be an optimal discrete Morse function. The experimental results show the effectiveness of the new model. This is a novel attempt.

(4) The optimization model based on discrete Morse theory is applied to cluster analysis, and a new clustering framework based on the discrete Morse optimization model is proposed. Hierarchical clustering is used to optimize the clustering results. The experimental results show the feasibility and effectiveness of the new model.

The innovations of this dissertation are summarized as follows:

(1) A new SOM clustering model is proposed. A new growth threshold is applied to the model, and two-phase training is adopted in the algorithm. Experimental results on UCI data sets demonstrate that the new algorithm is feasible and effective. Comparisons with the classical SOM method, the FCM algorithm, and the TreeGNG method further confirm the effectiveness of the new SOM network.

(2) A new GA-based spectral clustering algorithm (GA-ISC) is proposed. An improved spectral clustering algorithm (ISC-CLARANS) automatically determines the number of clusters in a data set. Experimental results on synthetic and UCI data sets demonstrate that the ISC-CLARANS algorithm is feasible, and a comparison with ISC-CLARANS is used to confirm the effectiveness of the GA-ISC algorithm.

(3) A discrete Morse optimization model is presented, built by constructing a discrete Morse function on a simplicial complex. The experimental results show the effectiveness of the new model.

(4) A new clustering algorithm based on the discrete Morse optimization model is proposed. Hierarchical clustering is used to optimize the clustering results. Experiments carried out on synthetic and UCI data sets show that the new model is feasible and effective, and comparisons with the DBSCAN algorithm further support its feasibility and effectiveness.
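
The refinement step mentioned in item (1), where a coarse clustering is refined with fuzzy C-means, can be illustrated with a minimal sketch of the standard fuzzy C-means updates. This is not the dissertation's implementation: the function names, the fuzzifier value m = 2.0, the Euclidean distance, the toy data, and the use of hand-picked coarse centers in place of the node weights of a trained growing SOM are all assumptions made for illustration.

```python
import math


def _memberships(points, centers, exponent):
    """Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share membership among them.
            hits = [d == 0.0 for d in dists]
            share = 1.0 / sum(hits)
            rows.append([share if h else 0.0 for h in hits])
        else:
            rows.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                         for dj in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Refine initial centers (assumed to come from a coarse first phase).

    points  : list of equal-length tuples of floats
    centers : initial cluster centers
    m       : fuzzifier (> 1); 2.0 is a common default, not a value from the thesis
    Returns (centers, memberships); memberships[i][j] is the degree to which
    points[i] belongs to cluster j.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        u = _memberships(points, centers, exponent)
        new_centers = []
        for j, old in enumerate(centers):
            # Center update: mean of the points weighted by u_ij^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no point supports this cluster; keep it
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, exponent)


# Toy usage: two well-separated groups and two rough starting centers.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
coarse_centers = [(1.0, 1.0), (4.0, 4.0)]
centers, u = fuzzy_c_means(points, coarse_centers)
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
```

Each row of `u` sums to 1, so a hard clustering is obtained by taking the cluster with the largest membership for each point, as the last line does.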

  • 【Classification number】TP18; TP311.13
  • 【Citation count】3
  • 【Download count】1805