

An Improved SOM Algorithm and Its Application in Clustering

【Author】 Zheng Siping (郑思平)

【Advisor】 Xiao Renyue (肖人岳)

【Author information】 South China University of Technology, Computational Mathematics, 2010, Master's thesis

【Abstract】 Clustering is an unsupervised learning process whose main purpose is to partition unlabeled data into groups that share common characteristics. The main families of clustering methods are partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. Among the model-based methods, the SOM (Self-Organizing Maps) neural network is a typical unsupervised clustering algorithm. It is the self-organizing feature neural network model proposed in 1981 by the Finnish scholar Kohonen. Because of properties such as topology preservation, preservation of the probability distribution, unsupervised learning, and visualization, it is widely used in many areas of information processing, including speech recognition, image compression, robot control, optimization problems, control theory, financial analysis, experimental physics, chemistry, and medicine. This thesis studies self-organizing neural network algorithms. Current dynamic SOMs still have two shortcomings: the generation of new neurons is constrained by the network structure, and a threshold must be specified in advance before neurons can be generated. To address these shortcomings, an improved GSOM algorithm is proposed and applied to automatically cluster and classify data on various cell indicators measured by a researcher; a comparison with the traditional SOM algorithm is used to verify the efficiency of the algorithm. The improved algorithm has the following main innovations: 1) the number of neurons does not need to be set before the experiment, so learning is fully self-organizing and unsupervised and clustering is automatic; 2) growth is based on the idea of analysis of variance, so a suitable growth-control factor SF does not have to be chosen by experience or computed separately; 3) a pruning process specifically excludes noisy and anomalous data; 4) the network has a circular structure, so there is never a newly generated neuron that cannot be placed, and the network can grow freely. The algorithm is implemented in MATLAB, which is used to carry out the clustering and the comparison.

【Key words】 GSOM; analysis of variance; circular neighbourhood; clustering
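
The circular neighbourhood named in the key words can be illustrated with a minimal sketch of a conventional, fixed-size SOM whose units sit on a ring. This is not the improved GSOM of the thesis: it has no growth step, no analysis-of-variance criterion, and no pruning, and the thesis itself was implemented in MATLAB rather than Python. All function names and parameters (`ring_distance`, `train_ring_som`, `n_units`, `lr0`, `sigma0`, and so on) are hypothetical, chosen only for illustration.

```python
import math
import random


def ring_distance(i, j, n):
    """Shortest number of steps between units i and j on a ring of n units."""
    d = abs(i - j) % n
    return min(d, n - d)


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)),
    )


def train_ring_som(data, n_units=6, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a fixed-size SOM on a ring (assumed schedule: linear decay)."""
    rng = random.Random(seed)
    # Initialise each unit at a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighbourhood width
        order = list(data)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for k in range(n_units):
                # Gaussian neighbourhood measured along the ring, so the
                # first and last units are neighbours of each other.
                d = ring_distance(k, bmu, n_units)
                h = math.exp(-(d * d) / (2.0 * sigma * sigma))
                weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
    return weights


def assign_clusters(weights, data):
    """Label each sample with the index of its best-matching unit."""
    return [best_matching_unit(weights, x) for x in data]
```

Because the ring has no boundary, every unit has a neighbour on both sides, which is the property the abstract relies on when it says that a newly generated neuron can always be placed. A growing variant would insert a unit between two neighbours on the ring instead of keeping `n_units` fixed.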