Research and Improvement on K-Means Clustering Algorithm (K-均值聚类算法的研究与改进)

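【Author】 Ou Chenwei (欧陈委)

【Advisor】 Chen Xi (陈曦)

【Author information】 Changsha University of Science and Technology, Computer Application Technology, 2011, Master's thesis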

【Abstract】 With the rapid development of computer technology, people encounter data in many forms every day, such as text, images, audio, and video, and the volume of this data is enormous. How to extract the valuable information hidden in such massive data quickly and effectively has become a widely discussed and pressing problem. Data mining (DM) emerged in response, and it offers many effective methods and tools for this task. Cluster analysis is one of the most important of these methods and a core component of data mining. As research on cluster analysis has deepened in recent years, its importance has been increasingly recognized, and it has produced substantial results in both theory and practice. Cluster analysis is now widely applied in machine learning, pattern recognition, image processing, text classification, marketing, statistics, and other fields.

Depending on the data type, the clustering purpose, and the application, existing clustering algorithms can be roughly divided into partitioning, hierarchical, grid-based, density-based, and model-based algorithms. Among them, the partition-based k-means clustering algorithm is the most mature and classical. This thesis studies and analyzes the strengths and weaknesses of the k-means clustering algorithm in depth, and improves it to address the fact that its clustering results are easily affected by the initial centers. The main contributions are:

1. To address the dependence of the k-means clustering algorithm on its initial cluster centers, the thesis proposes a new method for selecting them. Experiments show that the method effectively resolves the instability of clustering results caused by initial centers that are chosen too close to one another, and that it improves the validity and stability of the clustering results.

2. To address the sensitivity of the k-means clustering algorithm to the choice of initial centers and its tendency to fall into local optima, the thesis introduces the differential evolution algorithm, which has strong global search ability, into clustering. It proposes an improved differential evolution algorithm and combines it with the k-means clustering algorithm, which handles the optimization of the k-means initial centers well. Experiments show that the method effectively improves clustering quality and convergence speed.
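The abstract does not describe the proposed selection method, so the following is only a minimal sketch of the general idea behind the first contribution: start k-means from centers that are far apart, so that it does not start from centers that sit too close together. The farthest-point (maximin) heuristic used here is a well-known stand-in and is not claimed to be the thesis's method. The function names `farthest_point_init` and `kmeans`, the toy data, and all parameter values are assumptions made for illustration.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def farthest_point_init(points, k, seed=0):
    """Pick k initial centers that are far apart (maximin heuristic).

    Illustrative stand-in only; not the selection method from the thesis.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Take the point whose distance to its nearest chosen center is largest.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the coordinate-wise mean of its cluster.
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    labels = assign(points, centers)
    # Sum of squared errors: the objective that k-means locally minimizes.
    sse = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


# Toy data (an assumption): three well-separated 2-D blobs of 20 points each.
data_rng = random.Random(1)
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (10, 0), (5, 9)] for _ in range(20)]

centers, labels, sse = kmeans(points, farthest_point_init(points, 3))
print("cluster sizes:", [labels.count(j) for j in range(3)])
print("SSE:", round(sse, 2))
```

Passing a different starting list to `kmeans`, for example three points drawn from the same blob, is a simple way to observe how much the final SSE can depend on initialization. A hybrid in the spirit of the second contribution would instead search over candidate center sets with a global optimizer and use this SSE as the fitness value.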

  • 【Classification No.】TP311.13
  • 【Times cited】30
  • 【Downloads】1425