Improvement Based on FCM Clustering Algorithm

【Author】 Ning Shaofen (宁绍芬)

【Advisor】 Ji Guangrong (姬光荣)

【Author information】 Ocean University of China, Communication and Information Systems, 2007, Master's thesis

【Abstract】 Cluster analysis is an important research area in data mining and an important means of partitioning or grouping data. Clustering is widely applied and has been used effectively in commerce and market analysis as well as in biology, Web document classification, image processing, and other fields. Current clustering algorithms fall broadly into graph-based, hierarchical, density-based, grid-based, model-based, and partitioning-based methods. The fuzzy C-means (FCM) clustering algorithm is one of the most widely used algorithms in unsupervised pattern recognition. It obtains its solution by minimizing an objective function: it randomly selects C points (C being the number of clusters) as initial cluster centers and completes the clustering through an iterative process. The algorithm also has inherent shortcomings. It requires C to be known before clustering, which is difficult for inexperienced users, and the choice of initial cluster centers strongly affects the final result: with poorly chosen initial centers, the objective function may fail to reach the global optimum and become trapped in a local minimum. This thesis first introduces several commonly used clustering algorithms and illustrates them with examples. It then focuses on improving FCM clustering in several respects: choosing C; selecting the initial cluster centers; replacing cluster centers with cluster cores; modifying the distance measure; and modifying the value of the membership parameter m. Experiments on the IRIS dataset, which is commonly used in clustering, compare the improved algorithm with standard FCM and confirm its effectiveness. Finally, the application of FCM clustering to sea fog recognition is briefly discussed.
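For orientation, the baseline procedure the abstract describes (random initial centers, then iteration to reduce an objective function) can be sketched as below. This is a minimal sketch of textbook standard FCM with squared Euclidean distance, not the thesis's improved algorithm. The function name `fcm` and the parameters `max_iter`, `tol`, and `seed` are assumptions made for illustration, and `m` is taken here to be the fuzziness exponent of the standard formulation.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy c-means (illustrative; not the thesis's improved variant).

    X: (n, d) data array; c: number of clusters; m > 1: fuzziness exponent.
    Returns (centers, U) with centers of shape (c, d) and memberships U of shape (n, c).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Randomly pick c distinct data points as the initial cluster centers.
    centers = X[rng.choice(n, size=c, replace=False)].copy()
    for _ in range(max_iter):
        # Squared Euclidean distances to every center, shape (n, c).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # floor avoids division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)); rows sum to 1.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the data weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # stop once the centers have effectively stopped moving
            break
    # U was computed from the centers just before the last update.
    return centers, U
```

Calling `fcm(X, c=3)` on a numeric array returns soft memberships, and `U.argmax(axis=1)` gives hard labels. Because the start is random and the iteration only finds a local minimum, different `seed` values can give different results, which is the sensitivity to initial centers that the abstract points out.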