节点文献

一类模糊聚类算法研究及其应用

Research on Fuzzy Clustering Algorithm and Its Application

【作者】 曲福恒

【导师】 马驷良;

【作者基本信息】 吉林大学 , 计算数学, 2009, 博士

【摘要】 聚类分析作为一个重要的工具已经广泛应用于多个领域(?)模糊聚类算法由于具有良好的聚类性能与数据表达能力,已经成为近年来研究的热点.本文对当前主要的模糊聚类算法进行了研究,针对这些算法中存在的不完善之处提出了相应的改进算法,并对一类基于核的模糊聚类算法的收敛性给出了理论上的证明.本文所作的工作归纳起来主要有以下几点:一、深入研究了FCM类算法,提出了一种新的基于核的模糊聚类模型(IKFCM聚类模型),并得到三种不同形式的IKFCM聚类算法-IKFCM1、IKFCM2和IKDFCM算法.IKFCM1、IKFCM2算法通过核函数将数据映射到高维的特征空间,提高了算法发现非线性可分形状聚类结构的能力.IKDFCM算法利用核化距离作为聚类的相异性测度,对噪声与野值点有着更好的鲁棒性,计算的时间空间复杂度相对较低.二、证明了IKFCM算法与基于核的FCM算法(KFCM)的收敛性,这是对原有非核聚类算法收敛性定理的一种推广.收敛性定理表明特征空间内的此类模糊聚类算法的收敛性与数据核矩阵的秩之间有着密切的关系,核化距离形式算法的收敛性与核函数的凸性之间存在着密切的关系.三、提出了带有凸包约束的(核)可能性聚类模型,并引入全局优化技术对提出的模型进行求解,较好的解决了原始算法容易陷入局部极值与鞍点的问题.提出的算法对解的可行域进行限制,克服了原始算法中易产生重合聚类的不足,并且比普通的基于优化技术的(核)可能性聚类具有更高的效率.四、提出了利用迭代不动点的吸引域进行聚类的想法,并引入了一种新的聚类有效性指标,得到了一种新的均值漂移聚类算法及其快速算法,算法避免了FCM类算法中人为对初始中心作出假设的不足,并实现了对大数据集的聚类.

【Abstract】 With the development of the internet and information technology, human access more and more data every day. An important tool to deal with these data is cluster analysis. Compared with traditional methods of hard clustering, fuzzy clustering has a better ability to describe the data structure, which makes it become the mainstream of the clustering algorithms. In 1969, Ruspini first introduced the concept of fuzzy set theory into the cluster analysis fields. Since then many clustering methods based on fuzzy set theory have been proposed. The objective function based fuzzy clustering algorithms are the most popular used method in fuzzy clustering. These algorithms obtain the fuzzy partitions and clustering results of the data sets by solving a nonlinear programming problem. FCM-type algorithms (here mainly refer to the fuzzy objective function based partition clustering algorithms which are derived from FCM) are one of the most important classes of fuzzy clustering algorithms. A great deal of research has been conducted on FCM-type algorithms. In 2008, Miyamoto et al. mainly discussed the latest developments and practical applications of FCM-type algorithms in their works.This dissertation mainly focus on the studying of the FCM-type algorithms, and some algorithms are proposed to overcome the defect of the original algorithms. We prove the convergence theorem of a class of kernel based fuzzy clustering algorithms.FCM model is a probabilistic type constraint clustering model. It has been successfully applied into many areas. In the literature, there are a large number of researches have been conducted on FCM, they mainly focuse on the improvements of the model, parameter selection, study of the validity of clustering and how to obtain better solutions of the optimization problems. In these algorithms, the most two important algorithms are the possibilistic c-means clustering algorithm (PCM), which was proposed by Krishnapuram and Keller, and the possibilistic fuzzy c-means clustering algorithm (PCM), which was proposed by Pal and Bezdek.In FCM, the membership for a datum crossing all classes sums to 1, which makes its membership disable to reflect the real distance from the datum to the cluster center, and this is also the reason why FCM is sensitive to noise and outliers. To address this problem, PCM relaxes the constraint of FCM, and a penalty item is added on the objective function to avoid the meaningless optimal solutions. Although PCM solves the noise sensitivity problem of FCM, it also generates some new problems. Solving the optimization problem of PCM is equivalent to solve a number of independent sub-optimization problems, and each sub-optimization problem is dependent on only one center, which makes it tend to generate coincident clusters. From the experiments we found that, the global optimal solution is obtained only when all the clusters are coincident, that is to say we must try to obtain suboptimal solutions to get meaningful partitions. It obviously violates the target of designing the model. To solve the problems in FCM and PCM, Pal and Bezdek put forward the possibilistic fuzzy c-means clustering model. PFCM have the advantages of both FCM and PCM, in the mean while, it avoids the defects of both algorithms, i.e., PFCM does not generate coincident clusters and it also has a good noise robustness. There are c + 5 parameters which are set by the user in PFCM, however, how to set these parameters still has no theoretical basis, which makes PFCM have a strong dependence on the parameter settings. FCM, PCM and PFCM algorithms tend to find compact spherical clusters, they lacks the ability of indicating non-convex cluster structures of the data set. To address this issue, kernel technique is introduced into such algorithms, and kernel-based FCM, PCM and PFCM algorithm have been proposed. Such algorithms have the ability of finding the clusters with non-convex structures, but our experiments show that these algorithms also inherit the defect of the original algorithms, such as sensitivity to noise (in FCM), tending to generate coincident clusters (in PCM) and having more parameters (in PFCM).Considering the above problems, we put forward a kernel based improved fuzzy c-means clustering model, and three different forms algorithms (IKFCM1, IKFCM2 and IKDFCM algorithm) are obtained from this model. IKFCM1 and IKFCM2 have the ability of finding non-convex clusters, and IKDFCM has a better noise robustness and lower time complexity. The contrast experiments with the above algorithms show the clustering performance of the proposed algorithm.The above discussed algorithms still exist or partly exist the following problems: (1) The algorithm requires the assumption on the number of clusters and can not execute unsupervised clustering. (2) Solving the optimization problem of the clustering model is equivalent to solving a number of independent sub-optimization problems, and each sub-optimization problem is dependent on only one center, which makes it tend to generate coincident clusters. (3) These algorithms eventually boil down to the optimization problem of minimizing the objective function, and the traditional iterative method can not guarantee to converge to the global optimal solution, although it can be solved through the introduction of global optimization methods, it requires high time complexity to get precise global optimum. To solve the above problems, we put forward GKPCM (Generalized Kernel Possibilistic C-Means clustering) clustering model, which is a generalization of the conventional possibilistic and kernel based possibilistic clustering model. By limiting the feasible region of the solutions and using the global optimization techniques (here uses the simulated annealing as an example) to solve the optimization problem, GKPCM inherits the advantages of robustness to noise and avoids the problem of generating coincident clusters, and can well find the global optimal solution, and speed up the convergence speed by decreasing the search range of the global optimization techniques.Kernel-based fuzzy clustering is a new branch of fuzzy clustering algorithms, both of two important recent books of fuzzy clustering algorithms have sole part to introduce the kernel based fuzzy clustering algorithms. Although kernel clustering algorithm has many good properties, the convergence of such algorithms is still not studied. It is well-known that convergence is the basis for the iterative algorithms. In this dissertation, we establish the convergence of different algorithms that are derived from two kernel based fuzzy clustering model (including KFCM and IKFCM model). Take the KFCM for example, we give the following convergence theorem of KFCM2 and KDFCM: Theorem 0.1 (convergence of KFCM2) Let X contain at least n≥c≥2 points, fuzzy factor m > 1 , the rank of kernel matrix K is denoted by R(K) which satisfies R(K)≥c, let U0 be the starting point of iteration with U0∈Mfc1 , let {U(k)}k=1,2,…denote the sequence generated by the KFCM2 algorithm, then {U(k)}k=1,2,… either terminates at a point in (?) or there is a subsequence converging to a point in (?), where (?); (1)(?). (2)Theorem 0.2 (convergence of KDFCM ) Let X contain at least n > c > 2 different points , fuzzy factor m > 1 , K is the kernel function,θ∈Rs, f(z;θ) = K(z, z) -2K(θ, z) is a convex function with respect to z , let V(0) be the starting point of iteration with V(0)∈Rs , let {U(k), V(k)}k=1,2,… denote the sequence generated by the KDFCM algorithm, then {U(k), V(k)}k=1,2,… either terminates at a point in (?) or there is a subsequence converging to a point in (?), where (?)= {(U*, V*)∈Mfc1×Rsc |(?); (3)(?). (4)The other algorithms of different forms of IKFCM and KFCM model also have the similar convergence theorems. The convergence theorem of FCM and IFCM can be obtainned from the proved theorems of KFCM and IKFCM if we set the kernel function for some special forms. Hence the convergence theorem of KFCM is a generalization of the original FCM algorithm, and the convergence theorem of IKFCM is the first proof for the IFCM algorithm. The theorems show that the convergence of IKFCM and KFCM has a close relations with the ranks of the kernel matrix and the convexity of the kernel functions, which also provide us a theoretical basis of selecting reasonable kernel functions when executing such algorithms.An important issue of the objective function based fuzzy clustering algorithms is that such algorithm is sensitive to the initializations. Inspired by the convergence theorem 6.2.1, we propose a new thought of partitioning the data by the domain of attraction of the fixed point. Based on this idea, a new mean shift clustering algorithm called unsupervised multi-scale fuzzy clustering (UMF) is put forward, and a cluster validity is proposed to determine the final partition of UMF. Compared to the traditional fuzzy clustering, UMF has the following characteristics: UMF is not influenced by the initializations of the cluster centers; it can indicate the data structure from different "partition scales"; it can execute unsupervised clustering for the given data set. A fast vision of UMF is also presented in order to meet the need of dealing with the large data sets. Finally, we apply UMF to improve the EM based mixture-resolving methods and the fuzzy clustering algorithms, the experimental results show the performance of UMF.

  • 【网络出版投稿人】 吉林大学
  • 【网络出版年期】2009年 08期
  • 【分类号】TP391.41
  • 【被引频次】30
  • 【下载频次】2248
  • 攻读期成果
节点文献中: 

本文链接的文献网络图示:

本文的引文网络