
Research on Fuzzy Clustering Analysis in Data Mining

【Author】 Huang Jinhua (黄金花)

【Supervisors】 Ma Guangsheng (马光胜); Zhang Ronghai (张荣海)

【Author Information】 Harbin Engineering University, Software Engineering, 2008, Master's thesis

【Abstract】 Data mining differs from traditional data processing techniques in that it can analyze large volumes of information and data and extract useful knowledge to support decision-making. It is a frontier research topic in information and database technology and is widely regarded as one of the key technologies with the greatest development potential. Cluster analysis, one of the main methods of data mining, has attracted growing attention as data mining research has advanced. Cluster analysis is a method for grouping data into sensible categories, and many clustering algorithms have been proposed. This thesis studies in depth the most commonly used of them, the objective-function-based fuzzy C-means (FCM) clustering algorithm, and proposes several improvements that address its shortcomings. First, FCM requires the membership degrees of each data point to sum to 1, a probabilistic constraint that lowers clustering accuracy when it is applied to fuzzy events. Taking possibility theory as the theoretical basis, the thesis proposes an improved possibilistic fuzzy clustering algorithm based on uncertain membership. The algorithm introduces a possibilistic membership degree and an uncertainty membership degree into the objective function, so that an element of the sample is no longer restricted to belonging to a single cluster, which better matches real-world situations. Second, FCM measures dissimilarity with the Euclidean distance, which limits it to clustering data with an ellipsoidal distribution. The improved algorithm instead measures dissimilarity with the Mahalanobis distance and represents the input data as matrices rather than vectors, so that it can handle more data patterns and applies to a wider range of clustering problems. Experiments were carried out to verify the effectiveness of the proposed methods, and the results show that the expected effect was achieved.
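For orientation, the following is a minimal sketch of the standard fuzzy C-means baseline that the abstract takes as its starting point. It is not the improved algorithm proposed in the thesis. It assumes the textbook formulation with Euclidean distance and fuzzifier m = 2; the function names `memberships` and `fcm`, the parameter defaults, and the toy data are illustrative assumptions and do not come from the thesis.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(x, centers, m=2.0):
    """Standard FCM memberships of one point in every cluster (they sum to 1)."""
    dists = [euclidean(x, v) for v in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with a center: share full membership among those.
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means; returns (centers, u) with u[k][i] = membership
    of point k in cluster i. Assumed baseline, not the thesis's variant."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[k] * points[k][j] for k in range(n)) / w_sum
                 for j in range(dim)]
            )
        # Membership update from the distances to the new centers.
        new_u = [memberships(x, centers, m) for x in points]
        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data (hypothetical): two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(data, c=2)
for point, row in zip(data, u):
    print(point, [round(v, 3) for v in row])

# A far-away point equidistant from both centers is still split evenly,
# because the memberships are forced to sum to 1.
print(memberships((105, -95), [(0, 0), (10, 10)]))  # [0.5, 0.5]
```

The last line illustrates the sum-to-1 condition the abstract criticizes: a remote point that resembles neither cluster still receives a membership of 0.5 in each. Possibilistic memberships of the kind the abstract describes relax that constraint, and the Euclidean `euclidean` helper marks the place where a Mahalanobis distance would be substituted.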
