
基于特征点选择的聚类算法研究与应用

Research and Application of Clustering Algorithm Based on Feature Point Selection

【Author】 Zhu Guohong (朱国红)

【Advisor】 Shi Bing (石冰)

【Author information】 Shandong University, Computer Software and Theory, 2010, Master's thesis

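【Abstract (translated from the Chinese)】 With the explosive growth of global information, data mining has become a research focus of computer science and technology in the new century. Cluster analysis is one of the principal functions of data mining: clustering groups data objects into classes, or clusters, such that objects within the same cluster are highly similar while objects in different clusters differ substantially. The main problem cluster analysis addresses is how to obtain a set of clusters satisfying this requirement without prior knowledge. Many clustering algorithms for data mining have been proposed so far, but they suit only particular applications and users, their theory and methods remain to be refined, and some have serious deficiencies. The K-means clustering algorithm has very important application value in data mining. As application areas expand and new requirements arise, however, its inherent limitations have become increasingly prominent. In practice, the number of clusters is usually assumed on the basis of the user's visual judgment and convenience, yet users often cannot determine it accurately. Once set, the number of clusters cannot be changed during clustering, so the final number of clusters equals the initial number. The choice of initial cluster centers also affects the result, so users generally do not obtain an accurate clustering. These two major shortcomings seriously limit the range of application of K-means.

While analyzing the ideas and methods of current clustering algorithms, this thesis addresses these defects of K-means by proposing CFPS (Clustering algorithm based on Feature Point Selection). CFPS is likewise a partitioning clustering algorithm. It introduces a fitness function into the clustering process and uses the distances between objects and the value of the fitness function to form clusters and adjust the number of clusters k. CFPS does not need initial cluster centers to be selected: at the start, each object forms its own cluster, so the clustering result is stable and the algorithm does not fall into a locally optimal clustering. Experimental results show that, compared with other clustering algorithms in data mining, CFPS improves clustering accuracy and efficiency. Users can therefore apply CFPS conveniently, without configuring complex parameters, and obtain results that are better than or equal to those of other algorithms.

Applying cluster analysis and related techniques to intrusion detection is currently a hot topic in intrusion detection research. This thesis attempts to apply CFPS to an intrusion detection system, using the KDD CUP 1999 data set as experimental data, and runs simulation experiments on K-means and CFPS. The algorithm analysis and experimental results indicate that CFPS has good detection performance, achieving a higher detection rate and a lower false alarm rate, and that it overcomes the traditional K-means algorithm's need for a manually chosen k and its sensitivity to the choice of initial cluster centers.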

【Abstract】 With the explosive increase in global information, data mining has become a focus of computer science and technology research in the new century. Cluster analysis is one of the most important functions in data mining. Clustering is the process of grouping a set of objects into classes or clusters, such that similar objects fall in the same cluster while dissimilar objects fall in different clusters. Clustering is carried out without prior knowledge, so the main task is to obtain a clustering that meets this requirement under that premise. To date, many clustering algorithms have been presented, but they suit only particular problems and users. Furthermore, they are imperfect both theoretically and methodologically, and some have severe faults. The K-means algorithm has extremely important application value in data mining, but as applications develop and new demands arise, its limitations become increasingly prominent. In applications, the number of clusters is usually based on the user's assumption, but users often cannot set the number of clusters exactly. Once the number of clusters has been established, it cannot be changed during the clustering process, so the final number of clusters is the initial number. Selecting different initial cluster centers also affects the result of the algorithm, so the user generally does not get an accurate clustering. These two important shortcomings seriously limit the application scope of K-means. This dissertation systematically studies and analyzes the techniques and methods of cluster analysis and, considering the faults of the K-means clustering algorithm, puts forward an improved Clustering algorithm based on Feature Point Selection (CFPS).
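The two K-means limitations named above (a fixed number of clusters and sensitivity to the initial centers) can be reproduced on a toy example. The following is a minimal sketch of standard Lloyd-style K-means in plain Python, not code from the thesis; the function names `assign` and `kmeans` and the four sample points are assumptions chosen for illustration.

```python
import math

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, initial_centers, max_iter=100):
    """Standard K-means; k is fixed by len(initial_centers) and never changes."""
    centers = [tuple(float(v) for v in c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    # Sum of squared distances from each point to its cluster center.
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse

# Hypothetical data: corners of a 4-by-1 rectangle, two natural groups.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers taken from opposite sides: left/right split.
labels_a, centers_a, sse_a = kmeans(points, [(0, 0), (4, 0)])
# Initial centers taken from the same side: stuck in a top/bottom split.
labels_b, centers_b, sse_b = kmeans(points, [(0, 0), (0, 1)])

print(labels_a, f"{sse_a:.1f}")  # [0, 0, 1, 1] 1.0
print(labels_b, f"{sse_b:.1f}")  # [0, 1, 0, 1] 16.0
```

Both runs return exactly two clusters because k is fixed by the input, and the second run converges to a clustering with a much larger squared error purely because of where the initial centers were placed.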
The CFPS algorithm also belongs to the partitioning category of clustering algorithms. CFPS uses a fitness function during clustering: it forms clusters, and adjusts the number of clusters k, according to the distances between objects and the value of the fitness function. The algorithm does not need initial cluster centers to be selected. At the beginning, each object forms its own cluster, so the clustering result is stable and CFPS does not fall into a locally optimal clustering result. Experimental results show that, compared with other clustering algorithms in data mining, CFPS improves clustering accuracy and efficiency. Users can therefore easily use the algorithm proposed in this dissertation without configuring complex parameters, and can get results that are better than or the same as those of other clustering algorithms. Applying cluster analysis and related technologies to intrusion detection is currently a hot topic in intrusion detection research. This dissertation attempts to use the CFPS clustering algorithm in intrusion detection systems, with the KDD CUP 1999 data set as the experimental data, and tests both the K-means algorithm and the CFPS algorithm. Algorithm analysis and experimental results show that CFPS has better detection performance, with a higher detection rate and a lower false alarm rate, and that the method overcomes the traditional K-means algorithm's need for a manually determined k value and its dependence on the choice of initial cluster centers.
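The abstract reports detection rate and false alarm rate without giving formulas. The sketch below assumes the definitions commonly used in intrusion detection evaluation: detection rate is the fraction of attack records flagged as attacks, and false alarm rate is the fraction of normal records flagged as attacks. The function name `detection_metrics`, the label strings, and the sample data are hypothetical and are not taken from the thesis or from KDD CUP 1999.

```python
def detection_metrics(true_labels, predicted_labels, attack="attack"):
    """Return (detection_rate, false_alarm_rate) under the assumed definitions."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == attack and pred == attack:
            tp += 1  # attack correctly flagged
        elif truth == attack:
            fn += 1  # attack missed
        elif pred == attack:
            fp += 1  # normal record wrongly flagged
        else:
            tn += 1  # normal record correctly passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate

# Made-up example: 4 attack records and 6 normal records.
truth = ["attack"] * 4 + ["normal"] * 6
pred = ["attack", "attack", "attack", "normal",   # one attack missed
        "attack", "normal", "normal", "normal", "normal", "normal"]  # one false alarm

dr, far = detection_metrics(truth, pred)
print(f"detection rate = {dr:.2f}, false alarm rate = {far:.3f}")
```

When a clustering algorithm is used as the detector, each cluster first has to be mapped to "attack" or "normal" before these rates can be computed; how the thesis performs that mapping is not stated in the abstract.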

  • 【Online publication contributor】 Shandong University
  • 【Online publication year and issue】 2010, Issue 09