

Research on Algorithm and Application of Dynamic Grid-based Clustering over Data Stream

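【Author】 He Yong (何勇)

【Advisors】 Liu Qingbao (刘青宝); Lu Changhui (陆昌辉)

【Author information】 National University of Defense Technology, Project Management, 2008, Master's thesis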

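【Abstract (translated from Chinese)】 Recently, with the rapid development of computer communication and sensor network technologies, people have encountered large volumes of data that cannot be stored in a database. Such data are high-speed, continuous, dynamic, changing, and unbounded; they can be accessed only sequentially, once or a limited number of times, and they can be stored only dynamically and in summary form. The data stream model was proposed to process such data effectively, and it has attracted great attention. As the extension of traditional clustering methods to the data stream setting, data stream clustering has become one of the key directions of data stream research. By studying and analyzing traditional clustering methods and existing data stream clustering methods, this thesis finds that density-based methods can discover clusters of arbitrary shape, but their time complexity is relatively high and they are not suited to discovering clusters with differing distributions. Grid-based methods, by contrast, run faster but sacrifice clustering quality. Therefore, building on the study and improvement of these two classes of algorithms and taking the characteristics of data streams into account, this thesis proposes a dynamic grid-based data stream clustering algorithm. Its main improvements and innovations concern the grid density calculation, the adaptive setting of parameters, and the clustering order, and the correctness and effectiveness of the algorithm are verified on experimental data and real data. Applying data stream clustering analysis to network anomaly detection is an interesting attempt. While networks bring great convenience, they also bring great inconvenience: a constant succession of intrusion techniques and methods often causes devastating damage to networks, and traditional anomaly intrusion detection techniques can no longer cope, in scalability and adaptability, with increasingly complex attacks, so knowledge from many other fields has been introduced. Recently, clustering analysis has received considerable attention because it can discover unknown patterns and automatically update the anomaly intrusion detection rule base in real time. This thesis builds an anomaly detection system based on clustering analysis on top of the Snort system, and experiments show that the system is effective for network anomaly detection.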

【Abstract】 Recently, with the rapid development of communication technology and sensor networks, people have been faced with large volumes of data that exceed the storage capacity of a database. These data are high-speed, continuous, dynamic, variable, and infinite. They can be visited only once or a limited number of times, and their storage is dynamic and synoptic. To deal with such data, the data stream model was proposed, and it has attracted wide attention from researchers. As the extension of traditional clustering, data stream clustering has become one of the most important directions of data stream research. In this paper, firstly, the traditional clustering methods are analyzed. The analysis finds that density-based methods can discover clusters of arbitrary shape, but their running time is high and they cannot find clusters in data sets with different distributions, while grid-based methods run fast but yield lower-quality results. Secondly, on the basis of studying and improving the existing traditional clustering methods, we present a new dynamic grid-based clustering algorithm, whose main contributions are improvements to the density calculation method, the automatic setting of parameters, and the clustering order. Lastly, experimental results show the correctness and effectiveness of the algorithm. Detecting network anomalies with data stream clustering is an interesting attempt. The network brings us not only a great deal of convenience but also inconvenience: networks are often severely damaged by various intrusion techniques and measures. Traditional anomaly intrusion detection techniques cannot cope with increasingly complex attacks in terms of scalability and adaptability, so techniques from other fields are needed. Recently, clustering analysis has received more attention because it can discover unknown patterns and update the rules of anomaly intrusion detection in real time. This paper establishes an anomaly intrusion detection system that is based on clustering analysis and built on the Snort system. Finally, experimental results show that this system is effective.
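
To make the general idea of grid-based density clustering over a stream concrete, the following is a minimal Python sketch. It is a generic illustration and not the algorithm proposed in the thesis: the class name `GridStreamClusterer`, the fixed `cell_size`, the exponential `decay` factor, and the fixed `dense_threshold` are all assumptions introduced here (the thesis instead describes its own density calculation and adaptive parameter setting, which the abstract does not specify). Each arriving point is mapped to a grid cell, cell densities fade over time, and adjacent dense cells are merged into clusters.

```python
import math
from collections import deque


class GridStreamClusterer:
    """Generic grid-density sketch for streaming points (hypothetical, illustrative only)."""

    def __init__(self, cell_size=1.0, decay=0.99, dense_threshold=2.0):
        self.cell_size = cell_size              # assumed fixed cell width
        self.decay = decay                      # assumed per-arrival fading factor
        self.dense_threshold = dense_threshold  # assumed fixed density cutoff
        self.density = {}                       # cell -> (density, tick of last update)
        self.tick = 0                           # number of points seen so far

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _current(self, cell):
        # Density of a cell, faded to the current tick.
        value, last = self.density[cell]
        return value * self.decay ** (self.tick - last)

    def insert(self, point):
        # One pass per point: only the cell summary is kept, not the point itself.
        self.tick += 1
        cell = self._cell(point)
        old = self._current(cell) if cell in self.density else 0.0
        self.density[cell] = (old + 1.0, self.tick)

    def clusters(self):
        # Merge dense cells that share a face, using breadth-first search.
        dense = {c for c in self.density if self._current(c) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            queue, group = deque([start]), set()
            while queue:
                cell = queue.popleft()
                group.add(cell)
                for i in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:i] + (cell[i] + step,) + cell[i + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(group)
        return result
```

In this sketch, calling `insert` for each arriving point and `clusters()` on demand returns a list of sets of cell coordinates. Sparse cells are simply left out, which is one simple way such a scheme could flag candidate outliers in an anomaly detection setting.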


