
入侵检测中基于密度的数据流聚类算法研究

Research on Density-Based Clustering Algorithm of Data Streams in Intrusion Detection

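【Author】 Wang Yantao

【Advisor】 Zhang Fengbin

【Author Information】 Harbin University of Science and Technology, Computer Software and Theory, 2011, Master's thesis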

【Abstract】 With the spread of computers and the rapid development of network technology, networks bring benefits to people but are also subject to many forms of attack. Intrusion detection, as a proactive security technology, effectively blocks a wide range of these attacks. Data stream mining is receiving growing attention, and building models on data streams and mining them in real time is important for intrusion detection. Data stream clustering is a key direction in data stream mining, and an intrusion detection model built on a data stream clustering algorithm can update the intrusion detection rule base in real time, so applying data stream clustering to intrusion detection is of considerable practical significance. Existing data stream clustering algorithms, however, have many shortcomings. Taking the D-Stream algorithm as its research background, this thesis analyzes the algorithm's shortcomings and improves it to better meet the needs of intrusion detection, with the goal of raising the detection rate and lowering the false alarm rate of intrusion detection systems.

First, the thesis analyzes the current state and open problems of intrusion detection systems, the relevant data stream mining techniques, the characteristics of data stream clustering algorithms, and the requirements that intrusion detection places on such algorithms. This provides the theoretical basis for the later chapters.

Second, building on a study of D-Stream, the thesis presents a density-based data stream clustering algorithm, M-Stream. Drawing on the characteristics of cosine similarity and Minkowski distance, and introducing the concepts of frequency and summary information, it proposes a similarity measure for mixed-attribute data. To address time and space complexity, the algorithm uses a tree and a hash table to store nodes and pointers. To address parameter setting, it proposes a density threshold function that lets data stream clustering run within a fixed memory budget. For offline clustering, it clusters by extending the concept of neighboring cells and uses an in-memory sampling method to discover evolving clusters.

Finally, based on the characteristics of data streams, the thesis designs an intrusion detection model suited to data stream clustering, which updates the rule base in real time through background learning. Experiments on the KDD CUP 1999 dataset show that the algorithm outperforms earlier algorithms and achieves the expected results.
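
For orientation, the density-grid idea that D-Stream is built on can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's M-Stream algorithm: the class name `DensityGrid`, the uniform `cell_width`, the `decay` value, and the fixed `threshold` are all hypothetical choices made for the example. It shows only how per-cell densities can be kept in a hash table and decayed lazily, so that each arriving record costs a single lookup and update.

```python
import math


class DensityGrid:
    """Illustrative D-Stream-style grid: decayed cell densities in a hash table."""

    def __init__(self, cell_width, decay=0.998):
        self.cell_width = cell_width  # assumed: same cell width in every dimension
        self.decay = decay            # decay factor lambda, with 0 < lambda < 1
        self.cells = {}               # cell key -> (density, time of last update)

    def cell_key(self, point):
        # Map a numeric record to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def density(self, key, now):
        # Decay lazily: only the stored value and the elapsed time are needed.
        if key not in self.cells:
            return 0.0
        value, last_update = self.cells[key]
        return value * self.decay ** (now - last_update)

    def insert(self, point, now):
        # New density = decayed old density + 1 for the arriving record.
        key = self.cell_key(point)
        self.cells[key] = (self.density(key, now) + 1.0, now)
        return key

    def dense_cells(self, now, threshold):
        # Cells whose decayed density currently reaches the threshold.
        return {k for k in self.cells if self.density(k, now) >= threshold}
```

A fixed threshold is used here only to keep the sketch short; the abstract describes a density threshold function designed to keep clustering within a fixed memory budget, whose form is not given in this record. Categorical attributes, which the thesis's mixed-attribute similarity measure addresses, are likewise outside this numeric-only sketch.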
