Study on Adaptive Anomaly Detection Based on Data Mining (基于数据挖掘的自适应异常检测研究)

【Author】 Ren Fei (任斐)

【Advisor】 Hu Liang (胡亮)

【Author information】 Jilin University, Computer System Architecture, 2009, PhD

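【Chinese abstract】 As network applications spread, attack methods are becoming more diverse, complex, and intelligent, so static defenses such as firewalls can no longer guarantee network security on their own. Intrusion detection, as an active information security technology, can compensate for the shortcomings of traditional protection techniques and respond to the security challenges posed by growing network traffic and evolving attack patterns. Among intrusion detection technologies, systems that adapt to changing network environments have become a focus of current research. This dissertation addresses that topic through the following work:

1. It analyzes and summarizes the technical characteristics of existing intrusion detection systems, in particular the state of research on anomaly detection systems that incorporate data mining and the problems that remain.

2. It proposes AADDM, a new adaptive intrusion detection model based on data mining. AADDM has five components: training dataset generation, useful feature selection, adaptive anomaly detection, detection model upgrading, and attack feature ontology construction.

3. It proposes a training dataset generation algorithm based on top-down density clustering. First, a method for describing data similarity based on region density attributes is given, and clusters are formally defined within a multidimensional index structure according to the concept of partitioning by density differences between regions. Second, during cluster generation, a density pruning (DP) algorithm based on the branch-and-bound idea prunes regions that are not worth searching, which improves clustering efficiency. Finally, comparison experiments against the BIRCH algorithm on simulated datasets, together with a formal proof, verify the efficiency and correctness of the algorithm.

4. It proposes a feature selection algorithm that combines a genetic algorithm with SVM theory. The algorithm uses the genetic algorithm to search the sample space randomly and uses the classification accuracy of a One-class SVM to evaluate the search results, obtaining the optimal feature subset. The algorithm is tested on the KDD1999 dataset to verify its effectiveness.

5. It studies ADDBIC, an adaptive anomaly detection algorithm based on incremental density clustering. The algorithm clusters each feature column of the target dataset according to the density information of its feature values. Once a new cluster is generated, all of its feature value attributes are abstracted into a normal profile, and each profile consists of an internal summary and an external summary. The set of normal profiles over all feature columns forms the detection model of ADDBIC. When a real-time network connection record is produced, the detection module immediately compares it with the detection model. If the difference exceeds the preset "red" anomaly level, an intrusion is indicated and the algorithm immediately alerts the administrator. If the comparison result is normal, every useful data item of the record is inserted into an existing cluster through the insertion operation, and the corresponding detection profile is updated accordingly.

6. It proposes an ontology description method for intrusion detection based on incremental density clustering analysis. The method refines the clustering results to obtain detailed attribute descriptions of intrusion attacks, extracts domain concepts, describes the relationships between concepts, and describes intrusion detection domain knowledge in the ontology language OWL. The ontology instantiation of the Mitnick and Buffer Overflow attacks is briefly introduced. During intrusion detection, the ontology is used to describe all attack instance information, discover the connections between related attack instances, and describe the entire attack process.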
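The detect-then-update loop described in item 5 can be illustrated with a minimal Python sketch. This is not the ADDBIC algorithm: it replaces the density-based clusters and their internal and external summaries with a single running mean and variance per feature column, and it measures deviation as a z-score. The names `Profile`, `process`, and `RED_LEVEL`, and the threshold value, are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass

RED_LEVEL = 3.0  # hypothetical alarm threshold, in standard deviations


@dataclass
class Profile:
    """Running summary of the normal values seen for one feature column."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean (Welford)

    def insert(self, x: float) -> None:
        # Incremental update: earlier records are never revisited.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def deviation(self, x: float) -> float:
        # Distance of x from the profile, in standard deviations.
        if self.n < 2:
            return 0.0  # too little data to judge; treat as normal
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std


def process(record, profiles):
    """Return True (alarm) if any feature exceeds RED_LEVEL.

    Otherwise insert the record's values into the profiles and return False.
    """
    if any(p.deviation(x) > RED_LEVEL for x, p in zip(record, profiles)):
        return True
    for x, p in zip(record, profiles):
        p.insert(x)
    return False


profiles = [Profile(), Profile()]
for rec in [(10.0, 1.0), (11.0, 1.2), (9.0, 0.8), (10.5, 1.1)]:
    process(rec, profiles)
print(process((10.2, 1.0), profiles))   # False: normal record, absorbed
print(process((500.0, 1.0), profiles))  # True: far outside the profile
```

The point of the sketch is the control flow: an anomalous record raises an alarm and leaves the model untouched, while a normal record updates only the affected summaries instead of triggering retraining.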

【Abstract】 Data mining can extract patterns of interest from large datasets. It has therefore been applied to intrusion detection in a large number of research projects, which has greatly advanced the field. However, data mining-based intrusion detection still faces many problems: poor adaptability and an inability to detect novel attacks; high intrusion detection (ID) modeling cost and slow updating; and a lack of extensibility, that is, an ID model derived from one computer system cannot readily be adapted to another. To advance data mining and intrusion detection techniques, and aiming at the essence of these problems, this dissertation provides new methods and effective approaches for intrusion detection, in theory and in application, in the following aspects:

1. The classification of intrusion detection systems (IDS) is discussed, and the system structure of IDS and the related detection technologies are examined in detail. A survey of ID modeling technology is given, and the primary problems of ID modeling are discussed.

2. A novel ID model, AADDM, is put forward. Its design process, its model structure, and its means of collecting and preprocessing network connection records are given. AADDM filters the noise and attack data out of the source dataset and generates a clean training dataset with a top-down density-based clustering method. It builds a lightweight and efficient intrusion detection system with a GA-SVM based useful-feature selection algorithm. It uses an unsupervised self-learning mechanism, incremental density-based clustering, to partition the network behavior set into a normal behavior set and an abnormal behavior set and to generate intrusion detection profiles. Intrusion patterns are extracted automatically from real-time security event data, so the intrusion pattern database can be updated automatically according to current conditions. In addition, training datasets and background knowledge are not needed, so AADDM has the advantage of lower cost. AADDM provides a novel idea for ID research.

3. We propose a training dataset generation algorithm that uses a novel top-down clustering method based on region density in a multidimensional index. Multidimensional indexes generally have an inherent clustering property: they store similar objects in the same or adjacent data pages. By taking advantage of this property, our method finds similar objects using only region density information, without incurring the high cost of accessing the objects themselves and calculating the distances among them. First, we provide a formal definition of a cluster based on the concept of region contrast partition. Next, we propose the density_pruning_clustering (DP) algorithm. DP employs a branch-and-bound mechanism that improves efficiency by pruning unnecessary search when finding the set of dense regions. To evaluate the proposed algorithm, we conducted extensive experiments. The results show that its accuracy is similar or superior to that of BIRCH except for exactly spherical clusters, and that its efficiency is far superior to that of BIRCH because of density-based pruning. For large datasets consisting of 10 million objects, the DP algorithm reduces elapsed time by a factor of up to 96 compared with BIRCH. Even when the cost of index creation and maintenance is considered, the proposed algorithm is more efficient than BIRCH by an order of magnitude. The improvement becomes more marked as the size of the database increases, which makes the method more suitable for larger databases. The top-down clustering approach greatly improves clustering performance for large databases without sacrificing accuracy. We believe that the proposed methods will be practically usable in applications such as intrusion detection training dataset generation.

4. Feature selection is one of the main methods of data preprocessing. It can alleviate the curse of dimensionality, enhance generalization capability, and improve model interpretability. This dissertation proposes a new feature selection algorithm for building intrusion detection systems, called GA-SVM, which (1) uses a hybrid of a genetic algorithm and a heuristic search algorithm as the search strategy that specifies a feature subset for evaluation, and (2) uses one-class Support Vector Machines to evaluate the quality of the search results. We separated the KDD1999 intrusion detection dataset into several testing groups. The experimental results show that the approach both speeds up detection and improves detection quality.

5. We present an adaptive anomaly detection algorithm using density-based incremental clustering, called ADDBIC. It applies a new statistical method to summarize automatically the normality profiles of the clusters generated by the algorithm. Each normality profile corresponds to one cluster and is composed of two summaries: internal and external. The internal summary contains the properties of the cluster, while the external summary represents the statistics of the noise values around the cluster. All normality profiles are collected and used as a detection model to monitor the target system. Insertion and deletion update algorithms are explored to adjust existing clusters and normality profiles in real time. Because the method is density-based, update operations affect existing clusters only in a small neighborhood of the inserted or deleted training instances.

The contributions of ADDBIC are twofold. First, initial clusters on the training dataset are generated by density-based clustering and adjusted within a small range in real time. By comparing feature values in the training dataset, we discover that normal values always concentrate in a small numerical range while abnormal values spread around the normal values, so normal and abnormal values can be distinguished by their density relationship. When the detection model is updated by insertion or deletion operations, feature values are inserted into or deleted from existing clusters. It can be shown that these operations do not greatly change the density relationship of the normal values in existing clusters. The detection model can therefore be updated by small-range adjustments to existing clusters instead of retraining on the whole database, which greatly reduces the time cost of updating and allows updating in real time. Second, we use a statistical method to describe detection model generation and attack detection. Once a cluster is generated or modified, the normality profiles of the feature values involved in clustering are calculated and compared with the online connection records by our statistical method. Because they contain only statistical summaries of the existing clustering results, the normality profiles can be updated and compared efficiently. ADDBIC performs better on real-time anomaly detection than other existing adaptive detection algorithms such as ADWICE. The comparison experiments show that, on the given dataset, ADDBIC outperforms ADWICE in terms of both false alarm rate reduction and profile updating, which are important factors for anomaly detection systems.

6. An ontology description method for IDS based on incremental density-based clustering is provided. The method captures detailed descriptions of attack attributes, extracts the concepts and the relationships between them, and depicts the knowledge of the IDS domain using OWL. We introduce the instantiation of the Mitnick and Buffer Overflow attacks with the ontology description method. During intrusion detection, the method can describe in detail and generate the instance information, the relationships between attacks, and the whole procedure of the attacks. Given shared domain knowledge, an ontology-based intrusion detection system can reason over the instance information of attacks, and the attack descriptions give heterogeneous intrusion detection systems a common understanding of that knowledge.

In conclusion, this dissertation has academic significance and practical value, and it enriches research on intrusion detection. It also provides constructive methods and techniques for further intrusion detection research.
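The branch-and-bound pruning idea behind the DP algorithm in item 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the dissertation's algorithm: it assumes a hypothetical hierarchical index in which every node stores an object count, and it assumes leaf regions of equal volume so that a density threshold reduces to a count threshold. The names `Region`, `dense_leaves`, and `min_count` are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A node of a hypothetical hierarchical index: a region and its object count."""
    name: str
    count: int
    children: list = field(default_factory=list)


def dense_leaves(root, min_count):
    """Collect the leaf regions that hold at least min_count objects.

    Bound used for pruning: no leaf under a node can hold more objects
    than the node itself, so a node with count < min_count is skipped
    without visiting any of its descendants.
    Returns the sorted leaf names and the number of nodes examined.
    """
    found, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.count < min_count:
            continue  # prune the whole subtree
        if not node.children:
            found.append(node.name)
        else:
            stack.extend(node.children)
    return sorted(found), visited


tree = Region("root", 100, [
    Region("A", 90, [Region("A1", 85), Region("A2", 5)]),
    Region("B", 10, [Region("B1", 6), Region("B2", 4)]),
])
names, visited = dense_leaves(tree, min_count=50)
print(names, visited)  # ['A1'] 5
```

In this small tree only 5 of the 7 nodes are examined, because the sparse region B is discarded without looking at B1 or B2. The search touches only region counts and never the objects themselves, which is the property the abstract credits for the speed-up.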

  • 【Online publication contributor】 Jilin University
  • 【Online publication issue】 2009, Issue 08
  • 【Classification code】 TP393.08
  • 【Citation count】 10
  • 【Download count】 1209