Research on Machine Learning Methods and Their Applications in Intrusion Detection

【Author】 Li Zhanchun

【Advisor】 Li Zhitang

【Author information】 Huazhong University of Science and Technology, Computer System Architecture, 2007, PhD

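【Abstract (translated from the Chinese)】 Intrusion detection is an information security technology for detecting intrusions in computer networks, and it is a cornerstone of proactive network security defense. Given the increasingly frequent distributed, multi-target, multi-stage combined network attacks, and the unknown security problems that the next-generation Internet may bring, there are growing calls to improve the detection efficiency and intelligence of intrusion detection systems. Machine learning is a family of methods for classification and prediction. It has recently been applied to intrusion detection to varying degrees, but these methods have not satisfactorily solved problems such as highly correlated samples, many duplicated training samples, long training times, and the difficulty of labeling intrusion samples.

To handle duplicate or similar samples in intrusion detection feature data, and the possible correlation among feature parameters, this thesis proposes PCA-IC, a feature data compression algorithm that integrates principal component analysis (PCA) with an immune clustering algorithm. PCA-IC compresses the data without losing the feature knowledge implicit in it, which reduces the number of samples used for machine learning. It first applies PCA to remove the correlation among feature parameters, then applies immune clustering to remove similar samples. In simulation experiments on the KDDCUP99 intrusion detection data set, the sample compression rate reached 89%.

Misuse intrusion detection builds intrusion models from known weaknesses of network systems and application software, then matches observed user behavior and resource usage against those models; it is a multi-class pattern classification problem. Ordinary multi-class support vector machines must evaluate all of the two-class classifiers, which leads to many duplicated training samples, low speed, and poor real-time performance. To address this, the thesis proposes BTPM-SVM, a fast misuse detection classification algorithm built on a binary-tree-structured support vector machine with intrusion priorities. BTPM-SVM introduces the notion of priority and arranges multiple SVMs by priority into an asymmetric hierarchical binary tree. The number of training samples for the SVM at each level falls rapidly as the level increases, which greatly reduces duplicated training samples and speeds up training. In simulation experiments on the KDDCUP99 misuse intrusion detection data set, the sample recognition rate was 96%, and 57% of the computation time was saved for the same amount of data.

Anomaly intrusion detection distinguishes normal from abnormal system behavior based on observations such as network traffic features and host audit records. Because the training samples in anomaly detection form an unlabeled, imbalanced data set, the task is treated as an outlier detection problem, and a hypersphere One-class SVM anomaly detection algorithm suited to outlier detection is proposed. In simulation experiments on the "MIT lpr" system call data set provided by the University of New Mexico, 1000 of the 1001 anomalous samples were correctly identified.

User anomaly detection monitors the behavior of legitimate users of a system, to prevent them from performing unauthorized operations and to prevent other users from using their accounts for illegal or malicious operations. A two-dimensional modeling method based on co-occurrence matrices is used to model user behavior. Because the sample dimension is very large, PCA is used to reduce it, and the processed samples are then recognized with a multi-class SVM. In performance tests on the SEA data set, the sample recognition rate was 80.4%.

To support a smooth transition from IPv4 to IPv6 networks and to ensure the safe and orderly operation of the next-generation Internet, a machine-learning-based intrusion detection prototype system, MLIDS, is designed and implemented on the basis of the above algorithms. In simulation tests in IPv4 and IPv6 environments, MLIDS achieved detection rates of 97% and 98% respectively, a relatively high detection accuracy that demonstrates the effectiveness and practicality of the proposed BTPM-SVM and hypersphere One-class SVM algorithms.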
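The co-occurrence matrix modeling of user behavior mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction, so everything here is an assumption for illustration: the function name `cooccurrence_matrix`, the `window` parameter, the sample command session, and the choice to count ordered pairs in which one command follows another within a fixed window.

```python
def cooccurrence_matrix(commands, window=2):
    """Count ordered pairs (a, b) where b follows a within `window` positions.

    Hypothetical construction for illustration; the thesis may define
    co-occurrence differently.
    """
    vocab = sorted(set(commands))
    index = {cmd: i for i, cmd in enumerate(vocab)}
    size = len(vocab)
    matrix = [[0] * size for _ in range(size)]
    for i, a in enumerate(commands):
        # Look at the next `window` commands after position i.
        for b in commands[i + 1 : i + 1 + window]:
            matrix[index[a]][index[b]] += 1
    return vocab, matrix


def flatten(matrix):
    """Turn the 2-D matrix into one feature vector (row-major order)."""
    return [value for row in matrix for value in row]


# Made-up command session, not taken from the SEA data set.
session = ["ls", "cd", "ls", "vi", "ls"]
vocab, matrix = cooccurrence_matrix(session, window=2)
features = flatten(matrix)

print(vocab)   # ['cd', 'ls', 'vi']
print(matrix)  # [[0, 1, 1], [1, 2, 1], [0, 1, 0]]
```

The flattened vector grows with the square of the vocabulary size, which is consistent with the abstract's remark that the samples are high-dimensional and are reduced with principal component analysis before classification.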

【Abstract】 Intrusion detection, a cornerstone of proactive network information security, is an information security technology used to detect intrusions into a computer network. In view of the unknown security issues that the next-generation Internet may encounter, and the increasingly frequent distributed, multi-target, multi-stage network attacks seen today, intrusion detection systems need to improve their detection efficiency and intelligence. Machine learning methods, which are used for classification and prediction, have come into use in the field of intrusion detection. Nevertheless, several problems have not been satisfactorily resolved, including strong correlation among sample data, a large number of duplicated training samples, long training times, and the difficulty of labeling intrusion samples.

The PCA-IC algorithm, a feature data compression algorithm that integrates Principal Component Analysis with an Immune Clustering algorithm, is designed to address the duplicated or similar samples in intrusion detection feature data and the possible correlation among feature parameters. The algorithm compresses data without losing the feature knowledge implicit in it, so as to reduce the number of samples for machine learning. Principal component analysis is applied first to remove the correlation among parameters, followed by the immune clustering algorithm to eliminate similar samples. In simulation experiments on the KDDCUP99 intrusion detection data set, the sample compression rate reached 89%.

Misuse detection models the weaknesses of known network systems and application software, then pattern-matches observed user behavior and resource usage against those models; it is a multi-class pattern classification problem. General multi-class support vector machines have to evaluate all of the two-class classifiers, so they process many duplicated training samples, run slowly, and offer poor real-time performance. To address this, the thesis presents the Binary Tree with Priority for Multi-class Support Vector Machine (BTPM-SVM) algorithm. BTPM-SVM introduces the concept of priority, according to which multiple support vector machines are arranged into an asymmetric hierarchical binary tree. The number of SVM training samples decreases rapidly as the level increases, which greatly reduces the number of duplicated samples and speeds up training. In simulation experiments on the KDDCUP99 misuse detection data set, the sample detection rate reached 96%, and 57% of the computation time was saved for the same amount of data.

Anomaly detection distinguishes between normal and abnormal behavior of a system according to network traffic characteristics and host audit data. Because the training samples in anomaly detection form an unlabeled and imbalanced data set, attack detection is treated as outlier detection, and a hypersphere One-class SVM is used to solve it. In simulation experiments on the "MIT lpr" system call data set provided by the University of New Mexico, 1000 of the 1001 abnormal samples were correctly identified.

Masquerader detection monitors the behavior of legitimate users in the system, to prevent them from performing unauthorized operations and to prevent other users from fraudulently using their accounts for illegal or malicious acts. In this thesis, a two-dimensional modeling method based on co-occurrence matrices is employed to model user behavior. Because the samples are high-dimensional, principal component analysis is applied to reduce their dimensionality, and a multi-class support vector machine is then used to classify the processed samples. In performance tests on the SEA data set, the sample identification rate reached 80.4%.

To achieve a smooth transition from IPv4 networks to IPv6 networks and to ensure the safe and orderly operation of the next-generation Internet, this thesis, based on the above algorithms, designs and implements MLIDS, an intrusion detection prototype system based on machine learning technology. In simulation tests in IPv4 and IPv6 environments, the MLIDS prototype achieved detection rates of 97% and 98% respectively. This relatively high detection accuracy demonstrates the effectiveness and practicality of the BTPM-SVM and hypersphere One-class SVM algorithms proposed in this thesis.
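The asymmetric priority tree behind BTPM-SVM can be sketched roughly as a cascade in which each level separates the highest-priority remaining class from the rest, so later levels train on fewer samples. This is only a structural illustration under stated assumptions: a nearest-centroid rule stands in for the SVM at each node, and the class names, priority order, sample points, and the helpers `train_cascade` and `classify` are hypothetical rather than taken from the thesis.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


class BinaryNode:
    """Stand-in binary classifier (NOT an SVM): target class vs. the rest."""

    def __init__(self, label, positives, negatives):
        self.label = label
        self.pos_center = centroid(positives)
        self.neg_center = centroid(negatives)

    def accepts(self, x):
        return math.dist(x, self.pos_center) <= math.dist(x, self.neg_center)


def train_cascade(samples, priority):
    """samples: list of (features, label); priority: labels, highest first.

    Assumes every label in `priority` has at least one sample.
    Returns the nodes, the fallback label, and the training-set size per level.
    """
    nodes, sizes = [], []
    remaining = list(samples)
    for label in priority[:-1]:
        positives = [x for x, y in remaining if y == label]
        negatives = [x for x, y in remaining if y != label]
        sizes.append(len(remaining))
        nodes.append(BinaryNode(label, positives, negatives))
        # Samples of the class just handled are not reused at deeper levels.
        remaining = [(x, y) for x, y in remaining if y != label]
    return nodes, priority[-1], sizes


def classify(nodes, fallback, x):
    """Walk the tree from the highest priority; stop at the first accepting node."""
    for node in nodes:
        if node.accepts(x):
            return node.label
    return fallback


# Made-up 2-D data; the class names and priority order are assumptions.
samples = (
    [((0, 0), "dos"), ((0, 1), "dos"), ((1, 0), "dos"), ((1, 1), "dos")]
    + [((10, 10), "probe"), ((10, 11), "probe"), ((11, 10), "probe")]
    + [((20, 20), "normal"), ((21, 21), "normal")]
)
nodes, fallback, sizes = train_cascade(samples, ["dos", "probe", "normal"])

print(sizes)                                    # [9, 5]
print(classify(nodes, fallback, (10.5, 10.5)))  # probe
```

In this toy run the first node trains on all 9 samples and the second on only 5, which mirrors the abstract's point that the training set shrinks with each level. A sample of the highest-priority class is also resolved after evaluating a single node instead of every pairwise classifier.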

  • 【Classification number】 TP393.08; TP181
  • 【Times cited】 8
  • 【Downloads】 879