Research on Immunity Based Real Value Coded Intrusion Detection System

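【Author】 Weng Guang'an (翁广安)

【Advisor】 Yu Shengsheng (余胜生)

【Author information】 Huazhong University of Science and Technology, Computer System Architecture, 2008, PhD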

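【Abstract, translated from the Chinese】 Detecting unknown intrusions is becoming increasingly important in information security, yet traditional anomaly-based intrusion detection models have difficulty with updating normal profiles, dynamic real-time detection, and distributed detection. Intrusion detection based on the principles of the biological immune system offers a new way to address these problems, but existing immune intrusion detection techniques are still at an early stage. Binary-coded immune intrusion detection systems are studied in detail, and an improved algorithm for their detector sets is proposed that reduces the redundant information in the detector set. However, binary coding and the rcb (r-contiguous bits) matching rule cannot handle long string patterns effectively, cannot cope with intrusion detection over many feature attributes, and are ill suited to real-time detection in dynamically changing environments. An immune intrusion detection method based on real-valued coding is therefore proposed, with the following research and innovative work. Real-valued coding is applied to immune intrusion detection fairly systematically for the first time: the representations of the self set and of detectors are defined, and hyper-sphere and hyper-rectangle models are established. A new detector generation method, multimodal evolution, is proposed. It trains detectors with variable coverage, and a specific fitness function drives the detectors to fill the small detection holes near self and between self entities, which overcomes the inability of random generation to cover the non-self region effectively in a huge pattern space. The detection granularity of real-valued coding is analyzed, with the conclusion that the smaller the maximum value used for normalization, the smaller the information loss and the better the detection granularity. Hyper-sphere and hyper-rectangle systems are built. Experiments on the DARPA99 network data set and the machine-learning Wine data set show that the hyper-sphere system outperforms the hyper-rectangle system in detection rate, false alarm rate, uniformity of non-self coverage, adaptability to incomplete training sets, algorithm stability, and training time; that multimodal evolution outperforms random generation in detection rate, uniformity of non-self coverage, and algorithm stability; and that random generation is not applicable to the 13-dimensional Wine data set. Experiments with the multimodal-evolution hyper-sphere system on the KDD Cup’99 data set verify that the system adapts well to high-dimensional detection and show that the algorithm's time cost is approximately linear in both the pattern dimension and the training set size. Methods are proposed for using the distribution characteristics of the data to improve the detection accuracy of the multimodal-evolution hyper-sphere system. A Gaussian probability model is first established to describe the distribution of patterns in the data space, and a clustering-level parameter is defined to characterize the degree to which data points gather into clusters; a method for generating synthetic data sets with a given clustering level is provided; and the clustering characteristics of real data sets are studied. Experiments show that data sets with better clustering characteristics (a smaller clustering level) yield better detection ability, and that increasing the number of detectors or lowering the tolerance level can partly compensate for poor clustering characteristics. On this basis, an extended hyper-sphere model for constructing the self space is proposed, the variable radius self sphere model (VRSSM). It applies different tolerance levels to different regions of the pattern space according to their clustering, so that the self hyper-spheres take different radii during detector training, which makes the self/non-self boundary more precise. Analysis shows that overall detection ability is affected by the clustering characteristics of the data and by the average attribute deviation between normal and abnormal data, while the performance of VRSSM is affected by objective factors such as cluster shape and differences in data point density. Experiments show that the synthetic data sets and the DARPA99 data set satisfy the assumptions of VRSSM, and that the model raises the detection rate and lowers the false alarm rate. A dynamic real-time detection mechanism is established for the multimodal-evolution hyper-sphere system. It comprises a strengthened initial training mechanism, clonal selection and gene library mechanisms that increase coverage of regions where intrusions occur frequently, and an immune memory mechanism that still recognizes previously encountered intrusions while the detector set is continuously updated. A dynamic extension of VRSSM (Dynamic VRSSM) is proposed, which uses positive memory to mark regions dense with normal behavior and computes the tolerance level of different regions in real time. Analysis indicates that an activation threshold of 1 is appropriate for a practical network intrusion detection system. A hyper-mutation probability for cloned individuals that is too high or too low harms detection, and a suitable value can be found experimentally. Simulation tests on network data sets (DARPA99 and KDD Cup’99) verify the effectiveness of the dynamic real-time detection mechanism and the correctness of the Dynamic VRSSM model. Combining distributed tolerance and central tolerance mechanisms, a prototype distributed cooperative architecture is proposed. A single-node experimental platform is implemented that integrates the real-valued hyper-sphere space representation, multimodal-evolution detector generation, the VRSSM model, the dynamic real-time detection mechanism, and the Dynamic VRSSM model, verifying the correctness of the above theory and the feasibility of the models.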

【Abstract】 In information security, detecting unknown intrusions is becoming increasingly important, yet traditional anomaly detection systems face difficulties in updating normal profiles, in dynamic real-time detection, and in distributed detection. Intrusion detection based on the principles of the biological immune system offers a new way to address many of these difficulties, but immune intrusion detection techniques are still at an early stage. Binary-coded immune intrusion detection systems are investigated in detail, and an improved algorithm for their detector sets is proposed that reduces the redundant information in the detector set. However, binary coding and the rcb matching rule cannot handle long strings effectively, so they adapt poorly to applications involving many features and to real-time intrusion detection in dynamically changing environments. Therefore, an immune intrusion detection approach based on real-valued coding is proposed, and the following research and innovative work is carried out. Real-valued coding is applied to immune intrusion detection fairly systematically for the first time. Representations of the self set and of detectors are defined, and hyper-sphere and hyper-rectangle models are built to construct the pattern space. A new detector generation method, multimodal evolution, is presented. It creates detectors with variable coverage, and a specific fitness function guides the detectors to evolve toward the small detection holes close to the self set or between self entities, overcoming the inability of random generation to cover non-self areas efficiently in a high-dimensional space.
The detection granularity of real-valued coding is analyzed, leading to the conclusion that the smaller the maximum value used to normalize an attribute, the smaller the information loss and the better the detection granularity. Hyper-sphere and hyper-rectangle systems are constructed. Experiments on the DARPA99 network data set and the Wine data set indicate that the hyper-sphere system outperforms the hyper-rectangle system in detection rate, false alarm rate, stability, time cost, adaptability to incomplete training sets, and uniformity of coverage of non-self space; that multimodal evolution outperforms random generation in detection rate, stability, and uniformity of coverage of non-self space; and that random generation cannot be applied to the Wine data set, which has 13 features. Experiments on the KDD Cup’99 data set show that the multimodal-evolution hyper-sphere system works well in a high-dimensional pattern space and that its time cost is approximately linear in the number of dimensions and in the training set size. Approaches that use the distribution characteristics of the data set to improve detection precision are developed for the multimodal-evolution hyper-sphere system. A Gaussian distribution model is first built to describe the distribution of patterns in the data space, and a clustering-level parameter is specified to represent how closely the data clusters resemble Gaussian clusters in shape. An algorithm for generating synthetic data sets with a given clustering level is provided, and the clustering characteristics of real data sets are analyzed.
Experiments indicate that data sets with better clustering characteristics (a smaller clustering level) yield better detection ability, and that more detectors or a lower tolerance level can to some extent compensate for poor clustering characteristics. Based on this work, an extended hyper-sphere model for constructing the self space, VRSSM (variable radius self sphere model), is developed. It applies different tolerance levels in different areas of the pattern space according to the local clustering characteristics, so the self hyper-spheres are given variable radii during detector generation, which locates the boundary between self and non-self more accurately. Analysis indicates that detection ability is affected by the clustering characteristics of the data set and by the average attribute deviation between self and non-self, and that the effect of VRSSM depends on the clustering characteristics and on differences in data point density among areas of the space. Experiments show that the synthetic data sets and the DARPA99 network data set satisfy the assumptions of VRSSM, and that the model produces a higher detection rate and a lower false alarm rate. The following dynamic real-time detection mechanisms are established for the multimodal-evolution hyper-sphere system: strengthened initial training; clonal selection and a gene library, which make detectors more likely to cover areas containing more intrusion activity; and immune memory, which lets the detector set keep recognizing previously encountered intrusions while it is continuously updated. A dynamic extension of VRSSM (Dynamic VRSSM) is proposed. It uses positive memory to mark areas dense with normal activity so that the tolerance level at different positions can be calculated online. Analysis indicates that an activation threshold of 1 is suitable for a real network intrusion detection system. The hyper-mutation probability in clonal selection should be neither too high nor too low, and an appropriate value can be found by experiment.
Simulation tests on network data sets (DARPA99 and KDD Cup’99) show that these dynamic real-time detection mechanisms are effective and that Dynamic VRSSM is feasible. A prototype distributed cooperative architecture that combines distributed tolerance and central tolerance is presented. A single-node experimental platform is implemented that integrates the hyper-sphere representation based on real-valued coding, detector generation based on multimodal evolution, VRSSM, the dynamic real-time detection mechanism, and Dynamic VRSSM, providing evidence for the validity of the above theory and models.
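
The hyper-sphere representation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`distance`, `generate_detectors`, `is_nonself`), the unit-cube normalization, and the rule that a detector's radius equals its gap to the nearest self sphere are all assumptions made for illustration. Detectors here come from the random generation baseline that the abstract compares against, not from multimodal evolution, whose fitness function the abstract does not specify. Each self point carries its own radius, in the spirit of VRSSM, and a single matching detector raises an alarm, which corresponds to an activation threshold of 1.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two real-valued patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_detectors(self_points, self_radii, n_detectors, dim, rng,
                       max_tries=10000):
    """Random-generation baseline (assumed form, not multimodal evolution).

    A candidate center drawn from the unit hypercube is kept only if it lies
    outside every self hyper-sphere. Its radius is the gap to the nearest
    self sphere, so detectors have variable coverage and never overlap self.
    """
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        center = [rng.random() for _ in range(dim)]
        gap = min(distance(center, s) - r
                  for s, r in zip(self_points, self_radii))
        if gap > 0:
            detectors.append((center, gap))
    return detectors


def is_nonself(sample, detectors):
    """Flag a sample as soon as one detector covers it (threshold of 1)."""
    return any(distance(sample, center) < radius
               for center, radius in detectors)
```

Passing a list of per-point radii rather than a single value is what allows dense and sparse self regions to receive different tolerance levels. How those radii should be derived from the clustering characteristics is left open here, since the abstract does not give the formula.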
