

A Novel Network Traffic Anomaly Detection Model Based on Superstatistics Theory

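【Author】 Yang Yue (杨越)

【Supervisor】 Hu Hanping (胡汉平)

【Author Information】 Huazhong University of Science and Technology, Information Security, 2010, Ph.D.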

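【Summary】 As science and technology advance, network technology and network scale keep developing and changing. On the one hand, network structures have become intricate, and the variety of network devices and of services carried over networks keeps growing. On the other hand, with the rapid growth of networks, faults and performance problems have become far more likely, and behaviors that threaten network security or disrupt normal operation keep emerging. Detecting anomalies in network traffic can effectively reveal possible faults and performance problems; it markedly improves network availability and reliability and strengthens emergency response to security incidents, thereby safeguarding quality of service.
Traffic-based anomaly detection methods can be divided into parameter detection with a statistical model and parameter testing without a statistical model. For model-based methods, the stationarity of the series must be considered first when the statistical model or distribution is established. Stationarity of a time series is either strict or wide-sense, and a statistical model can be studied only when the series is stationary, at least in the wide sense. In most cases stationarity refers to the wide-sense case, in which the parameters remain constant or vary only slightly. Most non-stationary series can be made stationary by first-order or higher-order differencing. Network traffic, however, and especially traffic containing anomalies, is a complex non-stationary process, which also reflects the bursty nature of anomalous traffic, so the traditional differencing approach is generally ineffective. Earlier studies often ignored the non-stationarity and burstiness of network traffic and the fact that the parameters of its statistical distribution vary randomly or in complex ways, which inevitably led to the problems seen in anomaly detection.
Starting from the non-stationary and bursty nature of network traffic, and paying particular attention to the changes in traffic characteristics caused by attack traffic, this work applies superstatistics theory and focuses on the variation of the statistical parameters. Under superstatistics theory, a statistical distribution model must be established first. Because traffic is non-stationary, it has to be made stationary while the model is built; on that basis, and given that network traffic is heavy-tailed and long-range dependent, a generalized Pareto distribution model is established. Following the windowing idea from calculus, which effectively reduces the significance and complexity within each sub-window, the non-stationary series is divided into stationary sub-segments. Once the statistical model has been determined from the traffic's own statistical characteristics, the windows are chosen so that the model parameters remain wide-sense stationary within each sub-window. Superstatistics is in essence the study of the statistics of statistics: it considers how the parameters of the distribution model vary and, through them, how the network traffic varies. Network traffic is predicted by studying the series of the shape parameter, which plays the decisive role in the structure of the system, and anomalies are then detected from the shape-parameter predictions. This detection method greatly reduces computational complexity, and extensive experiments show that it performs well.
Starting from the complexity and chaotic behavior of network traffic, the one-dimensional information is reconstructed in a multidimensional space. Although this raises the dimension of the system, the reconstructed series spreads the information across dimensions, exposing the system's details more fully and allowing it to be analyzed comprehensively in each dimension; reconstruction also greatly lowers the computational complexity in each dimension, which reduces the complexity of the system as a whole. After phase-space reconstruction, network traffic is still a complex non-stationary process; given its non-stationarity and burstiness and the random or complex variation of its distribution parameters, superstatistics theory is again used to study it. In addition, because a traffic anomaly is a transient between one equilibrium state and another, and this transient is a complex, non-stationary, abrupt process, a comprehensive decision model based on catastrophe theory is proposed. It handles the transients between equilibrium states effectively, removes the limitation of relying on a single parameter series by fusing multiple parameter series, and improves the accuracy and real-time performance of the system. Compared with other network traffic anomaly detection models, the method has low computational complexity, a high detection rate, and a low false-alarm rate. This research was supported by the National Natural Science Foundation of China under Grant No. 60773192, "Research on Network Anomalous Traffic Detection Methods Based on a Phase-Space Catastrophe Model".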
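
The phase-space reconstruction step mentioned above maps a one-dimensional traffic series into a multidimensional space. The abstract does not say how the reconstruction is done, so the following is only a minimal sketch that assumes the common time-delay embedding; the function name `delay_embed`, the embedding dimension `dim`, the lag `delay`, and the sample counts are all illustrative assumptions rather than values taken from the thesis.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, delay: int) -> List[Tuple[float, ...]]:
    """Return the delay vectors (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    span = (dim - 1) * delay           # distance from first to last coordinate
    n_vectors = len(series) - span     # how many complete vectors fit
    if n_vectors <= 0:
        raise ValueError("series too short for the requested embedding")
    return [
        tuple(series[i + k * delay] for k in range(dim))
        for i in range(n_vectors)
    ]


if __name__ == "__main__":
    # Hypothetical per-interval packet counts, for illustration only.
    counts = [10, 12, 11, 15, 30, 28, 14, 12]
    for vector in delay_embed(counts, dim=3, delay=2):
        print(vector)
```

Each coordinate of the embedded vectors can then be analyzed as its own series, which is one way to read the claim that reconstruction spreads the information across dimensions.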

【Abstract】 As the number of network connections grows rapidly, intrusion detection becomes increasingly important. Although the openness of Internet services makes useful information widely available, the number of network intrusions is also increasing faster than before, which causes users considerable inconvenience. Network traffic anomaly detection is usually divided into two basic categories. The first is based on a statistical model (it first predicts and then detects using the statistical model or distribution); the second works directly on characteristic quantities of the network. Because network traffic is highly random and very large in volume, detecting on the raw traffic directly usually gives a higher false rejection rate and takes longer to compute. The main advantage of detection based on characteristic quantities is that the characteristic quantities are far fewer than the original traffic data, so detection takes less time. However, the effectiveness of detection depends directly on the characteristic quantities selected, and with a poorly chosen characteristic quantity the result is even worse than detecting on the original traffic directly. Detection based on a statistical model, by contrast, first establishes a statistical model that takes all the properties of network traffic into account, then predicts the traffic with the model, and finally detects anomalies from the difference between the predicted and the actual values. The advantage of this kind of method is that it accounts for the characteristics of the network, but it needs a large amount of data and has very high computational complexity in posterior prediction.
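
The model-based scheme described above (predict the traffic, then detect from the gap between prediction and observation) can be illustrated with a deliberately simple stand-in. This sketch assumes an exponentially weighted moving average as the predictor and a multiple of the running mean absolute residual as the threshold; neither choice comes from the thesis, and `residual_alarms`, `alpha`, `k`, and `warmup` are hypothetical names and defaults.

```python
from typing import List, Sequence


def residual_alarms(series: Sequence[float], alpha: float = 0.3,
                    k: float = 3.0, warmup: int = 5) -> List[int]:
    """Return indices whose residual against an EWMA forecast exceeds
    k times the running mean absolute residual (an assumed rule)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    alarms: List[int] = []
    forecast = float(series[0])
    mean_abs_residual = 0.0
    for i in range(1, len(series)):
        residual = series[i] - forecast
        if i > warmup and abs(residual) > k * mean_abs_residual:
            # Flag the point and leave the model untouched, so a burst
            # does not contaminate the forecast or the threshold.
            alarms.append(i)
        else:
            mean_abs_residual = (1 - alpha) * mean_abs_residual + alpha * abs(residual)
            forecast = (1 - alpha) * forecast + alpha * series[i]
    return alarms


if __name__ == "__main__":
    # Hypothetical traffic volumes with one burst, for illustration only.
    traffic = [100, 102, 99, 101, 100, 98, 101, 100, 400, 101, 99]
    print(residual_alarms(traffic))
```

The predictor here is a placeholder; in the approach the abstract describes, the prediction would come from the fitted statistical model rather than from a moving average.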
This thesis focuses on the model-based approach. Before the statistical model or distribution is established, the stationarity of the series must be considered first: the study is feasible only if the process is at least wide-sense stationary. The experimental results in this thesis show that anomalous traffic is a complicated changing process that is non-stationary, random, and abrupt, so simply differencing the series does not make it stationary, as is demonstrated in this thesis. Because of these basic characteristics, network traffic anomaly detection faces many unavoidable problems. To address them, superstatistics theory, which is suited to changing statistical parameters, is applied to network traffic. We propose to model network traffic with a more elaborate method built on the concept of 'statistics of statistics' (that is, 'superstatistics', SS). Superstatistics is a frontier topic in contemporary physics that can overcome the shortcomings of ordinary statistical methods. It is used for non-equilibrium systems with complex dynamics in stationary states with large fluctuations of intensive quantities on long time scales. After the non-stationary series has been transformed into stationary series, the corresponding statistical model can be determined on the basis of its statistical characteristics. Following the idea of infinitesimal calculus, the series is segmented under the premise of wide-sense stationarity, which effectively reduces the significance and complexity of the segments.
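
The segmentation idea above, fitting the distribution on short segments and then studying how its parameters change, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses fixed-length, non-overlapping windows instead of windows chosen by a stationarity criterion, a generalized Pareto distribution with location 0, and a method-of-moments fit that is only meaningful when the shape is below 1/2. The names `gpd_moment_fit` and `shape_series` and the synthetic data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def gpd_moment_fit(excesses: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments (shape, scale) for a generalized Pareto
    distribution with location 0; relies on a finite variance."""
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)   # sample variance, needs >= 2 points
    if var <= 0.0:
        raise ValueError("window has zero variance; cannot fit")
    ratio = mean * mean / var             # equals 1 - 2*shape for exact moments
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale


def shape_series(series: Sequence[float], window: int) -> List[float]:
    """Shape estimate for each consecutive non-overlapping window."""
    if window < 2:
        raise ValueError("window must contain at least two samples")
    return [
        gpd_moment_fit(series[start:start + window])[0]
        for start in range(0, len(series) - window + 1, window)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: a light-tailed regime (shape 0) followed by a
    # heavier-tailed regime (Pareto minus 1, i.e. shape 1/4).
    light = [rng.expovariate(1.0) for _ in range(2000)]
    heavy = [rng.paretovariate(4.0) - 1.0 for _ in range(2000)]
    for value in shape_series(light + heavy, window=500):
        print(round(value, 3))
```

A shift in the printed shape estimates between the two regimes is the kind of parameter change that the superstatistical view treats as the signal of interest.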
Superstatistics theory mainly describes how the parameters of the distribution model vary, and abnormal changes in network traffic can, to a certain degree, be found from the changes in those parameters. Network traffic anomaly detection can therefore be carried out effectively by studying the decisive distribution parameters, called slow parameters, together with an adaptive detection method. As a whole, the approach is more intuitive than earlier ones. Extensive experiments show that the method performs very well. The study of complex network traffic is concentrated on a few decisive parameter series. The method not only makes up for the shortcomings of earlier methods but also avoids the computational complexity of the traditional statistical model.


