基于行为的P2P流量及异常流量检测技术研究
The Study of P2P and Anomaly Traffic Identification Technology Based on Behavior Analysis
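【Author】 Wang Jiao (王蛟);
【Advisor】 Yang Yixian (杨义先);
【Author information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2008, PhD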
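【Abstract (translated from the Chinese)】 With the rapid development of Internet technology and peer-to-peer (P2P) applications, network management and control have become increasingly important, and traffic detection has consequently become an important field of study. Most current research results on traffic detection remain at the theoretical stage and do not transfer well to practice, and many related techniques have shortcomings that call for optimization and improvement. Building on previous work, this thesis studies P2P traffic detection and anomalous traffic detection. It optimizes detection based on packet content, analyzes in depth the behavior characteristics of node-based traffic, of majority network traffic, and of denial-of-service attacks within anomalous traffic, and builds detection methods on these analyses. The main research content is as follows:
(1) For detection based on packet content, an optimized detection method based on trusted lists is proposed to address the shortcomings of existing methods in detection speed and accuracy. The method analyzes features of the packet payload, adds identified information to a set of trusted lists, and uses an activity parameter that marks how frequently a session is accessed to optimize and control the lists, so that frequently accessed records are checked first and the system cost of searching the lists is reduced. An acceleration method based on the sequence number of TCP packets is also used to improve detection efficiency. A further contribution of this part is a protocol feature extraction method, together with extensive feature extraction work that corrects errors in previously published protocol features and adds features of currently popular applications. Experiments show that the trusted-list method compensates for the performance shortcomings of existing packet-based detection algorithms, effectively improves detection accuracy, and lowers the false identification rate.
(2) To address the weaknesses of payload-based detection in handling "unknown" applications and encrypted packets, a P2P traffic detection method based on node behavior is proposed, abbreviated NBTI (Traffic Identification based on Node's Behavior). The method analyzes the node connection patterns caused by a particular application, the total number of connections within a fixed time window, and several other behavior characteristics, and builds a detection model from the results. Because it does not rely on packet payload features, it can effectively detect encrypted traffic or "unknown" network applications that exhibit a certain class of behavior, and it also sidesteps privacy concerns about network data. A heuristic detection method based on UDP packets further strengthens NBTI and, combined with its node-based operation, makes NBTI perform better in high-traffic network environments.
(3) Motivated by the observation that a very small fraction of network flows accounts for the vast majority of traffic bits, a behavior-based majority traffic detection method is proposed, abbreviated BMTI (Behavior-based Majority Traffic Identification). BMTI mainly targets applications and behaviors that consume large amounts of bandwidth, such as P2P streaming, P2P file sharing, denial-of-service attacks, worms, and scanning. By analyzing the similarities and differences among applications in per-node traffic share, operating principle, and behavior characteristics, it obtains features usable for detection and combines them to detect specific applications. BMTI uses two strategies to improve efficiency. First, it defines a packet-count threshold and a bit-count threshold to limit the detection list. Second, it uses the TCP-packet acceleration algorithm and the UDP-packet heuristic algorithm to improve detection efficiency and accuracy.
(4) For denial-of-service attacks, a particularly threatening kind of anomalous traffic, a behavior-based detection method is proposed. By analyzing the packet connection patterns, packet length distribution, distribution of external nodes, and distribution of ports produced by a denial-of-service attack per unit time, the thesis identifies seven traffic anomalies that appear during an attack. Based on this analysis, it proposes a detection method built on five elements: packet length range, changes in nodes and ports, changes in the ratio of upstream to downstream traffic, inter-packet gaps, and similarity of packet content. A mixed experiment with P2P, FTP, and denial-of-service traffic demonstrates the effectiveness of the algorithm.
In summary, the behavior-based traffic detection methods proposed in this thesis have a simple model, apply broadly, and are easy for engineers to understand. They merit further theoretical study and also have considerable value for engineering applications.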
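The trusted-list idea in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `TrustedList`, `identify`, and `SIGNATURES`, the eviction rule, and the two example payload prefixes are assumptions chosen for illustration. The sketch only shows how an activity counter can keep frequently accessed sessions near the front of the list so that they are found after few comparisons.

```python
from dataclasses import dataclass, field

# Hypothetical payload prefixes used only for this illustration.
SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",  # BitTorrent handshake prefix
    "http": b"GET ",
}


@dataclass
class TrustedList:
    """Identified sessions, kept ordered by activity (most active first)."""
    max_size: int = 1024
    # Each record is [flow_key, app_label, activity].
    records: list = field(default_factory=list)

    def lookup(self, flow_key):
        for i, rec in enumerate(self.records):
            if rec[0] == flow_key:
                rec[2] += 1  # the session was accessed once more
                # Move the record ahead of less active ones.
                while i > 0 and self.records[i - 1][2] < rec[2]:
                    self.records[i - 1], self.records[i] = (
                        self.records[i], self.records[i - 1])
                    i -= 1
                return rec[1]
        return None

    def add(self, flow_key, app_label):
        if len(self.records) >= self.max_size:
            self.records.pop()  # assumed policy: drop the least active record
        self.records.append([flow_key, app_label, 1])


def identify(trusted, flow_key, payload):
    """Check the trusted list first; inspect the payload only on a miss."""
    label = trusted.lookup(flow_key)
    if label is not None:
        return label
    for app, prefix in SIGNATURES.items():
        if payload.startswith(prefix):
            trusted.add(flow_key, app)
            return app
    return None
```

Once a session is in the list, later packets of that session are labeled without any payload matching, which is where the reduction in search and inspection cost would come from under these assumptions.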
【Abstract】 With the rapid development of Internet technology and P2P applications, network management and control have become increasingly important, and traffic identification has become an important subject. At present, most traffic identification methods remain at the theoretical stage and cannot readily be applied in practice. Building on previous studies, this thesis focuses on P2P traffic identification and anomaly detection and optimizes payload-based identification methods. It also thoroughly analyzes the behavior characteristics of node-based traffic, of the majority of total traffic, and of denial-of-service attacks within anomalous traffic. The main work and contributions of the thesis are as follows:
(1) For payload-based identification, a heuristic traffic identification method based on a trusted list is proposed to overcome the limitations of existing methods in identification speed and accuracy. The method first analyzes packet payload characteristics, then adds each identified connection to a trusted list, and finally uses an activity parameter to optimize and control the list. This parameter records how frequently a session is accessed, so that frequently accessed records in the trusted list are checked with high priority, which reduces the system cost of searching the list. In addition, a method that uses the sequence number of TCP packets to accelerate identification is proposed. Considerable work on protocol characteristics was also carried out: errors in previously published characteristics were corrected, and new regular expressions for popular applications were introduced. Experimental results show that this method overcomes the drawbacks of the original algorithm and effectively improves identification accuracy.
(2) To overcome the weakness of payload-based methods in identifying "unknown" applications and encrypted packets, a traffic identification method based on node behavior analysis (NBTI for short) is proposed. NBTI focuses on the node connection characteristics produced by a P2P application, as well as the total number of connections in a given period, and builds an identification model from the analysis results. Because NBTI does not rely on packet payload characteristics, it can effectively identify encrypted traffic and "unknown" applications that exhibit certain types of behavior characteristics. NBTI also avoids the privacy issues raised by inspecting network traffic. A heuristic method based on UDP packets further enhances its performance. Because it operates on nodes, NBTI is well suited to network environments with large traffic volumes.
(3) Because a small fraction of flows accounts for most of the total bytes, a behavior-based majority traffic identification method (BMTI for short) is proposed. BMTI focuses on five types of heavy-traffic behavior: P2P streaming, P2P file sharing, denial-of-service (DoS) attacks, worms, and scanning. It analyzes the similarities and differences among these applications in the traffic share of communicating nodes, application principles, and other behavior aspects, and from these obtains features that identify specific applications. The method adopts two strategies to improve identification efficiency. The first is to define thresholds on packet count and byte count that restrict the identification list. The second is to employ the TCP-packet acceleration method and the UDP-packet heuristic method to improve identification efficiency and accuracy.
(4) For denial-of-service attacks, one of the major sources of anomalous traffic, a DoS attack identification method based on behavior analysis is proposed. The method analyzes the packet connections, packet length distribution, and distribution of remote nodes and ports caused by DoS attacks, and derives seven distinguishing features. Based on the analysis results, five factors are proposed for identifying DoS attacks: packet length range, variation in the number of nodes and ports, the ratio of upload to download bytes, the interval between packets, and the similarity of packets. A hybrid experiment with P2P, FTP, and denial-of-service traffic demonstrates the effectiveness of the algorithm.
Taken together, the proposed behavior-based traffic identification methods have a simple model and are easy for engineers to understand. They deserve further theoretical research and also have considerable value for engineering applications.
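The first BMTI strategy, restricting the identification list with packet-count and byte-count thresholds, might look like the following Python sketch. The function name `candidate_nodes`, the input shape, and the choice to require both thresholds at once are assumptions; the abstract does not say how the two thresholds are combined or what values they take.

```python
from collections import defaultdict


def candidate_nodes(packets, min_packets, min_bytes):
    """Return the nodes that stay on the identification list.

    packets: iterable of (node, size_in_bytes) observed in one window.
    A node is kept only if it meets BOTH thresholds (an assumption).
    """
    pkt_count = defaultdict(int)
    byte_count = defaultdict(int)
    for node, size in packets:
        pkt_count[node] += 1
        byte_count[node] += size
    return {
        node
        for node in pkt_count
        if pkt_count[node] >= min_packets and byte_count[node] >= min_bytes
    }
```

Only the nodes returned here would then go through the more expensive behavior analysis, which matches the stated motivation that a small fraction of flows carries most of the bytes.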
【Key words】 P2P; traffic identification; behavior analysis; majority network traffic; network anomaly; denial of service;
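Some of the five DoS identification factors named in the abstract can be computed per time window as in the sketch below. It covers only three of them (packet length concentration, the number of remote nodes and ports, and the ratio of inbound to outbound bytes). The function name `window_features`, the tuple layout, and the feature definitions are assumptions for illustration, and no decision thresholds are implied.

```python
from collections import Counter


def window_features(packets):
    """Summarize one time window of traffic seen at a monitored node.

    packets: list of (remote_ip, remote_port, length, direction),
    where direction is "in" or "out" relative to the monitored node.
    """
    in_bytes = sum(p[2] for p in packets if p[3] == "in")
    out_bytes = sum(p[2] for p in packets if p[3] == "out")
    lengths = Counter(p[2] for p in packets)
    # Share of packets that have the single most common length.
    top_share = max(lengths.values()) / len(packets) if packets else 0.0
    return {
        "remote_nodes": len({p[0] for p in packets}),
        "remote_ports": len({p[1] for p in packets}),
        "in_out_ratio": in_bytes / out_bytes if out_bytes else float("inf"),
        "top_length_share": top_share,
    }
```

Under these assumptions, a flood of same-sized packets from many sources would show a large `remote_nodes` count, a high `in_out_ratio`, and a `top_length_share` close to 1, whereas ordinary FTP or P2P transfers would be expected to look different on at least one of these features.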