
P2P Traffic Detection on Large-Scale NetFlow Data

【Author】 Zhang Rui

【Advisor】 Zhou Aoying

【Author Information】 Fudan University, Computer Software and Theory, 2008, Master's thesis

【Abstract】 As P2P applications become more widely used, P2P traffic detection has become a hot topic in network traffic analysis. Some reports indicate that P2P applications generate more than 50% of total network traffic. Because P2P applications can cause network congestion, detecting P2P traffic within the overall traffic has become an important problem for telecom operators. Several factors make detection harder: P2P applications transfer data over random ports, and P2P systems are complex, distributed, and constantly changing. The main goal of this thesis is to study how to detect P2P traffic effectively on large-scale NetFlow data.

Existing P2P detection methods all work on packet data. Analyzing the huge number of packets on a backbone network consumes large amounts of storage and computing resources, so most academic work in this area cannot be put into practical use. The P2P detection products currently deployed capture packet contents through devices connected in series with the link and rely on dedicated hardware for computation. As a result, they are expensive to deploy, scale poorly, and invade user privacy. This thesis uses NetFlow data for P2P detection and thereby avoids these problems. NetFlow data aggregates and summarizes packet information: it keeps the important information that characterizes traffic while reducing data volume. In addition, NetFlow is an industry standard already widely used by telecom operators.

The main contributions of this thesis are:
1) Based on how P2P protocols operate, we infer a series of characteristics that P2P traffic may exhibit. For each characteristic, experiments measure how well it distinguishes P2P from non-P2P traffic, and the effective characteristics are selected for detection based on the results.
2) We design a P2P traffic detection algorithm for NetFlow data. The algorithm organizes the characteristics selected in 1) according to a detection logic, making detection more efficient.
3) Based on the algorithm in 2), we implement INFOPAD, a P2P traffic detection system. The system stores data in a database and implements the detection algorithm as SQL queries, which addresses the problem of storing and processing large volumes of traffic data. Each detection rule is an independent module, so a new rule can easily be integrated as a new module; the architecture is open and extensible.
4) We evaluate INFOPAD on real NetFlow data collected from Shanghai Telecom routers and validate the results against the deep packet inspection (DPI) reports provided by Shanghai Telecom. The experiments show that INFOPAD's detection algorithm achieves high accuracy and that the system's performance meets the requirements of offline analysis.

The system is designed for P2P traffic detection on telecom backbone networks. It receives large volumes of NetFlow data exported by routers, analyzes the data offline, and produces a P2P traffic report. It has been used in Shanghai Telecom's daily network management. The operator had previously deployed DPI products, and the system reaches a comparable level of accuracy at a much lower deployment cost. Its detection algorithm is also easier to maintain and update.
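The abstract states that INFOPAD stores flow data in a database, expresses the detection algorithm as SQL queries, and treats each detection rule as an independent module. The sketch below illustrates that architecture under stated assumptions and is not the thesis's implementation.

- It uses Python's built-in `sqlite3` module.
- The `flows` table is a hypothetical, simplified subset of NetFlow v5 fields.
- The two rules are illustrative heuristics of the kind discussed in the P2P-detection literature, not the features the thesis actually selected. One flags host pairs exchanging both TCP and UDP; the other flags hosts with many peers on ports above 1024.
- The threshold of 3 and the `min_votes` combination step are hypothetical.
- NetFlow records are unidirectional, and the sketch does not pair the two directions of a conversation.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical, simplified flow table: a small subset of NetFlow v5 record fields.
SCHEMA = """
CREATE TABLE flows (
    src_ip TEXT, dst_ip TEXT,
    src_port INTEGER, dst_port INTEGER,
    proto INTEGER,            -- IP protocol number: 6 = TCP, 17 = UDP
    packets INTEGER, bytes INTEGER
)
"""


@dataclass(frozen=True)
class Rule:
    """One independent detection module: a SQL query returning flagged host IPs."""
    name: str
    sql: str
    params: tuple = ()


# Hypothetical rule 1: a host pair that exchanges both TCP and UDP flows,
# ignoring DNS (port 53), which legitimately uses both protocols.
TCP_UDP_PAIR = Rule(
    name="tcp_udp_pair",
    sql="""
        SELECT DISTINCT src_ip FROM flows
        WHERE proto IN (6, 17) AND src_port <> 53 AND dst_port <> 53
        GROUP BY src_ip, dst_ip
        HAVING COUNT(DISTINCT proto) = 2
    """,
)

# Hypothetical rule 2: a host talking to many distinct peers where both
# ports are above 1024 (no well-known service port on either side).
MANY_HIGH_PORT_PEERS = Rule(
    name="many_high_port_peers",
    sql="""
        SELECT src_ip FROM flows
        WHERE src_port > 1024 AND dst_port > 1024
        GROUP BY src_ip
        HAVING COUNT(DISTINCT dst_ip) >= ?
    """,
    params=(3,),  # illustrative threshold, not a tuned value
)

# Adding a detection rule means appending one more Rule; detect() is unchanged.
RULES = [TCP_UDP_PAIR, MANY_HIGH_PORT_PEERS]


def detect(conn, rules, min_votes=1):
    """Run every rule and keep hosts flagged by at least `min_votes` rules."""
    votes = {}
    for rule in rules:
        for (ip,) in conn.execute(rule.sql, rule.params):
            votes.setdefault(ip, set()).add(rule.name)
    return {ip: names for ip, names in votes.items() if len(names) >= min_votes}


def build_demo_db():
    """In-memory database holding a few made-up flow records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("10.0.0.1", "10.0.0.9", 40001, 51413, 6, 10, 9000),
        ("10.0.0.1", "10.0.0.9", 40002, 51413, 17, 2, 300),
        ("10.0.0.1", "10.0.0.7", 40003, 6881, 6, 5, 4000),
        ("10.0.0.1", "10.0.0.8", 40004, 6881, 6, 5, 4000),
        ("10.0.0.2", "8.8.8.8", 5353, 53, 17, 1, 80),      # DNS over UDP
        ("10.0.0.2", "8.8.8.8", 5354, 53, 6, 1, 80),       # DNS over TCP
        ("10.0.0.3", "93.184.216.34", 50000, 80, 6, 20, 20000),  # plain web
    ]
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(detect(build_demo_db(), RULES, min_votes=2))
```

Keeping each rule as a self-contained SQL string lets the database do the heavy aggregation over large flow tables. A new heuristic is added by appending a `Rule` rather than editing the detection loop, which mirrors the modular design the abstract describes. In the demo data, only `10.0.0.1` is flagged by both rules, while the DNS host and the web client are not.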

  • 【Online Publication Submitter】 Fudan University
  • 【Online Publication Issue】 2010, Issue 07