P2P流媒体识别方法的研究

The Research on Identification of P2P Streaming Traffic

【Author】 Zhou Lijuan

【Advisor】 Li Zhitang

【Author information】 Huazhong University of Science and Technology, Computer System Architecture, 2008, PhD

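【Abstract (translated from the Chinese)】 As P2P applications continue to expand, and file sharing and streaming media services in particular keep growing, the latent security problems of P2P networks and their abuse of resources, especially bandwidth, have drawn close attention from network operators. Classifying, labeling, and controlling P2P traffic has therefore become an urgent problem for enterprises and network operators. At present, the techniques for identifying and detecting P2P application traffic that have matured into products are mainly based on deep-packet signature matching. Any identification method that matches fixed, program-specific signatures fails easily once the software is updated or its traffic is encrypted; only behavior-pattern identification that captures the essential nature of the application traffic is a lasting solution. Yet there has been almost no research on the behavioral characteristics of P2P streaming traffic, even though P2P streaming software is updated and spreads far faster and more widely than other kinds of P2P software, which makes the security and piracy problems already present in P2P networks more serious and harder to control. Research on identification techniques based on the behavior patterns of P2P streaming traffic is therefore necessary in order to identify unknown applications and encrypted traffic, and it has significant theoretical and practical value.
Drawing on the conclusions of a large body of related work and on an analysis of the characteristics of P2P networks, the study shows that what most fundamentally distinguishes a P2P network from other systems is its dynamic nature, and it proposes that the dynamic characteristics of P2P networks can be reduced to two aspects: peer churn and resource evanescence.
From the perspective of peer churn, the thesis studies CSI, an identification algorithm applicable to all types of P2P traffic. Peer churn in a P2P network was found to cause a low probability of successfully establishing a connection when peers are accessed at random. CSI computes, for a single host, the probability that the outgoing connection requests it issues within a time interval succeed, uses kernel density estimation to analyze the value distribution and the temporal trend of the resulting probability sequence, and identifies the traffic according to that analysis. From the probability distribution model of peer online time, the probability distribution model of the residual lifetime of peers already in the network is derived. Using this model, a study of the probability of successfully accessing an arbitrary peer in two typical P2P architectures, centralized and DHT-based structured, shows that a high success rate is hard to achieve in either because of their huge scale and peer churn. This conclusion gives CSI a sound theoretical justification. Traffic identification experiments also show that CSI applies well to all types of P2P traffic.
Based on an analysis of the two-level nature of P2P streaming traffic (P2P characteristics plus new characteristics introduced by the streaming application), a hierarchical identification model is proposed that first identifies P2P characteristics in the traffic and then identifies streaming characteristics within the traffic recognized as P2P. This hierarchical approach simplifies the design of the identification algorithms and greatly reduces identification overhead.
Resources in a P2P live streaming network are highly transient, so its traffic can be identified effectively by computing a peer's average BM information ratio; the IRI algorithm is based on this method. The strict real-time, ordering, and continuity requirements of streaming transmission and playback give live systems high-frequency, periodic information exchange and download scheduling mechanisms, which maximize resource sharing among peers. Live peers frequently exchange resource structure information (Buffer Map, BM) to overcome the resource access failures caused by rapidly expiring resources. It is therefore proposed that P2P live traffic should exhibit high-frequency transmission of BM packets, and a corresponding measurement model is given. Further, a probability distribution model of the lifetime of peer resources is established, from which a threshold for identifying live traffic is derived. Experiments show that IRI has a very low miss rate and a slightly higher false-positive rate; the false positives can be eliminated effectively by also considering the flow rate and the smoothness of the flow waveform.
Identification of traffic from the other kind of P2P streaming application, media-on-demand (MoD), can likewise be studied from the perspective of resource evanescence. Because of resource evanescence and the requirement for real-time sequential playback in P2P MoD systems, the frequent, periodic download scheduling mechanism is retained, and the BSI algorithm is proposed, which identifies traffic by measuring the consistency of a peer's scheduling at breakpoints. With the addition of hard-disk storage and CDN servers, resource evanescence in MoD systems is less severe than in live systems but remains one of their important characteristics. Together with the real-time continuity requirement of streaming transmission, the frequent, periodic download scheduling of live systems is unchanged, but the period is relatively longer, so the identification technique for live traffic no longer suits MoD. The demand for highly interactive operation changes downloading within a period from the fixed single-scheduling mode of live streaming to a flexible mode that allows multiple random schedulings, which the system uses to improve download responsiveness. Frequent, periodic downloading interrupts the data flow; in a streaming system, the resumption of data transfer after an interruption is almost always triggered by download scheduling. The flexible scheduling mode of MoD systems makes their scheduling behavior more pronounced and easier to measure than in live systems (the scheduling control packets carry larger payloads). A corresponding measurement model and threshold are proposed, and the usability of the threshold is demonstrated with the maximum entropy method. Experimental results show that BSI is not only applicable to MoD traffic but also identifies live traffic efficiently.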

【Abstract】 With the growing popularity of P2P technology, especially the expansion of file-sharing and streaming applications, the security threats posed by P2P technologies and the resource abuse by P2P applications have become a concern for Internet Service Providers (ISPs). For traffic-specific network engineering, ISPs must be able to classify Internet traffic, which highlights the need for P2P traffic detection. At present, commercial products for identifying P2P traffic are mainly based on application-layer signature matching. Any approach that matches software-specific signatures easily becomes useless when the software is updated or its payloads are encrypted; only identification based on behavior patterns that reflect the nature of the traffic can avoid becoming outdated. To date there have been almost no studies of the behavior patterns of P2P streaming peers, even though these applications spread and their software evolves more rapidly than other P2P software, which makes the problems of security and copyright infringement more severe and harder to control than in other systems. It is therefore important for the Internet research community to gain an in-depth understanding of P2P streaming delivery, particularly of the delivery architectures that hold the greatest promise for broad deployment in the near future, so that this knowledge can be used to detect unknown or encrypted traffic.
Supported by many related works and an analysis of the main characteristics of P2P systems, the thesis concludes that the dynamic nature of a P2P system is what distinguishes it from traditional systems. As far as we know, this is the first time that all the dynamic characteristics of P2P systems have been reduced to two aspects: peer churn and resource evanescence.
From the perspective of peer churn, an identification algorithm applicable to all types of P2P traffic is presented.
In a P2P network, the dynamics of peer participation cause a low probability of successful connection establishment when peers are accessed at random. Based on this observation, the CSI algorithm is proposed. It identifies a P2P peer by calculating the probability that its attempts to contact other peers succeed within a period of time, and then analyzing the resulting time-ordered probability series with kernel density estimation to find the most likely value of that probability. From the churn model and the joint lifetime distribution of the users of a P2P system, the residual lifetime distribution of a randomly selected live peer in the network is obtained. The contact-success probability of peers, both in structured P2P systems using distributed hash tables and in centralized systems, is calculated from this residual lifetime distribution. In both cases the probability is not high, which supports the idea that a P2P system cannot make every connection between peers succeed because of its large scale and churn. Experimental results also confirm the effectiveness of CSI.
P2P streaming traffic can be characterized at two levels: P2P and streaming media. This leads to a two-level classification model for P2P streaming traffic, which first selects all traffic with P2P-level characteristics and then identifies streaming traffic within the P2P traffic selected in the first step. The model performs well because it is simple and inexpensive.
Because resources in a P2P live streaming network are highly transient, a live peer can be detected effectively from the average percentage of BM information it receives, and the IRI algorithm is proposed on this basis. Because streaming media transmission demands strict real-time delivery, ordering, and continuity, a P2P live system deploys high-frequency, periodic download scheduling and information exchange mechanisms to maximize resource sharing among peers.
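The CSI measurement summarized in this abstract (a per-interval connection success probability, followed by kernel density estimation over the resulting series) might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the event format, the 60-second window, the bandwidth rule and its floor, and the 0.5 decision threshold are all hypothetical, and only the value distribution of the series is analyzed.

```python
import math
from collections import defaultdict


def success_ratios(events, window=60.0):
    """Per-window success ratio of one host's outgoing connection attempts.

    events: iterable of (timestamp_seconds, succeeded) pairs (assumed format).
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for ts, ok in events:
        w = int(ts // window)
        attempts[w] += 1
        if ok:
            successes[w] += 1
    return [successes[w] / attempts[w] for w in sorted(attempts)]


def kde_mode(samples, grid_points=101):
    """Gaussian kernel density estimate on [0, 1]; returns the densest grid value."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    # Normal-reference rule of thumb; the 0.02 floor is an arbitrary assumption
    # that keeps the kernel usable when all samples are (nearly) identical.
    h = max(1.06 * std * n ** (-0.2), 0.02)
    best_x, best_density = 0.0, -1.0
    for i in range(grid_points):
        x = i / (grid_points - 1)
        density = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        density /= n * h * math.sqrt(2 * math.pi)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def looks_like_p2p(events, window=60.0, threshold=0.5):
    """Flag a host whose most likely success ratio is low (hypothetical threshold)."""
    ratios = success_ratios(events, window)
    if not ratios:
        return False
    return kde_mode(ratios) < threshold
```

A host whose attempts succeed about 30% of the time in every window would be flagged, while one succeeding about 90% of the time would not. A real deployment would have to calibrate the window and threshold against measured traffic.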
By frequently exchanging the structure of the resources in their buffers, live peers overcome the failures caused by rapidly expiring resources. This reveals a characteristic of live traffic, namely that Buffer Map (BM) information is transmitted continually, and a measurement model for it follows. Further, the residual lifetime distribution of a randomly selected resource in a peer's buffer is derived and used to deduce a threshold for identifying live traffic. Experimental results show that the IRI algorithm has a very low miss rate but a slightly higher false-detection rate; the false detections can be effectively eliminated by combining IRI with the bit rate and the waveform of the flow. Similarly, flow-based detection of traffic from the other application, media-on-demand (MoD), can be studied from the perspective of resource evanescence. Because of resource evanescence and the requirement for real-time, smooth playback, P2P-based MoD systems need a frequent, periodic download scheduling mechanism. The BSI algorithm is proposed, which classifies traffic as MoD or non-MoD by measuring the consistency of scheduling at breakpoints in a peer's download process. Because hard-disk storage and CDN servers are deployed in MoD systems, resources stay available longer than in live systems, but resource evanescence is still one of their most important characteristics. In addition, because of the requirement for real-time, continuous playback, the download scheduling and information exchange mechanisms remain as frequent as in live systems, but with a relatively longer cycle, so the detection technique for live traffic is no longer suitable.
In an MoD system, the demand for highly interactive operation changes the download mode within a period from the one-time scheduling of live systems to multiple random scheduling, which lets peers respond more quickly. This frequent, periodic download mode is the main reason for the many breakpoints in the data transmission process. Data transmission after a breakpoint is almost always triggered by scheduling, which is easier to measure than in live systems because the packets that instruct downloading carry larger payloads. The corresponding measurement model is proposed, along with a classification threshold whose effectiveness is demonstrated by the maximum entropy method. Experimental results show that the BSI algorithm is applicable not only to MoD traffic but also to live traffic.
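The breakpoint-scheduling idea behind BSI could be illustrated roughly as below. The abstract does not give the measurement model, so this sketch rests on assumptions: the packet tuple format, the 1-second gap that defines a breakpoint, the 40-byte minimum size for a scheduling request, and the 0.6 threshold are all hypothetical. It counts how often inbound data resuming after a gap was preceded by a sizeable outbound packet.

```python
def breakpoint_consistency(packets, gap=1.0, min_request_bytes=40):
    """Fraction of breakpoints whose resumption follows an outbound request.

    packets: time-ordered list of (timestamp, direction, payload_len) tuples,
    with direction "in" or "out" (assumed format). Returns None when the
    trace contains no breakpoint.
    """
    breakpoints = 0
    scheduled = 0
    last_in = None
    pending_request = False  # sizeable outbound packet seen since last inbound data
    for ts, direction, size in packets:
        if direction == "out":
            if size >= min_request_bytes:
                pending_request = True
            continue
        if size == 0:
            continue  # ignore empty inbound packets such as bare acknowledgements
        if last_in is not None and ts - last_in > gap:
            breakpoints += 1
            if pending_request:
                scheduled += 1
        last_in = ts
        pending_request = False
    return scheduled / breakpoints if breakpoints else None


def scheduling_driven(packets, threshold=0.6, **kwargs):
    """Flag a flow whose resumptions are mostly request-triggered (hypothetical threshold)."""
    ratio = breakpoint_consistency(packets, **kwargs)
    return ratio is not None and ratio >= threshold
```

Tracking only the requests seen since the last inbound data keeps the sketch to a single pass over the trace. A fuller model would also have to separate scheduling packets from other control traffic.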

  • 【Classification number】TP393.08
  • 【Citations】26
  • 【Downloads】1008