Research on the Key Technologies of Network Traffic Classification (网络流量识别关键技术研究)

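【Author】 林冠洲 (Lin Guanzhou)

【Advisor】 袁东风 (Yuan Dongfeng)

【Author Information】 Beijing University of Posts and Telecommunications, Information Security, 2011, Ph.D.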

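【Abstract, translated from the Chinese】 With the continued build-out of network infrastructure and advances in network terminal technology, the number of Internet users and the scale of network applications in China have grown rapidly. National policy support for emerging fields such as third-generation mobile communications, the triple-play (three-network convergence) project, and the Internet of Things has further accelerated network informatization. The growth of network applications has sped up industry convergence, improved the efficiency of society, and enriched people's cultural life, but it has also brought problems: a sharp expansion of network traffic, greater network and information security risks, and difficulty in regulating the content transmitted over networks. Network operators, enterprises, and government regulators have a growing need to supervise and manage network traffic, and accurate, fast traffic classification is the first prerequisite for meeting that need. This thesis addresses the challenge posed by emerging anti-identification techniques such as encrypted transmission and dynamic ports. Building on an analysis of the strengths, weaknesses, and key technical points of existing traffic classification techniques, it studies the key technologies of traffic classification methods, automatic extraction of classification signatures, and traffic classification and management systems. The aim is to help control bandwidth consumption sensibly, provide differentiated network services, regulate transmitted content effectively, and guard against network security threats. The main contributions are as follows.

(1) Traffic classification based on deep packet inspection (DPI) offers high accuracy and high efficiency, but the accuracy of the classification rules directly affects the results. Manually extracted rules cannot keep pace with the rule updates required as the many network applications release new versions. This thesis therefore proposes an automatic method for extracting traffic classification signatures, based on an improved PrefixSpan sequential pattern mining algorithm. Under constraints on sequence contiguity and offset, the method uses a two-level iteration to extract the signatures common to multiple groups of traffic generated by the same network protocol or application. The algorithm mines the complete set of contiguous frequent subsequences in the traffic and uses the offset constraint to keep the number of frequent sequences under control. Experiments show that, with suitably chosen parameters, the method yields effective, high-precision classification signatures.

(2) Traffic classification based on semi-supervised learning can effectively identify unknown traffic and makes it easy to match the generated clusters to actual application types. The K-means clustering used in existing semi-supervised methods is strongly affected by the choice of initial cluster centers. This thesis replaces the random selection of initial K-means centers with a selection based on labeled data objects and a greedy algorithm, and matches the clusters to actual application types by maximum likelihood estimation, thereby speeding up the convergence of the clustering algorithm. Experimental results show that the improved algorithm outperforms the original in both overall accuracy and sum of squared errors (SSE).

(3) Most existing architectures for traffic classification and control systems use a single classification method, which creates a conflict between pursuing classification accuracy and efficiency on the one hand and identifying unknown traffic on the other. This thesis improves on existing architectures and proposes one that combines DPI-based classification with data-mining-based classification. Each method classifies traffic independently, and a unified decision is then made according to priority. The architecture also adds a module for automatic extraction of classification rules and a module for training the classification model. A sampling algorithm selects part of the classification results for self-learning, so that the rule base and the model used by the data-mining-based method are updated and maintained automatically. The architecture supports multiple traffic control mechanisms and deployment modes.

(4) The purpose of classifying traffic is to manage and control particular traffic, and limiting transmission speed is one of the main requirements. Existing bypass (out-of-path) rate-limiting methods have limitations, whether they send forged interfering packets or coordinate with other network devices. This thesis proposes a bypass rate-limiting method based on the sliding-window field in the TCP header. Following the transmission procedure specified by TCP, the method sends interfering packets carrying a forged window value, so that a traffic classification and control device deployed in bypass mode can precisely control the transmission rate of flows carried over TCP. In addition, the thesis proposes a model for evaluating how the composition of network traffic affects the throughput of a traffic classification and control system. It gives evaluation formulas in terms of both flows and bytes, which quantify the system's actual processing capacity under a given traffic mix and so reduce design redundancy when system performance is evaluated.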
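The signature extraction described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's improved PrefixSpan algorithm, whose two-level iteration is not specified in the abstract. It only demonstrates the two stated constraints: patterns must be contiguous, and they must start within a bounded offset. The function names, the parameters `min_support`, `max_offset`, and `max_len`, and the fixed-offset matching rule are all assumptions made for illustration.

```python
from collections import defaultdict


def mine_offset_signatures(payloads, min_support, max_offset, max_len=32):
    """Mine contiguous byte patterns that begin at the same offset
    (0..max_offset) in at least `min_support` (a fraction) of the payloads.

    Hypothetical helper; returns {(offset, pattern): support}.
    """
    total = len(payloads)
    need = min_support * total

    # Level 1: every single byte at every allowed start offset.
    current = defaultdict(set)  # (offset, pattern) -> indices of supporting payloads
    for idx, data in enumerate(payloads):
        for off in range(min(len(data), max_offset + 1)):
            current[(off, data[off:off + 1])].add(idx)
    current = {k: ids for k, ids in current.items() if len(ids) >= need}

    frequent = {}
    while current:
        for key, ids in current.items():
            frequent[key] = len(ids) / total
        # Prefix growth: extend each frequent pattern by one byte, scanning
        # only the payloads that already contain the prefix.
        grown = defaultdict(set)
        for (off, pat), ids in current.items():
            if len(pat) >= max_len:
                continue
            end = off + len(pat)
            for idx in ids:
                data = payloads[idx]
                if end < len(data):
                    grown[(off, pat + data[end:end + 1])].add(idx)
        current = {k: ids for k, ids in grown.items() if len(ids) >= need}
    return frequent


def keep_maximal(frequent):
    """Drop patterns that are wholly contained in a longer frequent pattern."""
    def contained(small, big):
        (o1, p1), (o2, p2) = small, big
        start = o1 - o2
        return small != big and start >= 0 and p2[start:start + len(p1)] == p1

    return {k: s for k, s in frequent.items()
            if not any(contained(k, other) for other in frequent)}
```

For three HTTP-like payloads `b"GET /a HTTP/1.1"`, `b"GET /bb HTTP/1.1"`, and `b"GET /c HTTP/1.0"`, calling `keep_maximal(mine_offset_signatures(payloads, 1.0, 4))` keeps only the shared prefix `b"GET /"` at offset 0. The offset bound limits how many start positions are explored, which is one way a constraint of this kind can keep the set of frequent sequences small.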

【Abstract】 With the continuing development of network infrastructure and network terminal technology, the number of Internet users and network applications in China is increasing rapidly. Informatization has benefited from government support for third-generation mobile technology, triple play, and the Internet of Things. These efforts accelerate industry convergence and enrich people's cultural life. However, they also raise new problems for regulators, such as Internet Service Providers' demand for differentiated quality of service, threats to network and information security, and the need to regulate sexual and violent content. The ability to classify network traffic accurately and efficiently is the key to all of these problems.

This thesis summarizes the advantages and drawbacks of the main network traffic classification algorithms. To meet the challenges posed by anti-identification technologies, including random ports and encrypted transmission, it studies network traffic classification based on semi-supervised clustering, automatic signature mining, and network traffic management systems. The main contributions are as follows:

(1) The payload-signature approach is more accurate and efficient than other approaches to network traffic classification, but its performance depends heavily on a comprehensive, up-to-date signature database. The existing way of deriving payload signatures is a manual process that is time-consuming and complicated. This thesis proposes a novel payload signature mining algorithm based on PrefixSpan that automatically extracts signatures from the traffic of a specific network application. Mining under a contiguous sequential pattern restriction and an offset constraint on the payload significantly reduces the size of the final signature database.
The algorithm mines the complete set of signatures under the offset constraint and outperforms the Apriori-based algorithm. Moreover, the experimental results show high precision and a low error rate when the mined signatures are used for network traffic classification.

(2) The declining accuracy of port-based classification and the inability of payload-based classification to identify unknown traffic motivate the use of transport-layer statistics for network traffic classification. Approaches based on semi-supervised clustering can identify unknown network traffic and map unlabeled clusters to network applications easily. This thesis proposes a novel semi-supervised clustering approach, based on an improved K-Means algorithm, to partition a training set of network flows that contains a large number of unlabeled flows and few labeled flows. A greedy algorithm and the labeled flows are used to initialize the cluster centers instead of selecting them at random. Maximum likelihood estimation is used to construct a mapping from the clusters to the predefined set of traffic classes. The experimental results show that the proposed algorithm achieves better overall accuracy and SSE than the one based on standard K-Means.

(3) Almost every existing network traffic management system employs only one traffic classification approach, which creates a conflict between identifying unknown traffic and achieving accurate classification results. This thesis constructs a novel framework that uses both a payload-based algorithm and a machine learning algorithm. The results generated by each algorithm are judged centrally according to a priority-based standard. Modules for automatic signature mining and for self-learning in the machine learning algorithm are also included in the framework so that the system is updated in a timely manner. The framework supports various network traffic control methods and can be deployed in-path or in bypass mode.
(4) Rate limiting is a chief requirement of network management. After identifying network traffic, a network management system deployed in bypass mode can send manually constructed packets or notify other network security systems to manage particular flows. However, these methods are limited by their complexity or their effectiveness. This thesis proposes a novel approach that uses the sliding window field of the TCP header to limit transmission rates in a bypass deployment. Packets with a constructed sliding window field are sent to control the rate of network flows at byte granularity. In addition, the thesis constructs a performance evaluation model for network traffic classification systems. Formulas in units of bytes and of flows can be used to calculate the data throughput of a network management system during performance evaluation, thereby reducing design redundancy.
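The arithmetic behind the window-based rate limiting in contribution (4) can be sketched as follows. The abstract does not describe how the forged packets are built or timed, so this sketch covers only the well-known bound that a TCP sender can have at most one receive window of data in flight per round trip, giving a rate ceiling of roughly window / RTT. The function names and parameters are hypothetical. The `window_scale` argument assumes the device has observed the window scale option negotiated in the handshake.

```python
def window_field_for_rate(target_bytes_per_s, rtt_s, window_scale=0):
    """Return a 16-bit TCP window field value that caps a flow near the target rate.

    Assumption: throughput is window-limited, so rate <= window / RTT.
    """
    window_bytes = int(target_bytes_per_s * rtt_s)
    field = window_bytes >> window_scale      # the header carries window >> scale
    # Clamp to the 16-bit field; avoid 0, since a zero window stalls the sender.
    return max(1, min(field, 0xFFFF))


def rate_ceiling(window_field, rtt_s, window_scale=0):
    """Upper bound on throughput (bytes/s) implied by an advertised window."""
    return (window_field << window_scale) / rtt_s
```

For a flow with a 125 ms round-trip time, `window_field_for_rate(64_000, 0.125)` gives a window of 8000 bytes, and `rate_ceiling(8000, 0.125)` recovers the 64,000 bytes/s ceiling. The achievable rate also depends on congestion control and on RTT variation, so the value is a ceiling rather than a guaranteed rate.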

  • 【Classification Number】 TP393.06
  • 【Times Cited】 11
  • 【Downloads】 1238