Research on Network Traffic Classification Method in Cloud Center (云中心网络流量分类方法研究)

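【Author】 Li Xunzhang (李勋章)

【Advisor】 Wang Yong (王勇)

【Author information】 Guilin University of Electronic Technology, Software Engineering, 2019, Master's thesis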

【Abstract】 In recent years, with the rapid development of cloud technology, major enterprises have built their own cloud data centers. In a cloud center, the volume of application behavior and application traffic data keeps growing. Extracting valuable information from cloud center network traffic has become a goal for these enterprises, and classifying that traffic is key to cloud security and cloud management. However, because network traffic in a cloud environment is large in scale and spans many kinds of applications, traditional traffic classification methods can guarantee neither classification accuracy nor real-time performance. Balancing accuracy and real-time performance is therefore a technical difficulty in cloud center traffic classification. To address it, this thesis presents a network traffic classification method based on the CDH (Cloudera’s Distribution Including Apache Hadoop) platform and a network traffic classification method based on random forest. The main work and innovations of the thesis are as follows:

(1) For the Internet traffic classification requirements of a campus network environment, the thesis presents a CDH-based network traffic classification method for the cloud environment. A CDH big data platform is built, real network traffic is captured with a network protocol analysis tool, and a pattern-matching traffic classification algorithm, PM, is proposed. PM is then parallelized with the real-time big data computing framework Spark Streaming to achieve real-time traffic classification. Compared with traditional classification methods, this method improves both classification efficiency and classification accuracy. PM can classify both offline and real-time network traffic, which offers an approach to real-time traffic classification.

(2) For the data distribution optimization needs of the Ceph cloud storage system, the thesis presents a method based on flow statistical features for classifying traffic between storage nodes. The method uses the Wireshark packet capture software to capture real traffic between nodes in a Ceph cloud storage system and analyzes the flow statistics of the captured traffic. Three features are combined (packet size, packet count, and flow duration), and the random forest algorithm is used to classify the traffic. Experimental results show that the selected feature combination, together with the random forest algorithm, classifies traffic between Ceph storage nodes well.
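The abstract does not spell out how PM works, so the following is only a minimal sketch of the general idea of pattern-matching traffic classification, not the thesis's algorithm. The signature table, the labels, and the function name `classify_payload` are all assumptions made for illustration; the byte patterns rely only on well-known protocol openings (HTTP request and status lines, the SSH version banner, the TLS handshake record header, the BitTorrent handshake).

```python
import re

# Hypothetical signature table (not taken from the thesis): each entry maps an
# application label to a byte pattern expected at the start of a payload.
SIGNATURES = [
    ("HTTP", re.compile(
        rb"^(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]"
        rb"|^HTTP/1\.[01] \d{3}")),
    ("SSH", re.compile(rb"^SSH-\d\.\d+-")),               # SSH version banner
    ("TLS", re.compile(rb"^\x16\x03[\x00-\x04]")),        # handshake record, version 3.x
    ("BitTorrent", re.compile(rb"^\x13BitTorrent protocol")),
]


def classify_payload(payload: bytes) -> str:
    """Return the label of the first signature that matches, else 'UNKNOWN'."""
    for label, pattern in SIGNATURES:
        if pattern.match(payload):
            return label
    return "UNKNOWN"


if __name__ == "__main__":
    samples = [
        b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n",
        b"SSH-2.0-OpenSSH_8.9\r\n",
        b"\x16\x03\x01\x02\x00\x01",
        b"\x00\x01\x02",
    ]
    for sample in samples:
        print(classify_payload(sample))
```

Because a function like this is stateless and works on one record at a time, it could in principle be mapped over each batch of a streaming job, which is the kind of parallelization the abstract attributes to Spark Streaming.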
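The three flow features named in the abstract (packet size, packet count, flow duration) can be illustrated with a short sketch. It assumes packets have already been exported from a capture into simple records; the `Packet` type, the choice of mean packet size as the size feature, and the use of a one-directional 5-tuple as the flow key are assumptions, not details given in the thesis.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record exported from a capture."""
    ts: float      # capture timestamp in seconds
    src: str       # source IP address
    dst: str       # destination IP address
    sport: int     # source port
    dport: int     # destination port
    proto: str     # transport protocol, e.g. "TCP"
    length: int    # packet size in bytes


def flow_features(packets):
    """Group packets into one-directional flows and compute per-flow statistics."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key].append(p)

    features = {}
    for key, pkts in flows.items():
        times = [p.ts for p in pkts]
        sizes = [p.length for p in pkts]
        features[key] = {
            "packet_count": len(pkts),
            "mean_packet_size": sum(sizes) / len(sizes),
            "duration": max(times) - min(times),
        }
    return features


if __name__ == "__main__":
    demo = [
        Packet(0.0, "10.0.0.1", "10.0.0.2", 5000, 6800, "TCP", 100),
        Packet(1.5, "10.0.0.1", "10.0.0.2", 5000, 6800, "TCP", 300),
        Packet(0.2, "10.0.0.3", "10.0.0.2", 5001, 6800, "TCP", 60),
    ]
    for flow, stats in flow_features(demo).items():
        print(flow, stats)
```

Each resulting feature row, paired with a known traffic label, is the kind of input a random forest implementation (for example scikit-learn's `RandomForestClassifier`, named here only as one possible choice) could be trained on.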
