Research on Cyberspace Situational Awareness Technology Based on Topology and Traffic Mining

【Author】 Zhuo Ying (卓莹)

【Supervisor】 Gong Zhenghu (龚正虎)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2010, Ph.D.

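【Abstract (Chinese version)】 Network situational awareness is the acquisition, comprehension, assessment, and display of the factors that can change the network situation in a large-scale network environment, together with the prediction of future trends. As the inevitable direction of network management, situational awareness fuses multi-source, multi-attribute information. It assesses and predicts the current state and trend of the whole network, which is shaped by factors such as the operating status of network devices, network behavior, and user behavior, and it provides decision support. Research on network situational awareness has only just begun and faces three problems. First, it concentrates on the security domain and does not reflect the macroscopic, holistic character of a situation. Second, the methods in use rely mainly on hierarchical structures and weight analysis and lack a theoretical basis. Third, most work remains at the data level: it never rises to the level of situation, and so it does not achieve the abstraction from data to information to knowledge. To address the typical problems and common requirements of network situational awareness, this thesis surveys the state of the key technologies and their deployment and proposes a situational awareness model based on topology and traffic mining. It focuses on three techniques: a network data stream clustering algorithm for situation pattern partitioning, a situation assessment method based on rough set analysis, and a situation forecasting method based on generalized regression neural networks. On this basis, it designs and implements a prototype system. The main contributions are as follows.

After analyzing the shortcomings of traffic analysis and the advantages of data mining, the thesis proposes TTM (Topology Traffic Mining), a network situational awareness model based on topology and traffic mining. TTM specifies the functions of situational awareness and how they are divided and organized, defines the data structures and functional operations, and gives the modeling and awareness processes. TTM goes beyond security-only situations. It takes network traffic data and topology data as its data sources, and it builds an index system whose situation factors cover the various influences on the network situation. Grounded in traffic mining and topology inference, it provides a higher-level, more abstract comprehensive situation. It thereby assesses and displays the global network situation and fully reflects the holistic, macroscopic character of a situation. By incorporating data mining, TTM can also acquire knowledge and reveal regularities. It can expose the full range of anomalous events in the network while resting on a theoretical foundation, which keeps it scientific and objective.

Because situation pattern partitioning lacks prior knowledge, clustering is chosen as the means of traffic mining. Based on an analysis of existing clustering algorithms and the characteristics of traffic data, the thesis proposes NetStream, a network data stream clustering algorithm for situation pattern partitioning. After partitioning the data space into grids and selecting situation factors, NetStream first clusters in the full-dimensional space, forming clusters by merging connected dense grid cells. For clusters that do not meet the density threshold, it then performs top-down subspace clustering under the dual criteria of density and dimensionality to search for the optimal projected clusters. Finally, it detects concept drift with the Chernoff bound, adaptively adjusts the window size and update interval with a dual-window adjustment strategy, and updates the clustering results incrementally. NetStream is a fast subspace clustering algorithm. It handles high-dimensional, mixed-attribute, bursty network data and meets requirements such as single-pass processing, sequential access, bounded memory, scalability, interpretability, and insensitivity to noise. More importantly, the top-down strategy exploits the distributional traits that network burstiness leaves in the data. It finds projected clusters of different dimensionality in different subspaces and thus achieves fast subspace clustering. The Chernoff-bound drift detection discovers bursts of network behavior. Combined with incremental updates over jumping windows with dual-window adjustment, it enables online clustering and dynamic maintenance of the data stream.

Because existing situation assessment is not sufficiently scientific and objective, the thesis proposes RSSA (Situation Assessment based on Rough Set Analysis). Building on the situation pattern partition, RSSA proceeds in four steps. It automatically generates situation assessment rules for network elements through rough set analysis. It devises a rule adjustment strategy that accounts for the frequency of situation patterns and their temporal variation. It determines the weight of each network element from capacity theory by jointly analyzing the element's topological contribution and transmission capacity. Finally, it fuses the situations and weights of all network elements to complete the whole-network situation assessment. On the one hand, rough set analysis places knowledge representation, learning, and analysis in a single framework with representation, learning, and classification capabilities. RSSA can therefore discover implicit knowledge in the patterns, reveal latent regularities, and turn them into logical rules without any prior information, which keeps it scientific and objective. On the other hand, graph-theoretic analysis accounts for how the network topology and the transmission capacity of network elements affect the whole-network situation. This fuses topology data with traffic data and delivers network situation assessment from a truly global perspective.

To forecast a nonlinear system, the thesis treats situation forecasting as time series analysis and proposes GRNNSF (Situation Forecast based on Generalized Regression Neural Network). GRNNSF trains a generalized regression neural network on historical data and selects the network parameters adaptively to build the forecasting model. It then updates the model dynamically as new data arrive. GRNNSF learns quickly, forecasts accurately, and has strong nonlinear mapping ability. Its structure is determined adaptively, and its output does not depend on initial weights. It outperforms back-propagation and radial basis function networks in approximation ability, classification ability, and learning speed, and it still forecasts reasonably well when sample data are scarce.

Based on the research into these key technologies, the thesis designs and implements a prototype Network Situation Management System (NSMS). The prototype integrates two unit network management functions, topology discovery and traffic collection. It introduces MVHV (multi-view, hypervolume), a multi-view, hypervolume situation visualization scheme. It implements NetStream, RSSA, and GRNNSF and validates the TTM model. This thesis is a useful exploration of network situational awareness, and its results have theoretical value and practical significance for advancing comprehensive network situation management. The work has been applied in the pre-research projects and practical engineering projects undertaken by the research group.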
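NetStream's concept-drift step can be illustrated with a small sketch. The abstract does not give the thesis's exact test statistic. The code therefore assumes the common additive Chernoff (Hoeffding) form for per-record statistics normalized to [0, 1]. Under this form, drift is flagged when the means of a reference window and a recent window differ by more than the sum of their confidence radii. The window rule (halve on drift, double while stable, refresh four times per window) is a hypothetical stand-in for the dual-window strategy. The function names, bounds, and `delta` are also assumptions, not the author's implementation.

```python
import math
from statistics import fmean


def chernoff_epsilon(n: int, delta: float) -> float:
    """Confidence radius for the mean of n samples in [0, 1].

    Additive Chernoff (Hoeffding) bound: P(|mean - mu| >= eps) <= 2*exp(-2*n*eps**2).
    Setting the right-hand side to delta/2 splits the failure budget across two windows.
    """
    return math.sqrt(math.log(4.0 / delta) / (2.0 * n))


def drift_detected(reference: list[float], recent: list[float], delta: float = 0.05) -> bool:
    """Flag drift when the window means differ by more than both radii combined.

    If both windows come from the same distribution, this misfires with
    probability at most delta (union bound over the two windows).
    """
    gap = abs(fmean(reference) - fmean(recent))
    return gap > chernoff_epsilon(len(reference), delta) + chernoff_epsilon(len(recent), delta)


def adjust_window(size: int, drifted: bool,
                  min_size: int = 64, max_size: int = 4096) -> tuple[int, int]:
    """Hypothetical dual-window rule: return (window size, update interval)."""
    if drifted:
        size = max(min_size, size // 2)  # shrink to react quickly to a burst
    else:
        size = min(max_size, size * 2)   # widen for a steadier summary
    interval = max(1, size // 4)         # refresh the clustering four times per window
    return size, interval
```

For example, `drift_detected([0.2] * 200, [0.8] * 200)` flags a burst, while two identical windows do not.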

【Abstract】 Cyberspace Situational Awareness (CSA) refers to the acquisition, comprehension, assessment, and visualization of the factors that can change the network situation in a large-scale network, together with the forecasting of its development trend. As the future direction of network management, CSA can fuse multi-source and multi-attribute information. It can assess and forecast the current state and trend of the whole network, which is shaped by the operating status of network equipment, network behavior, user behavior, and other situation factors, and it can provide decision support. Research on CSA is still at an early stage, and many problems remain. Current research focuses mainly on security, so it cannot reflect the integrity and macroscopic character of a situation. Mainstream assessment methods are based on hierarchical structures or weight functions and lack a theoretical basis. Most research remains at the data level rather than reaching the situation level, so it cannot achieve the abstraction from data to information and then to knowledge.

Addressing the typical problems and common requirements of CSA, we studied the current key technologies and their application deployment and proposed a CSA model. We focused on a network data stream clustering algorithm, a situation assessment method, and a situation forecast method. We also designed and implemented a prototype system to validate this work. The major contributions of this thesis are as follows.

Considering the shortcomings of traffic analysis and the advantages of data mining, we proposed TTM, a Cyberspace Situational Awareness model based on Topology and Traffic Mining. TTM specifies the CSA functions as well as their division and organization, defines the data structures, and gives the modeling process and the awareness process.
Because its basic idea integrates traffic mining and topology inference, TTM breaks through the limitations of security-only situations. It takes network traffic and topology as data sources to establish an index system of the various situation factors that affect the network situation. TTM provides a higher-level, more abstract comprehensive situation, realizes whole-network assessment and visualization, and fully reflects the integrity and macroscopic character of a situation. In addition, by introducing data mining, TTM is theoretically grounded, scientific, and objective. It can acquire knowledge, discover laws, and detect both known and unknown anomalies.

Given the lack of prior knowledge about situation patterns, clustering was chosen as the means of traffic mining. After analyzing existing clustering algorithms and the characteristics of traffic data, we put forward NetStream, a network data stream clustering algorithm for situation pattern partition. NetStream builds on clustering-space grid partition and situation factor selection. It first merges connected dense grids to form clusters in the full-dimensional space. It then uses top-down subspace clustering to search for dense projected clusters within the clusters that do not satisfy the density threshold. Finally, it detects concept drift based on the Chernoff bound, dynamically adjusts the window size and update interval of jumping windows, and incrementally modifies the clustering model. NetStream is a fast subspace clustering algorithm that can deal with high-dimensional, bursty data with heterogeneous attributes. It satisfies requirements including one-pass processing, ordered access to input data, limited memory, scalability, comprehensibility, and insensitivity to noise.
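The sketch below shows a minimal version of NetStream's full-dimensional phase. It assumes numeric situation factors already normalized to [0, 1] and equal-width bins per dimension. It reads "connected" as face adjacency: two cells are neighbors when they differ by one bin in a single dimension. The top-down subspace search, mixed-attribute handling, and window maintenance are omitted. The function names and the `min_count` density threshold are hypothetical.

```python
from collections import Counter, deque

Cell = tuple[int, ...]


def grid_cell(point: tuple[float, ...], bins: int) -> Cell:
    """Map a point whose coordinates lie in [0, 1] to its equal-width grid cell."""
    return tuple(max(0, min(int(x * bins), bins - 1)) for x in point)


def dense_grid_clusters(points, bins: int, min_count: int) -> list[set[Cell]]:
    """Full-space phase: keep cells holding >= min_count points, then merge
    face-adjacent dense cells into clusters (connected components via BFS)."""
    counts = Counter(grid_cell(p, bins) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}
    clusters: list[set[Cell]] = []
    seen: set[Cell] = set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    # Neighbour that differs by one bin along a single dimension.
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        component.add(nb)
                        queue.append(nb)
        clusters.append(component)
    return clusters
```

Take `bins=4` and `min_count=3`. Five points at (0.1, 0.1) and five at (0.3, 0.1) fill two face-adjacent cells and merge into one cluster, while a lone point at (0.6, 0.6) is left as noise.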
More importantly, the top-down strategy, which realizes fast subspace clustering, takes full advantage of the data distribution characteristics caused by the bursty nature of the network. It can find projected clusters of different dimensionality in different subspaces. The concept drift detection based on the Chernoff bound, combined with the incremental update strategy, can find network burst behavior and realize online clustering and dynamic maintenance of the data stream.

To strengthen the theoretical basis of situation assessment, we proposed RSSA, a Situation Assessment method based on Rough Set Analysis. Building on situation pattern partition, RSSA proceeds in four steps. It automatically generates the situation assessment rules of network elements through rough set analysis. It designs an adjustment strategy for the assessment rules according to the frequency of situation patterns. It analyzes the topology contribution and transmission capacity of network elements to determine their weights based on capacity network theory. Finally, it fuses the situation and weight of each network element to complete the whole-network situation assessment. On the one hand, with the aid of rough set analysis, RSSA integrates knowledge expression, learning, and analysis into a uniform framework and has the abilities of expression, learning, and classification. RSSA excels at discovering implicit knowledge, revealing latent laws, and deriving logical rules from massive historical data or cases.
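The rule-generation step that RSSA delegates to rough set analysis can be sketched on a toy decision table. This is the textbook construction, not the thesis's procedure. Objects are grouped by the indiscernibility relation on the condition attributes. Each group that falls entirely within one decision class (that class's lower approximation) yields a certain rule. The dependency degree is the share of objects covered by such groups. Attribute reduction, the frequency-based rule adjustment, and the capacity-based weighting are omitted. The attribute names and values are hypothetical.

```python
from collections import defaultdict


def induce_certain_rules(table, conditions, decision):
    """Return (rules, dependency degree) for a decision table.

    table: list of dicts, one per object; conditions: condition attribute
    names (e.g. discretized situation factors); decision: the decision
    attribute (e.g. a situation level).
    """
    blocks = defaultdict(list)
    for row in table:
        # Objects with identical condition values are indiscernible.
        key = tuple(row[a] for a in conditions)
        blocks[key].append(row[decision])
    rules, covered = {}, 0
    for key, decisions in blocks.items():
        if len(set(decisions)) == 1:
            # The whole block lies in one decision class's lower approximation,
            # so "conditions == key => decision" holds without exception.
            rules[key] = decisions[0]
            covered += len(decisions)
    # Dependency degree: fraction of objects in the positive region.
    return rules, covered / len(table)


# Hypothetical decision table of discretized situation factors.
example_table = [
    {"traffic": "low", "loss": "low", "level": "normal"},
    {"traffic": "low", "loss": "low", "level": "normal"},
    {"traffic": "high", "loss": "high", "level": "alert"},
    {"traffic": "high", "loss": "low", "level": "alert"},
    {"traffic": "high", "loss": "low", "level": "normal"},
]
rules, gamma = induce_certain_rules(example_table, ("traffic", "loss"), "level")
```

The `("high", "low")` block mixes both levels, so it falls in the boundary region and produces no certain rule. Three of the five objects are covered, giving a dependency degree of 0.6.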
RSSA does not need any prior information, so it is scientific and objective. On the other hand, with the aid of graph theory analysis, RSSA integrates topology and traffic data. It comprehensively analyzes the effect of network topology structure and network element transmission capacity on the whole-network situation and realizes network situation assessment from a global perspective.

Aiming at the problem of nonlinear system forecasting, we proposed GRNNSF, a Situation Forecast method based on Generalized Regression Neural Network. GRNNSF regards situation forecasting as time series analysis, trains a GRNN on historical data, selects network parameters adaptively, and updates the forecast model dynamically as new data arrive. GRNNSF is fast and accurate. It has advantages over Back-Propagation Networks and Radial Basis Function Networks in approximation ability, classification ability, and learning speed. Even when sample data are scarce, the forecast results remain good.

To validate the key technologies described above, we designed and implemented NSMS, a network situation management prototype system. NSMS integrates two network management functions, topology discovery and traffic collection, and puts forward a multi-view hypervolume visualization scheme. It implements NetStream, RSSA, and GRNNSF and demonstrates the TTM model.

Our research is a beneficial exploration of Cyberspace Situational Awareness. It provides essential support for network situation management environments. The research is valuable for facilitating network management and has been integrated into our actual project.
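GRNNSF's forecasting core can be sketched as a plain GRNN over lagged values. The sketch follows Specht's GRNN, in which the prediction is a Gaussian-kernel weighted average of the stored training targets. It assumes a lag window of three values. For the "adaptive" choice of the smoothing factor `sigma`, it assumes a leave-one-out search over a small candidate grid. The abstract gives neither the thesis's parameter-selection rule nor its lag length, and the function names here are hypothetical.

```python
import math


def lag_pairs(series, lags):
    """Pair each window of `lags` past values with the value that follows it."""
    return [(tuple(series[i - lags:i]), series[i]) for i in range(lags, len(series))]


def grnn_predict(pairs, x, sigma):
    """GRNN output: Gaussian-kernel weighted average of the stored targets."""
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi, _ in pairs]
    nearest = min(d2)  # shifting by the minimum avoids underflow; the ratio is unchanged
    weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in d2]
    return sum(w * y for w, (_, y) in zip(weights, pairs)) / sum(weights)


def select_sigma(pairs, candidates):
    """Choose the smoothing factor with the lowest leave-one-out squared error."""
    def loo_error(sigma):
        return sum((grnn_predict(pairs[:i] + pairs[i + 1:], x, sigma) - y) ** 2
                   for i, (x, y) in enumerate(pairs))
    return min(candidates, key=loo_error)


def forecast_next(series, lags=3, candidates=(0.05, 0.1, 0.2, 0.5, 1.0)):
    """One-step-ahead forecast of the next situation value and the sigma used."""
    pairs = lag_pairs(series, lags)
    sigma = select_sigma(pairs, candidates)
    return grnn_predict(pairs, tuple(series[-lags:]), sigma), sigma
```

A GRNN simply stores its training patterns. In this sketch, the dynamic model update therefore amounts to appending newly arrived values to the series and calling `forecast_next` again.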
