Study of Data Stream Anomaly Detection System (数据流异常检测系统若干问题研究)

【Author】 Li Renhe (李人和)

【Advisor】 Zhou Aoying (周傲英)

【Author information】 Fudan University, Computer Software and Theory, 2008, Master's thesis

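【Summary】 In recent years, with the continuous development of network technology, data streams, as a novel form of data transmission, have found increasingly wide use in daily life and have driven the shift from the traditional database management system (DBMS) to the data stream management system (DSMS). Compared with static data, data streams are real-time, continuous, and unbounded, so the way they are analyzed differs considerably from existing data processing approaches. Data stream analysis and processing techniques are applied widely across the national economy, and research on data stream problems has therefore gradually become a focus of database research.

In its daily operations and management, Shanghai Telecom needs to analyze traffic data from network ports at different levels in order to monitor and handle anomalies in the system in real time and improve service quality. To meet this need, we propose a novel system, RealMon, which monitors traffic anomalies in the telecom network in real time. While designing and implementing RealMon, we found that the traffic data of different links are correlated with one another and that the data suffer from fairly serious quality problems. To address these issues, this thesis first proposes a method that finds anomalies by analyzing changes in the correlations among a group of data streams, and then designs a model for cleaning data streams in real time. On this basis we design and implement RealMon, a system for real-time analysis and anomaly detection on telecom network traffic data. The system incorporates several mature data stream analysis algorithms and is highly practical. We validate its effectiveness through experiments in simulated and real environments.

The contributions and innovations of this thesis are summarized as follows:

1. Correlations between different data streams are widespread in network traffic analysis systems and securities trading systems. This thesis proposes a method that finds anomalies by analyzing changes in those correlations. The method first transforms numeric data streams with piecewise aggregate approximation, then uses an improved edit distance to measure how strongly the streams are correlated, and finally detects anomalies across multiple streams by applying thresholds to the changes in edit distance. Our experiments show that the algorithm is stable and efficient and detects anomalies in data streams accurately.

2. We summarize the data quality problems commonly found in data streams and, on that basis, propose a data stream cleaning framework in which we define the basic steps of data stream cleaning. The framework is extensible: modules can easily be updated within it to address new data quality problems. We also design methods that handle missing data and regularize the time attribute of the data in real time, and we verify their effectiveness through a series of experiments.

3. We design and implement RealMon, a novel data stream anomaly detection system that accurately detects anomalies in the SNMP traffic data of a telecom network. Because SNMP data have many quality problems, the design applies data stream cleaning techniques to clean the traffic data in real time. The system combines a data stream anomaly detection module with a data stream cleaning module, which is a first among existing research work, and it has strong practical value. The system now successfully monitors anomalies in SNMP traffic data in real time in a simulated environment.

In summary, this thesis studies anomaly detection in data streams, proposes a data stream cleaning model and a method that finds anomalies across multiple data streams by analyzing their correlations, and, building on these results, designs and implements RealMon, a system for real-time analysis and anomaly detection on telecom traffic data. Our results have both theoretical and practical value.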
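
The three steps named in contribution 1 (piecewise aggregate approximation, an edit distance between the resulting strings, and a threshold on how that distance changes) can be illustrated with a minimal sketch. The summary does not give the symbol alphabet, the details of the improved edit distance, or the threshold rule, so everything specific here is an assumption: SAX-style breakpoints for a four-letter alphabet, plain Levenshtein distance in place of the thesis's improved variant, fixed non-overlapping windows, and hypothetical function names.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed SAX-style cut points for a 4-letter alphabet (not taken from the thesis).
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def paa(values, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` chunks.

    Requires len(values) >= segments so that no chunk is empty.
    """
    n = len(values)
    return [mean(values[i * n // segments:(i + 1) * n // segments])
            for i in range(segments)]


def symbolize(values, segments):
    """Z-normalize a window, reduce it with PAA, and map each mean to a letter."""
    mu, sigma = mean(values), pstdev(values)
    normed = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, m)]
                   for m in paa(normed, segments))


def edit_distance(a, b):
    """Plain Levenshtein distance (stand-in for the thesis's improved variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correlation_changes(stream_x, stream_y, window, segments, threshold):
    """Return start indices of windows whose edit distance between the two
    symbolized streams differs from the previous window's by more than
    `threshold`."""
    flagged = []
    previous = None
    limit = min(len(stream_x), len(stream_y)) - window + 1
    for start in range(0, limit, window):
        d = edit_distance(symbolize(stream_x[start:start + window], segments),
                          symbolize(stream_y[start:start + window], segments))
        if previous is not None and abs(d - previous) > threshold:
            flagged.append(start)
        previous = d
    return flagged
```

In this sketch two links whose traffic normally moves together yield a small, steady edit distance, and a window in which one link departs from the shared pattern shows up as a jump in that distance. A production version would presumably use sliding windows and a tuned threshold, neither of which the summary describes.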

【Abstract】 With the rapid development of information technology, the data stream, a novel data structure, has come into wide use in daily life. Traditional databases have long been used to store persistent data and to query those data offline. However, the past few years have witnessed a growing number of applications that produce data in the form of sequences. The online monitoring and analysis of data streams has been attracting increasing attention in database research.

Nowadays, most ISPs face the challenge of managing huge amounts of network traffic data. In a telecom network, gathering and analyzing SNMP traffic data is one of the most important ways for administrators to manage network performance and to find and solve network problems. To meet this requirement, we present RealMon, a real-time stream monitoring system that aims to find anomalies among thousands of network links. While designing and implementing the system, we found that the data streams from the telecom network are correlated with each other and that the SNMP data contain many data quality problems. Therefore, in this thesis, we first put forward an algorithm that detects outliers based on changes in the correlation between streams, and then present a novel framework for cleaning data in real time. Building on these results, we demonstrate RealMon, a real-time stream monitoring system that analyzes, in an online fashion, the SNMP data gathered from heavily loaded routers.

The major contributions of this thesis are:

1. A novel algorithm is proposed that detects anomalies by continually monitoring changes in the correlation between streams. It uses Piecewise Aggregate Approximation to transform the raw data into characters and finds anomalies by calculating the edit distance between different streams. Extensive experiments are performed to verify the efficiency of the algorithm.

2. An extensible data stream cleaning framework is designed after a survey of common data stream quality problems. The framework gains its extensibility from a modular design in which separate modules solve separate problems. Some typical data cleaning algorithms are also implemented in the framework.

3. A data stream monitoring system, named RealMon, is implemented to detect anomalies among thousands of network links. Several well-known data stream analysis algorithms are implemented in the system to monitor the huge volume of SNMP (Simple Network Management Protocol) messages collected from routers in the telecom backbone network. Data cleaning algorithms are also integrated into the system to address the data quality problems in the SNMP messages. Experiments show that the system performs efficiently in a simulated environment.

We believe this work is a good example of integrating theory with practice, since it not only provides key solutions for anomaly detection and data cleaning but also implements a novel system that detects anomalies among thousands of network links. We believe the work is significant for data stream research.
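
The cleaning framework in contribution 2 is described only as a set of separate modules, with real-time handling of missing data and of irregular timestamps mentioned in the Chinese abstract. The following is a rough sketch of how such a modular pipeline could look, not the thesis's design. The step names, the fixed polling period, the nearest-slot rounding, and the choice of carrying the last observation forward are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# (timestamp in seconds, reading or None when the reading is missing)
Sample = Tuple[int, Optional[float]]
CleaningStep = Callable[[List[Sample]], List[Sample]]


def regularize_timestamps(period: int) -> CleaningStep:
    """Hypothetical step: snap each timestamp to the nearest multiple of
    `period` and emit exactly one sample per slot, using None for empty slots."""
    def step(samples: List[Sample]) -> List[Sample]:
        if not samples:
            return []
        snapped = {}
        for ts, value in samples:
            slot = ((ts + period // 2) // period) * period
            if value is not None:
                snapped[slot] = value          # latest real reading in a slot wins
            else:
                snapped.setdefault(slot, None)
        first, last = min(snapped), max(snapped)
        return [(slot, snapped.get(slot))
                for slot in range(first, last + period, period)]
    return step


def fill_missing(samples: List[Sample]) -> List[Sample]:
    """Hypothetical step: replace a missing reading with the last observed one.

    Forward filling needs only past data, so it suits online use. A gap at the
    very start of the stream stays None.
    """
    filled, last = [], None
    for ts, value in samples:
        if value is None:
            value = last
        else:
            last = value
        filled.append((ts, value))
    return filled


def run_pipeline(samples: List[Sample], steps: List[CleaningStep]) -> List[Sample]:
    """Apply the cleaning steps in order; adding a module means adding a step."""
    for step in steps:
        samples = step(samples)
    return samples


raw = [(0, 10.0), (61, 12.0), (179, 15.0)]   # poll near t=120 was lost
clean = run_pipeline(raw, [regularize_timestamps(60), fill_missing])
```

Keeping each step as a plain function from samples to samples is one simple way to get the extensibility the abstract claims: a new quality problem is handled by writing one more step and adding it to the list.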

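  • 【Online publication contributor】 Fudan University
  • 【Online publication year and issue】 2010, Issue 07
  • 【Classification number】 TN915.06
  • 【Citation count】 3
  • 【Download count】 208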