节点文献

动态复杂网络中的异常检测问题的研究

Research on Anomaly Detection in Dynamic Complex Networks

【作者】 孟啸

【导师】 高宏;

【作者基本信息】 哈尔滨工业大学 , 计算机科学与技术, 2010, 硕士

【摘要】 复杂网络是具有某些共同结构特征的网络的统称。社会关系网络、文献引用网络、学术合作网络、Internet自治系统网络、Web Graph等这些来自不同领域的网络都是典型的复杂网络。复杂网络具有两个重要特性:大规模和动态性。这使得动态复杂网络的统计特征异常检测成为一个非常重要的问题。在网络管理、Web搜索引擎系统监控、复杂网络理论研究等应用中,动态复杂网络的异常检测都有着极其重要的现实意义。本文对动态复杂网络中两个问题进行了研究:1)给定复杂网络的一个时间序列,如何确认在什么时刻,网络的统计特征发生了异常; (2)给定发生异常的时刻,如何选出最能代表其网络统计特征的变化的区域。对于第一个问题,复杂网络时间序列的统计特征异常检测问题,我们给出了一个复杂网络统计特征的检测框架。利用在真实数据上观察到的行为(顶点数,边数的线性增长)以及复杂网络的研究成果(稠化幂律以及度数的幂律分布),我们给出了在这些统计特征之下的异常定义。利用线性回归模型,我们给出了全局特征异常检测的基本算法。在此基础上,我们提出了去除检测到的异常点,然后迭代计算线性回归参数,直到异常点的集合达到稳定的迭代式算法:iLRAD算法。真实数据集上的实验结果表明,通过较少的迭代次数,iLRAD算法就能够达到收敛。同时,iLRAD算法就能够显著的改进异常检测的效果。对于第二个问题,我们考虑了更一般化的版本,给定复杂网络时间上相继的两个快照,如何选出最能代表其网络统计特征的变化的区域。我们将该问题形式化的定义为一个组合优化问题:k-CR问题。以顶点统计特征为例,我们证明了该问题是一个NP-hard问题,不存在多项式时间的精确算法。我们给出了解决k-CR问题的两种近似算法:基于分数的Top-k Score算法与基于贪心策略的Greedy Remove算法。此外,我们给出求解k-CR问题最优解的分支限界算法,我们给出下界估计的策略并给出了证明。在真实与人工的数据集实验下,基于贪心的Greedy Remove算法表现出较好的实验近似比。

【Abstract】 Complex Network is a class of networks with certain common structural features,such as social network, citation network, coauthor network, AS Graph and Web Graph.Two important features of them are dynamics and large-scale, which make statisticalanomaly detection in dynamic complex network an extremely important problem. It canbe widely used in network management, web search monitoring, complex network theoryresearch. We solve two problems in thesis: (1) Given a time series of network, identifywhen occurs anomaly in the time series; (2) Given a time of anomaly, how to selectregions which best represents the change of network;For the first problem, we show a framework of anomaly detection in complex network.We definite anomalies under observation on real datasets(linear growth of vertex,edges), and research results (DPL and power law degree distribution), We give ananomaly detection algorithm with linear regression model. We propose an iterative algorithm,iLRAD algorithm further. The basic idea is to remove the detected outliers, fitthe line, get new outliers sets and do it again. The process stopped until the anomaliesset is stable. Experimental results on real datasets show that em iLRAD can achieveconvergence quickly and. significantly improve efficiency of anomaly detection.For the second problem, we consider the general version: Given two successivesnapshot of complex network, find regions which can best represent the changes of statisticalcharacteristics. We formalize the problem into an optimization problem, the k-CRproblem. Take the vertex statistical feature for an instance, we show the problem is NPhardwhich implies no algorithm can find the optimal solution in polynomial time. Wegive two approximation algorithms to solve the k-CR problem, the Top-k Score Algorithmand the Greedy Remove algorithm. The two algorithms are both linear time. Experimentalresults on real and synthetic datasets shows the greedy remove algorithm has agood approximation ratio. We also give a branch and bound algorithm to find the optimalsolutions. We show the strategy of estimate lower bound and make a sketch of proof.

节点文献中: