超球体多类支持向量机及其在DDoS攻击检测中的应用

Hyper Sphere Multi-Class SVM and Its Applications on Detecting DDoS Attacks

【Author】 Xu Tu

【Advisor】 He Dake

【Author information】 Southwest Jiaotong University, Information Security, 2008, Ph.D.

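【Abstract (Chinese)】 Distributed denial of service (DDoS) attacks use a "botnet" to flood a victim host with massive numbers of junk requests. The victim is overloaded and cannot respond to requests from legitimate users, so service is denied. The botnet consists of vulnerable hosts distributed around the world and directed by an attacker behind the scenes, and the attack can be launched with spoofed IP addresses. It is therefore very difficult to find the real attacker from the information in IP packets. Launching a DDoS attack over the network is relatively safe for the attacker, which makes this kind of attack a great threat to Internet security. To curb the spread of DDoS on the Internet, defenses against it must be studied, and effective defense first requires accurate detection. DDoS attacks often use spoofed IP addresses, and the TCP/IP protocol does not authenticate IP addresses. As a result, there is no way to tell which packets carry spoofed addresses, and this makes DDoS detection difficult.

DDoS detection has become a research focus in network security, and many detection algorithms have been proposed. What these algorithms have in common is that they find some numeric feature of DDoS attacks and use it to indicate whether an attack is present. On the one hand, single-feature methods are of doubtful reliability because network traffic is random and complex and attack behavior is diverse. Such methods tend to flag sudden bursts of heavy but normal traffic as attacks, which leads to high false alarm rates. To detect DDoS accurately, multiple attributes must be used. On the other hand, effective defense requires the detection stage to provide as much information about the attack flow as possible, for example its intensity, pattern and protocol all at once.

Taking both points into account, a feasible approach is multi-class pattern recognition. Attacks are divided into 24 types according to their intensity, pattern and protocol. A set of feature vectors that distinguishes these types is found, and samples of each attack type are collected to train a multi-class learning machine. In the detection phase, feature data of the network flow is collected and fed to the trained machine. The resulting class label indicates whether an attack is occurring and gives its intensity, pattern and protocol.

The support vector machine (SVM) is a newer type of learning machine based on statistical learning theory (SLT). It combines the maximum-margin hyperplane, Mercer kernels, convex quadratic programming, sparse solutions and slack variables. This combination overcomes the local minima, curse of dimensionality and overfitting problems of traditional learning machines and makes the SVM a well-performing classifier. The standard SVM is a binary classifier, so it must be extended before it can solve multi-class problems. The prevailing approach converts a multi-class problem into a series of binary problems solved by multiple SVMs, as in the common 1-v-r and 1-v-1 classifiers. This achieves multi-class classification, but the classification ability is formed indirectly and many SVMs must be trained. Learning efficiency is therefore low, and the approach is unsuitable for problems with many classes and large training sets. For this reason, indirect multi-class SVMs are not suitable for DDoS attack detection. Because DDoS classification involves many classes, a more efficient direct multi-class learning machine is needed.

Building on previous work, the concept of a direct multi-class classifier, the hyper sphere multi-class support vector machine (HSMC-SVM), is established. For a given class, the hypersphere should contain as many of that class's samples as possible while keeping its radius as small as possible. One hypersphere is built for each class, so N classes yield N hyperspheres, which form a soap-bubble-like classification structure in the space. At decision time, a test sample belongs to the class represented by the nearest hypersphere; this is the classification principle of HSMC-SVM. Because HSMC-SVM forms its classification ability directly, it offers large learning capacity, fast training and good scalability. Determining each hypersphere amounts to solving a convex quadratic programming (QP) problem. Following the idea of the SMO algorithm for training SVMs, an SMO training algorithm for HSMC-SVM is developed. It comes with a "second-order approximation" working set selection method that further increases training speed. Sample shrinking and kernel matrix caching are also used to speed up training. Theoretical analysis proves that the classification error of HSMC-SVM is bounded. Experiments show that HSMC-SVM trains and tests considerably faster than the 1-v-r and 1-v-1 classifiers, with slightly lower classification accuracy.

To further improve training speed and accuracy, the least squares method is introduced into HSMC-SVM, yielding the least square hyper sphere multi-class support vector machine (LSHS-MCSVM). Compared with HSMC-SVM, LSHS-MCSVM uses a quadratic function in the objective, replaces the inequality constraints with equality constraints and removes the bounds on the multipliers. These changes make multiplier search and optimization faster and thus speed up overall convergence. LSHS-MCSVM can still be trained with the SMO algorithm. With "first-order approximation" and "second-order approximation" working set selection, it converges faster than with empirical working set selection. Because both machines use a spherical classification structure, LSHS-MCSVM has a learning error upper bound similar to that of HSMC-SVM. Numerical experiments show that LSHS-MCSVM trains faster than HSMC-SVM without loss of classification accuracy, and on some datasets it is even more accurate.

To distinguish different types of attacks, DDoS attack flows are analyzed and a 9-dimensional relative value (RV) feature vector is extracted. HSMC-SVM and LSHS-MCSVM are each applied to DDoS attack detection. Experiments show that both fully distinguish the different attack types and fairly accurately identify attacks launched by real attack tools.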
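For each attack type, both classifiers give the attack's class label fairly accurately. The identified label gives the attack's intensity, protocol and pattern, which provides a basis for the defense stage to take appropriate countermeasures.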
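The per-class hypersphere and nearest-sphere decision described above can be written out as follows. This sketch assumes that each sphere follows the standard support vector data description (SVDD) formulation with a kernel $K$ and a penalty $C$. The thesis's exact objective and distance measure are not given in this abstract and may differ. Here $I_k$ is the index set of class-$k$ training samples, and the dual is feasible only when $C \ge 1/|I_k|$.

```latex
% Primal problem for class k: smallest sphere (centre a_k, radius R_k) holding most class-k samples
\min_{R_k,\,\mathbf{a}_k,\,\boldsymbol{\xi}} \; R_k^2 + C \sum_{i \in I_k} \xi_i
\quad \text{s.t.} \quad
\|\phi(\mathbf{x}_i) - \mathbf{a}_k\|^2 \le R_k^2 + \xi_i, \;\; \xi_i \ge 0, \;\; i \in I_k

% Dual problem (a convex QP), with K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^\top \phi(\mathbf{z})
\max_{\boldsymbol{\alpha}} \; \sum_{i \in I_k} \alpha_i K(\mathbf{x}_i, \mathbf{x}_i)
  - \sum_{i \in I_k} \sum_{j \in I_k} \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{s.t.} \quad \sum_{i \in I_k} \alpha_i = 1, \;\; 0 \le \alpha_i \le C

% Centre, and squared distance from a point x to the centre of sphere k
\mathbf{a}_k = \sum_{i \in I_k} \alpha_i \phi(\mathbf{x}_i), \qquad
d_k^2(\mathbf{x}) = K(\mathbf{x}, \mathbf{x}) - 2 \sum_{i \in I_k} \alpha_i K(\mathbf{x}_i, \mathbf{x})
  + \sum_{i \in I_k} \sum_{j \in I_k} \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j)

% Radius: distance of any unbounded support vector x_s (0 < \alpha_s < C)
R_k^2 = d_k^2(\mathbf{x}_s)

% Decision rule (one possible measure): smallest signed distance to a sphere surface
f(\mathbf{x}) = \arg\min_{k = 1, \dots, N} \bigl( d_k(\mathbf{x}) - R_k \bigr)
```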

【Abstract】 An attacker launches a Distributed Denial of Service (DDoS) attack through a botnet, which sends large numbers of garbage IP packets to the victim. Because the victim receives more garbage packets than it can handle, it is unable to serve legitimate users. A botnet is composed of computers with security weaknesses spread all over the Internet. An attacker who commands a botnet to launch a DDoS attack with spoofed source IP addresses remains relatively safe. As a result, DDoS attacks are easy to launch on the Internet and have become a severe threat to Internet security. Researchers must therefore find measures to limit or stop the spread of DDoS attacks. To defeat DDoS attacks effectively, the first step is to detect them precisely. Detection is difficult because DDoS attacks may use spoofed source IP addresses and the TCP/IP protocol does not authenticate them.

DDoS detection is a research focus in network security, and many detection algorithms have been proposed in recent years. These algorithms share a common characteristic: each identifies DDoS attacks by a single numeric feature of the DDoS flow. However, the effectiveness of such single-feature algorithms is doubtful because of the randomness and complexity of network traffic and the variety of attack flows. They tend to classify bursty normal traffic as attacks, so their false positive rates are usually high. In short, several features are required to detect DDoS precisely. On the other hand, to defeat DDoS attacks, as much information as possible should be extracted at the detection phase, such as attack intensity, attack pattern and attack protocol. Defenders can then respond to the attack according to this information. A pattern recognition algorithm can be used to implement such a detection scheme.
All attacks are classified into 24 categories according to attack intensity, attack pattern and attack protocol. A group of features that distinguishes these attacks is then selected. The attack flows are sampled with these features to compose the training set, which is used to train the multi-class classifiers. At the testing phase, network traffic is sampled to form the test data, and the trained classifier gives its category label. From the category label, one can judge whether attacks are present and obtain the attack intensity, attack pattern and attack protocol.

Support Vector Machines (SVM) are novel learning machines based on Statistical Learning Theory (SLT). SVM combines several effective techniques: the maximum margin hyperplane, Mercer kernels, convex quadratic programming, sparse solutions and slack variables. Together these overcome the shortcomings of traditional classifiers, namely local minima, the curse of dimensionality and overfitting. The standard SVM is a binary classifier for pattern recognition, so it must be extended to handle multi-class problems. Currently, the main way to extend SVM to multiple classes is to translate the multi-class problem into a series of binary problems, each solved by one SVM. The 1-v-r and 1-v-1 classifiers follow this idea and do provide multi-class classification capability. However, because these classifiers are constructed indirectly, many SVMs must be trained at the training phase. As a result, their learning efficiency is low and they are unsuitable for problems with many categories and large training sets. Indirect multi-class classifiers are therefore not suitable for DDoS detection. Because DDoS detection involves a large number of categories, a more efficient direct multi-class learning machine is necessary.
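The sketch below makes the efficiency argument concrete. For N = 24 classes, it counts QP subproblems, the size of the largest subproblem and the total samples the subproblems see. It assumes a hypothetical balanced training set of m = 100 samples per class. For the hypersphere method, it assumes that each sphere is trained only on its own class, as the hypersphere principle describes. These are counts only, not measured training times.

```python
def training_load(n_classes: int, m: int) -> dict[str, tuple[int, int, int]]:
    """Return (QP subproblems, samples in the largest QP, total samples over all QPs).

    Assumes a balanced training set with m samples in each of n_classes classes.
    """
    pairs = n_classes * (n_classes - 1) // 2
    return {
        # one binary SVM per class, each trained on the whole training set
        "1-v-r": (n_classes, n_classes * m, n_classes * n_classes * m),
        # one binary SVM per pair of classes, each trained on two classes
        "1-v-1": (pairs, 2 * m, pairs * 2 * m),
        # one hypersphere per class, each trained on its own class only (assumption)
        "hypersphere": (n_classes, m, n_classes * m),
    }


for name, (qps, largest, total) in training_load(24, 100).items():
    print(f"{name:<12} {qps:4d} QPs, largest QP {largest:5d} samples, {total:6d} samples in total")
```

With these assumptions, 1-v-r solves 24 QPs of 2400 samples each (57600 in total). 1-v-1 solves 276 QPs of 200 samples each (55200 in total). The hypersphere method solves 24 QPs of 100 samples each (2400 in total). QP solving cost grows faster than linearly in problem size, so the smaller per-class problems matter. With plain pairwise voting, 1-v-1 must also evaluate 276 binary classifiers per test point, against 24 sphere distances.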
Following and extending previous research, this thesis proposes a novel direct multi-class classifier, the Hyper Sphere Multi-Class Support Vector Machine (HSMC-SVM). In a multi-class problem, for each class one finds the minimum-radius hypersphere that contains the majority of that class's examples. N hyperspheres are constructed for N classes of examples, and together they form a soap-bubble-shaped classification frame in the example space. At the testing phase, a test point is assigned to the class whose sphere is closest to it. Because it classifies directly, HSMC-SVM has several advantages over indirect classifiers, such as large learning capacity, fast training and good scalability. Computing each hypersphere requires solving a convex QP problem. Because the SMO algorithm trains SVMs successfully, an SMO algorithm is proposed for training HSMC-SVM, together with working set selection based on second-order information. These two measures further improve training speed, and "shrinking" and "caching" are also used to accelerate training. Theoretical analysis proves that the classification error of HSMC-SVM is bounded. Our numerical experiments show that HSMC-SVM trains and tests faster than 1-v-r and 1-v-1, but its learning precision is slightly lower than theirs.

To further improve training speed and learning precision, the least squares method is introduced into HSMC-SVM, forming a new learning machine, the Least Square Hyper Sphere Multi-Class SVM (LSHS-MCSVM). Compared with HSMC-SVM, LSHS-MCSVM uses a quadratic (squared-norm) term in the objective function. It also replaces the inequality constraints with equality constraints and removes the bounds on the Lagrange multipliers. These differences make multiplier scanning and optimization faster, so LSHS-MCSVM converges faster than HSMC-SVM. LSHS-MCSVM can also be trained with the SMO algorithm.
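The HSMC-SVM principle (one sphere per class, nearest sphere wins) can be illustrated with the minimal NumPy sketch below. It rests on several assumptions that are not the thesis's exact method:

* Each sphere solves the SVDD-style dual (sum of multipliers equal to 1, each in [0, C]) with a Gaussian RBF kernel.
* The solver uses a plain SMO-style pairwise update with first-order (maximal violating pair) working set selection. The thesis's second-order selection, shrinking and caching are omitted.
* The decision uses the signed distance d_k(x) - R_k.

The names `fit_sphere`, `fit_hsmc` and `predict_hsmc`, and the parameters `C` and `gamma`, are hypothetical and do not come from the thesis.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_sphere(X, C=0.2, gamma=0.5, tol=1e-6, max_iter=10_000):
    """Fit one class's hypersphere by solving the SVDD-style dual with SMO-like steps.

    Minimises  a^T K a - sum_i a_i K_ii  subject to  sum_i a_i = 1, 0 <= a_i <= C.
    """
    n = len(X)
    if C * n < 1.0:
        raise ValueError("C must be at least 1/n for the constraints to be feasible")
    K = rbf_kernel(X, X, gamma)
    diag = np.diag(K).copy()
    alpha = np.full(n, 1.0 / n)              # feasible start: sums to 1, inside [0, C]
    grad = 2.0 * K @ alpha - diag            # gradient of the minimised objective
    for _ in range(max_iter):
        up = np.flatnonzero(alpha < C - 1e-12)     # multipliers that may grow
        down = np.flatnonzero(alpha > 1e-12)       # multipliers that may shrink
        if up.size == 0 or down.size == 0:
            break
        # First-order working set: maximal violating pair (alpha_i grows, alpha_j shrinks).
        i = up[np.argmin(grad[up])]
        j = down[np.argmax(grad[down])]
        gap = grad[j] - grad[i]
        if gap < tol:                              # KKT conditions met within tol
            break
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        t = min(gap / (2.0 * eta), C - alpha[i], alpha[j])   # clipped exact line search
        alpha[i] += t
        alpha[j] -= t
        grad += 2.0 * t * (K[:, i] - K[:, j])
    aKa = alpha @ K @ alpha
    d2 = diag - 2.0 * K @ alpha + aKa        # squared distance of each sample to the centre
    free = (alpha > 1e-8) & (alpha < C - 1e-8)
    on_surface = free if free.any() else alpha > 1e-8
    return {"X": X, "alpha": alpha, "aKa": aKa, "R2": d2[on_surface].mean(), "gamma": gamma}


def sq_dist_to_centre(model, Z):
    """Squared feature-space distance from rows of Z to the centre (K(z, z) = 1 for RBF)."""
    Kz = rbf_kernel(Z, model["X"], model["gamma"])
    return 1.0 - 2.0 * Kz @ model["alpha"] + model["aKa"]


def fit_hsmc(X, y, **kw):
    """One hypersphere per class, each trained only on that class's samples."""
    return {c: fit_sphere(X[y == c], **kw) for c in np.unique(y)}


def predict_hsmc(models, Z):
    """Assign each row of Z to the class with the smallest signed distance d_k(z) - R_k."""
    classes = list(models)
    scores = np.column_stack([
        np.sqrt(np.maximum(sq_dist_to_centre(models[c], Z), 0.0)) - np.sqrt(models[c]["R2"])
        for c in classes
    ])
    return np.array(classes)[scores.argmin(axis=1)]


# Usage on three synthetic, well-separated 2-D clusters (illustrative data only).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])
y = np.repeat(np.arange(3), 30)
models = fit_hsmc(X, y, C=0.2, gamma=0.5)
acc = float((predict_hsmc(models, X) == y).mean())
print(f"training accuracy: {acc:.2f}")
```

Each class's QP involves only that class's samples, so adding a new class means fitting one more sphere without retraining the others. This illustrates the scalability claimed for the direct approach.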
With working set selection based on first-order and second-order information, LSHS-MCSVM trains faster than with empirical working set selection. Because both HSMC-SVM and LSHS-MCSVM are based on the hypersphere classification frame, they have similar theoretical upper bounds on the error. The numerical experiments show that LSHS-MCSVM trains faster than HSMC-SVM at the same learning precision. On certain datasets, its learning precision is even higher than that of HSMC-SVM.

To detect DDoS attacks with HSMC-SVM and LSHS-MCSVM, a 9-dimensional relative value (RV) feature vector is extracted by analyzing DDoS attack flows. The numerical experiments show that the RV features distinguish all kinds of attacks precisely. Both classifiers also efficiently identify DDoS attacks launched by real attack tools and give their class labels. When an attack is present, the class label tells network administrators the attack intensity, attack pattern and attack protocol, which is important information for defeating DDoS attacks.
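The least-squares modification described above changes three things: a quadratic slack penalty, equality constraints and unbounded multipliers. If these changes are applied to a single hypersphere, the dual becomes an equality-constrained concave quadratic, and its optimum solves one linear (KKT) system. That formulation is derived here as an assumption; the thesis's exact objective may differ. The sketch solves the system directly with `np.linalg.solve` rather than by the SMO training the thesis uses, which is practical only for small per-class sample sizes. The names `fit_ls_sphere`, `C` and `gamma` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_ls_sphere(X, C=10.0, gamma=0.5):
    """Least-squares hypersphere for one class (assumed formulation).

    Primal:  min R^2 + (C/2) sum_i xi_i^2
             s.t. ||phi(x_i) - a||^2 = R^2 + xi_i   (equality; xi_i may be negative)
    Dual:    max  sum_i alpha_i K_ii - alpha^T K alpha - alpha^T alpha / (2C)
             s.t. sum_i alpha_i = 1                  (no bounds on alpha)
    Stationarity gives one linear system:
             (2K + I/C) alpha + mu * 1 = diag(K),    1^T alpha = 1
    """
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = 2.0 * K + np.eye(n) / C      # positive definite, so A is nonsingular
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.append(np.diag(K), 1.0)
    sol = np.linalg.solve(A, rhs)
    alpha, mu = sol[:n], sol[n]
    aKa = alpha @ K @ alpha
    # Every equality constraint gives R^2 = ||phi(x_i) - a||^2 - alpha_i / C = mu + aKa.
    return {"X": X, "alpha": alpha, "aKa": aKa, "R2": mu + aKa, "C": C, "gamma": gamma}


# Usage: check that every equality constraint holds with xi_i = alpha_i / C.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
m = fit_ls_sphere(X, C=10.0, gamma=0.5)
K = rbf_kernel(X, X, 0.5)
d2 = np.diag(K) - 2.0 * K @ m["alpha"] + m["aKa"]
residual = np.abs(d2 - m["alpha"] / m["C"] - m["R2"]).max()
print(f"max constraint residual: {residual:.1e}")
```

Because the multipliers are unbounded, some `alpha_i` can be negative and the solution is generally not sparse. Solving the linear system directly is a simplification; for larger classes, an iterative method such as the SMO-style training the abstract describes avoids forming and factoring the full matrix.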
