Research on Intelligent Detection Techniques for Computer Viruses (计算机病毒智能检测技术研究)

Study on the Intelligent Methods for Detection of Computer Viruses

【Author】 Zhang Boyun (张波云)

【Supervisor】 Yin Jianping (殷建平)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2007, Ph.D.

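【Abstract (Chinese version)】 The ever-growing proliferation of computer viruses has become one of the most serious threats to information security. Because encrypted and polymorphic viruses render the traditional signature-scanning method ineffective, research into new anti-virus methods is urgent. Guided by statistical learning theory, this dissertation studies automatic virus detection in depth and makes the following contributions:

1. A dynamic virus detection framework based on a multi-naïve Bayes algorithm is proposed. The detection system monitors the behavior of suspicious programs in a virtual machine and records information about the API functions each program calls while interacting with the operating system at run time. Features extracted from these records are fed to the detector, which, once trained on a sample set, can automatically examine suspicious programs. The method effectively detects the increasingly common polymorphic viruses.

2. A new approach to dynamic virus detection based on fuzzy pattern recognition is proposed. The detection system describes normal programs and virus programs as fuzzy sets defined on the feature domain and then classifies patterns by the principle of maximum closeness. Using fuzzy intelligent learning, the system reaches a detection accuracy of 91.93%.

3. A dynamic virus detection method based on support vector machines (SVMs) is proposed. Observing that the API call sequences of normal programs exhibit local continuity, the method uses short sequences of API calls as its feature space. Applying SVMs to virus detection preserves good classification accuracy even when prior knowledge is insufficient, which is a clear advantage because large numbers of virus samples are hard to obtain. Experiments show that the SVM-based dynamic detection model effectively separates normal from abnormal programs and reaches high detection precision with only a small amount of virus training data. Because detection relies on behavioral information extracted from the programs, it can detect viruses that use encryption, obfuscation, and dynamic library loading.

4. Building on the traditional signature-scanning technique, a static analysis method for virus detection is proposed. The detection system uses n-gram information extracted statically from programs as features, selects features by their information gain, and applies rough set theory to reduce the extracted features and eliminate redundant ones. The system identifies the differences between normal and virus programs statistically, so virus signatures need not be extracted manually in advance. Core-based attribute reduction is studied in particular; the optimized reduction algorithm takes far less time than the classical attribute reduction algorithm.

5. The use of neural network ensembles as pattern recognizers in static virus detection is studied in depth. Building on the Bagging algorithm, the IG-Bagging ensemble method is proposed. IG-Bagging introduces information-gain-based feature selection into the construction of the neural network ensemble and perturbs both the training data and the input attributes, so that the individual networks differ substantially. Experimental results show that IG-Bagging generalizes better than Bagging and comparably to Attribute Bagging, while being far more efficient than Attribute Bagging.

6. A new method that fuses dynamic and static virus detection using Dempster-Shafer (D-S) evidence theory is proposed. The detection system uses a support vector machine as a member classifier to model the dynamic behavior of viruses and a probabilistic neural network as a member classifier to model their static characteristics, and finally fuses the results of the member classifiers with D-S evidence theory. The most important step in information fusion with D-S evidence theory is determining the belief values of the evidence. Any classifier, when modeling a practical problem, tries to enlarge the distance between classes, and the more separable its classes are, the better its classification results; on this basis, a new belief assignment method based on an inter-class distance measure is proposed. In general, computing Dempster's rule of combination is #P-complete, but in the setting studied here it is proved that the combination can be computed in O(N) time, showing that the proposed detection method meets high-performance requirements. Combining heterogeneous classifiers through D-S evidence theory improves the accuracy of the integrated virus detector. Experimental tests and analysis show that the method detects both unknown and polymorphic viruses well and outperforms popular commercial anti-virus tools.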
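Contribution 1 feeds API-call features recorded in a virtual machine to a naïve Bayes detector. The abstract does not say how the multiple naïve Bayes models are built or combined, so the following is only a minimal sketch of a single Bernoulli naïve Bayes classifier with Laplace smoothing. Treating each API as a present/absent feature, the toy traces, and the function names `train_bernoulli_nb` and `predict` are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical sketch: a single Bernoulli naive Bayes model over API-call
# presence features; it does not reproduce the multi-naive Bayes scheme.

def train_bernoulli_nb(samples, labels, vocab):
    """samples: sets of API names seen in each trace; labels: class per sample."""
    class_counts = Counter(labels)
    model = {}
    for c, n_c in class_counts.items():
        # Count how many class-c programs called each API at least once.
        present = Counter()
        for s, y in zip(samples, labels):
            if y == c:
                present.update(api for api in s if api in vocab)
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = {api: (present[api] + 1) / (n_c + 2) for api in vocab}
        model[c] = (math.log(n_c / len(labels)), probs)
    return model

def predict(model, sample):
    """Return the class with the largest log posterior for one trace."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if api in sample else 1.0 - p) for api, p in probs.items()
        )
    return max(model, key=log_posterior)

# Toy data with made-up API traces.
vocab = ["CreateFileA", "WriteFile", "RegSetValueExA", "CreateRemoteThread"]
traces = [
    {"CreateFileA", "WriteFile"},
    {"CreateFileA"},
    {"CreateRemoteThread", "WriteFile", "RegSetValueExA"},
    {"CreateRemoteThread", "RegSetValueExA"},
]
labels = ["normal", "normal", "virus", "virus"]
model = train_bernoulli_nb(traces, labels, vocab)
print(predict(model, {"CreateRemoteThread", "WriteFile"}))  # virus
```

Working in log space avoids underflow when the API vocabulary grows to hundreds of features.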
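The maximum-closeness rule in contribution 2 can be sketched as follows. The membership functions and closeness measure used in the dissertation are not given in the abstract; this sketch assumes feature values already scaled to [0, 1], class fuzzy sets formed by averaging training memberships, and the min-max closeness degree σ(A, B) = Σ min(a_i, b_i) / Σ max(a_i, b_i). The toy membership vectors are hypothetical.

```python
# Hypothetical sketch: min-max closeness and averaged class prototypes are
# assumptions, not necessarily the measures used in the dissertation.

def closeness(a, b):
    """Min-max closeness degree between two fuzzy sets (membership vectors)."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 if den == 0 else num / den

def prototype(vectors):
    """Average the membership vectors of one class into a class fuzzy set."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(sample, prototypes):
    """Principle of maximum closeness: pick the class whose fuzzy set is nearest."""
    return max(prototypes, key=lambda c: closeness(sample, prototypes[c]))

prototypes = {
    "normal": prototype([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "virus": prototype([[0.2, 0.9, 0.8], [0.1, 0.7, 0.9]]),
}
print(classify([0.3, 0.8, 0.7], prototypes))  # virus
```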
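Contribution 3 builds the SVM feature space from short sequences of API calls. A minimal sketch, assuming fixed-length sliding windows (k = 3 is a hypothetical choice), binary window-presence features, and a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The dissertation's window length, kernel, and solver are not stated in the abstract, and the API names in the toy traces are made up.

```python
# Hypothetical sketch: window length, features, and solver are assumptions.

def short_sequences(trace, k=3):
    """All length-k windows (short sequences) in an API call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

def vectorize(trace, vocab, k=3):
    """Binary vector: 1.0 if the trace contains the short sequence, else 0.0."""
    seqs = short_sequences(trace, k)
    return [1.0 if s in seqs else 0.0 for s in vocab]

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    Labels must be -1 (normal) or +1 (virus); the bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def decide(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Made-up traces: normal programs read files, "viruses" write and execute.
normal = [["open", "read", "close", "open", "read", "close"],
          ["open", "read", "read", "close"]]
virus = [["open", "write", "exec", "open", "write", "exec"],
         ["find", "open", "write", "exec"]]
vocab = sorted(set().union(*(short_sequences(t) for t in normal + virus)))
X = [vectorize(t, vocab) for t in normal + virus]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
print(decide(w, b, vectorize(["open", "write", "exec"], vocab)))
```

A kernel SVM would replace `dot` with a kernel evaluation over support vectors; the linear form is used here only to keep the sketch short and dependency-free.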
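The information-gain step of contribution 4 ranks candidate n-grams by how much knowing whether a program contains the n-gram reduces uncertainty about its class, IG(C; F) = H(C) − Σ_v P(F = v) H(C | F = v). The sketch below assumes byte n-grams with binary presence features and uses toy byte strings standing in for executables; n, the cut-off k, and the helper names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(has_feature, labels):
    """IG(C; F) = H(C) - sum over v of P(F=v) * H(C | F=v), F binary."""
    n = len(labels)
    gain = entropy(labels)
    for v in (True, False):
        subset = [y for f, y in zip(has_feature, labels) if f == v]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

def byte_ngrams(data, n):
    """Set of distinct byte n-grams in a byte string."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def select_ngrams(programs, labels, n=2, k=4):
    """Rank every candidate n-gram by IG and keep the top k (ties: byte order)."""
    grams = [byte_ngrams(p, n) for p in programs]
    candidates = set().union(*grams)
    return sorted(candidates,
                  key=lambda g: (-information_gain([g in s for s in grams], labels), g))[:k]

# Made-up byte strings standing in for executables.
programs = [b"\x55\x8b\xec\x90", b"\x55\x8b\xec\x33", b"\xe8\x00\x00\x00", b"\xe8\x00\x00\x01"]
labels = ["normal", "normal", "virus", "virus"]
print(select_ngrams(programs, labels))
```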
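For the rough-set reduction in contribution 4, the classical definitions are easy to state in code: the positive region POS_C(D) collects the objects whose C-equivalence class has a single decision value, and an attribute belongs to the D-relative core when removing it shrinks that region. This is a straightforward textbook sketch of those definitions, not the dissertation's optimized core-based reduction algorithm; the toy decision table of binary n-gram flags is hypothetical.

```python
from collections import defaultdict

# Textbook rough-set definitions; not the dissertation's optimized algorithm.

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, attrs, decision):
    """POS_attrs(D): objects whose equivalence class has one decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({decision[i] for i in block}) == 1:
            pos.update(block)
    return pos

def relative_core(rows, attrs, decision):
    """Attributes a with |POS_{C - {a}}(D)| < |POS_C(D)|: the D-relative core."""
    full = len(positive_region(rows, attrs, decision))
    return [a for a in attrs
            if len(positive_region(rows, [b for b in attrs if b != a], decision)) < full]

# Hypothetical decision table: columns are binary n-gram flags.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0)]
decision = ["virus", "virus", "normal", "normal"]
print(relative_core(rows, [0, 1, 2], decision))  # [0]
```

Attribute 0 alone separates the classes in this table, so it is the only core attribute. Because the core is contained in every reduct, core-based reduction algorithms can start from it and add attributes only until the positive region is restored.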
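Contribution 5 describes IG-Bagging only at a high level. One possible reading, assumed here: each ensemble member draws a bootstrap resample (perturbing the training data), ranks features by information gain on that resample and keeps the top k (perturbing the input attributes), and is a probabilistic neural network, i.e. a Gaussian Parzen-window classifier; members combine by majority vote. The values of k, the smoothing parameter `sigma`, the number of members, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of one reading of IG-Bagging; not the author's code.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Information gain of one discrete feature column about the labels."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(column):
        sub = [y for x, y in zip(column, labels) if x == v]
        gain -= len(sub) / n * entropy(sub)
    return gain

def pnn_predict(train_X, train_y, x, sigma=0.5):
    """Probabilistic neural network: mean Gaussian kernel response per class."""
    scores, counts = Counter(), Counter(train_y)
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        scores[yi] += math.exp(-d2 / (2 * sigma ** 2))
    return max(counts, key=lambda c: scores[c] / counts[c])

def ig_bagging(X, y, n_members=5, k=2, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        # Perturb the training data: bootstrap resample with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        # Perturb the inputs: keep the top-k features by IG on this resample
        # (ties broken by feature index).
        ranked = sorted(range(d), key=lambda j: (-info_gain([r[j] for r in bx], by), j))
        feats = ranked[:k]
        members.append((feats, [[r[j] for j in feats] for r in bx], by))
    return members

def ensemble_predict(members, x):
    """Majority vote of the member PNNs, each on its own feature subset."""
    votes = Counter(pnn_predict(tx, ty, [x[j] for j in feats]) for feats, tx, ty in members)
    return votes.most_common(1)[0][0]

# Toy data: features 0 and 1 mark viruses, features 2 and 3 are noise.
X = [[1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 0]]
y = ["virus", "virus", "virus", "normal", "normal", "normal"]
members = ig_bagging(X, y)
print(ensemble_predict(members, [1, 1, 1, 1]))
```

Computing IG on each resample, rather than once on the full set, is what lets the feature subsets differ between members on real data; in this tiny example the informative features win every time.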

【Abstract】 Computer viruses have become one of the most serious threats to information security because of the significant damage they cause and how quickly they spread. As viruses become more complex and sophisticated, the classical signature-scanning method can no longer detect the many forms of virus code effectively, so developing new defenses against viruses is crucial. In this dissertation, we explore intelligent methods for automatically detecting viruses based on statistical learning theory. The main contributions are summarized as follows.

First, a multi-naïve Bayes algorithm for detecting computer viruses automatically is presented. The model monitors programs in a virtual machine to learn their behavior: as a program interacts with the operating system at run time, the most relevant API calls are extracted as the feature vector for the detection engine. After training, the multi-naïve Bayes classifier can be used to check suspicious files. The method is effective at detecting polymorphic viruses.

Second, an intelligent virus detection system based on fuzzy pattern recognition is proposed. In this method, program files are expressed as fuzzy sets, and samples are classified by the principle of fuzzy closeness optimization. Experimental results show that the method can detect known and unknown viruses by analyzing their behavior, with a detection accuracy of 91.93%.

Third, a method based on the support vector machine (SVM) is proposed for detecting computer viruses. With an SVM, the generalization ability of the virus detection system remains good even when the sample dataset is small. An experiment using system API function call traces illustrates the performance of this model. The SVM-based detection system needs less prior knowledge than other methods and can shorten the training time for the same detection performance. Encrypted viruses, obfuscated viruses, and viruses that rely on dynamically loaded libraries can be detected by analyzing the behavioral information of programs.

Fourth, motivated by the standard signature-based technique, we explore detecting viruses automatically by n-gram analysis. The original sample data are preprocessed with the knowledge reduction algorithm of rough set theory, which eliminates redundant features from the working dataset and reduces its dimensionality. The detection system classifies a program as normal or abnormal by a statistical method, so virus signatures need not be extracted before detection. An efficient implementation for calculating the relative core, based on the definition of the positive region, is presented.

Fifth, we study neural network ensembles built with a modified bagging method for detecting previously unknown viruses. After features are selected by information gain, probabilistic neural networks are used to build and test the proposed ensemble. Experimental results show that the proposed detection engine generalizes better than classical bagging and is far more efficient than attribute bagging.

Last, we present a virus detection system based on the D-S theory of evidence that combines dynamic and static analysis. The detection engine applies two types of classifiers, a support vector machine and a probabilistic neural network. For the SVM classifier, the feature vector is extracted by monitoring the samples at run time; the probabilistic neural network classifier uses the static features of the samples. Finally, the D-S theory of evidence combines the contributions of the individual classifiers into the final decision. How belief is estimated is the key issue in applying D-S theory, and we propose a new method based on a statistical measure of each individual classifier. Whatever theory a classifier is based on, its main aim is to enlarge the distances between classes; that is to say, the better a classifier discriminates between the classes, the better its classification results. Based on this observation, we use inter-class distance as the evidence for our belief. In general, computing Dempster's rule of combination is #P-complete, but we prove that in the restricted setting of virus detection its time complexity is O(N), which shows that the presented method is efficient. Comparison experiments on polymorphic viruses show that our method performs better than commercial-grade antivirus tools.
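Fusion with Dempster's rule on the frame {virus, normal} can be illustrated as follows. This sketch works under stated assumptions and is not the dissertation's belief-assignment method or its O(N) proof: each classifier's reliability is taken as a hypothetical Fisher-style separability d = |gap between class means| / (sum of standard deviations) of its validation scores, mapped to r = d / (1 + d), and its output p is discounted into masses m({virus}) = r·p, m({normal}) = r·(1 − p), m(Θ) = 1 − r. With only these three focal elements, each pairwise combination costs constant time, so folding N classifiers (Dempster's rule is associative and commutative) costs O(N).

```python
import statistics

# Hypothetical sketch: the reliability formula and mass assignment below are
# assumptions, not the dissertation's inter-class-distance method.

def reliability(scores_virus, scores_normal):
    """Map a Fisher-style separability of validation scores into [0, 1]."""
    gap = abs(statistics.mean(scores_virus) - statistics.mean(scores_normal))
    spread = statistics.pstdev(scores_virus) + statistics.pstdev(scores_normal)
    if spread == 0:
        return 1.0 if gap > 0 else 0.0
    d = gap / spread
    return d / (1.0 + d)

def to_mass(p_virus, r):
    """Discount a classifier's virus probability by its reliability r."""
    return {"V": r * p_virus, "N": r * (1.0 - p_virus), "T": 1.0 - r}

def dempster(m1, m2):
    """Dempster's rule on the frame {V, N}; "T" is the whole frame Theta."""
    k = m1["V"] * m2["N"] + m1["N"] * m2["V"]  # mass on the empty set
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - k
    return {
        "V": (m1["V"] * m2["V"] + m1["V"] * m2["T"] + m1["T"] * m2["V"]) / norm,
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["T"] + m1["T"] * m2["N"]) / norm,
        "T": m1["T"] * m2["T"] / norm,
    }

def fuse(masses):
    """Fold N mass functions left to right: N - 1 constant-time steps, O(N)."""
    result = masses[0]
    for m in masses[1:]:
        result = dempster(result, m)
    return result

# Made-up validation scores and outputs for an SVM and a PNN member.
r_svm = reliability([0.9, 0.8, 0.85], [0.1, 0.2, 0.15])
r_pnn = reliability([0.7, 0.4, 0.9], [0.3, 0.6, 0.2])
fused = fuse([to_mass(0.8, r_svm), to_mass(0.7, r_pnn)])
print(fused)
```

Keeping part of each classifier's mass on Θ means a weakly separating classifier can only nudge the fused decision, while the better-separated SVM scores dominate.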
