Research into the Detection and Classification of Malware

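【Author】 Zhao Hengli (赵恒立)

【Advisor】 Zheng Ning (郑宁)

【Author Information】 Hangzhou Dianzi University (杭州电子科技大学), Computer Application Technology, 2009, Master's degree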

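【Abstract, translated from the Chinese】 The explosive growth of malware and its use of polymorphic and metamorphic techniques mean that traditional signature-based detection can no longer meet current security requirements. Starting from practical anti-virus needs, this thesis proposes an automated method for detecting and classifying malware. The Automated Malware Integrated Analysis System (AMIAS) extracts combined static and dynamic behavior features, and a support vector machine (SVM) binary classifier is then built to detect samples. At the same time, malware behavior analysis reports are generated; by parsing the behavior analysis reports of malware in a known virus database, family behavior patterns are extracted, and an SVM is then used to build a multi-class malware classification model. The proposed detection method overcomes the shortcomings of purely static or purely dynamic detection and enables rapid detection of massive numbers of samples. The classification method assigns each sample to its malware family according to its behavior, which can guide subsequent malware removal. The thesis studies the following four aspects. First, it proposes a method for defining combined dynamic and static features for malware detection. By learning the static and dynamic behavior information of malware, it defines a 55-dimensional combined detection feature vector, of which 20 static dimensions are obtained by analyzing the differences in PE structure information between malicious and benign code. Dynamic behavior analysis can identify unknown malware; based on extensive study of malware's Win32 API call information, a 35-dimensional dynamic behavior feature vector is defined. Each dimension represents one dynamic behavior event, and these events are derived from the corresponding Win32 API functions and their call parameters. Second, the thesis implements AMIAS on top of virtual machine control technology. AMIAS has two main functions: it extracts the feature values corresponding to the items in the combined feature definition, and it generates a behavior analysis report for each analyzed sample. AMIAS is an automated online processing system that can meet the need of anti-malware work to analyze massive numbers of samples. Third, the thesis proposes a new SVM-based malware detection method: on the basis of the combined feature definition, an SVM binary classifier is built for malware detection. The experimental dataset contains 9917 malware samples and 6591 benign samples. In the initial experiments, different training sets are built according to the different sources of the data and used to train the SVM classifier. Based on statistics of the number of effective features in the misclassified data of the initial experiments, the experiments are improved by setting a threshold on the number of effective features; the improved results show that with a threshold of 6, both detection performance and sample utilization are relatively high. Comparative experiments are also designed to verify the effectiveness of using the combined feature definition together with the SVM model; the comparison shows that the detection rate improves substantially at the cost of a small increase in the false positive rate. For detection errors on gray samples, the thesis introduces a method for quantifying the importance of feature attributes; weighting the feature attribute values effectively reduces the detection error on gray samples. Fourth, the thesis improves behavior-based malware classification, achieving malware classification indirectly by classifying malware behavior analysis reports. Based on the definition of malware behavior information units, feature words are extracted from the behavior analysis reports and preprocessed by clustering; a mapping function is then defined to map each report into feature-word vector space data; finally, an SVM multi-class classifier is trained to classify malware automatically. Experiments show that feature extraction based on behavior information units effectively improves the accuracy and efficiency of automated malware classification.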
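The combined feature vector, the effective-feature threshold, and the feature weighting described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not define "effective feature", so the sketch assumes it means a non-zero entry in the 55-dimensional vector, and it assumes that samples below the threshold are set aside (which is what would make "sample utilization" a meaningful quantity). All function names are hypothetical.

```python
STATIC_DIMS = 20   # PE-structure features (count taken from the abstract)
DYNAMIC_DIMS = 35  # Win32 API behavior-event features (count from the abstract)


def combine(static, dynamic):
    """Concatenate static and dynamic features into one 55-dimensional vector."""
    if len(static) != STATIC_DIMS or len(dynamic) != DYNAMIC_DIMS:
        raise ValueError("expected 20 static and 35 dynamic features")
    return list(static) + list(dynamic)


def effective_feature_count(vector):
    """Assumption: an 'effective' feature is simply a non-zero entry."""
    return sum(1 for value in vector if value != 0)


def filter_by_threshold(samples, threshold=6):
    """Keep samples with at least `threshold` effective features.

    Returns the kept samples and the fraction of samples kept
    (an assumed reading of 'sample utilization').
    """
    kept = [s for s in samples if effective_feature_count(s) >= threshold]
    utilization = len(kept) / len(samples) if samples else 0.0
    return kept, utilization


def apply_weights(vector, weights):
    """Scale each feature value by its importance weight (weight * value)."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return [w * v for w, v in zip(weights, vector)]
```

The filtered (and optionally weighted) vectors would then be passed to whatever SVM implementation is used for the binary classifier; the abstract does not name one.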

【Abstract】 With the explosive growth of malware, which often uses polymorphic and metamorphic techniques, traditional signature-based detection methods can no longer meet security requirements. From the perspective of practical anti-virus requirements, this paper proposes an automated method for malicious code detection and classification. The Automated Malware Integrated Analysis System (AMIAS) extracts static and dynamic behavior features, and a support vector machine (SVM) is then used to detect malware. AMIAS also generates a malware behavior analysis report. We learn the behavior patterns of each malware family in a known malware database and build a multi-class SVM classifier to classify newly detected malicious samples. Our method overcomes the shortcomings of purely static or purely dynamic detection and can detect massive numbers of malware samples rapidly. The classification results can guide malware removal. This paper focuses on four aspects. First, we propose a definition of combined static and dynamic behavior features. By learning the static and dynamic behavior information of known malware, we define a 55-dimensional combined feature vector. It includes 20 static features, extracted from differences in PE file structure between benign and malicious code. Dynamic behavior analysis can detect unknown malicious code, so behavior features form the main body of the combined feature vector. Based on extensive study of how malware uses the Win32 API, we define 35 behavior features. Each feature represents one kind of dynamic behavior event, and these events are all derived by summarizing the corresponding Win32 API function calls and their parameters. Second, we implement AMIAS using virtual machine control technology. AMIAS has two functions: one is to extract the values of the features defined in the feature space; the other is to generate a behavior analysis report for each sample. AMIAS is an automated online processing system that addresses the need to analyze massive numbers of malware samples. Third, we propose a new malware detection method based on SVM. Using the combined feature definition, we construct an SVM model for malware detection. The detection dataset contains 9917 malware samples and 6591 benign samples. According to the different sources of the data, we design an initial experiment and create different training sets for training the SVM classifier. Based on statistics of the number of effective features in the misclassified samples of the initial experiment, we improve the experiment by setting a threshold on the number of effective features; the results show that when the threshold is 6, both the detection performance and the sample utilization are high. We also design comparative experiments to verify the effectiveness of using the combined features together with the SVM model. The results show that the joint detection method performs better. For gray samples, we improve the model by introducing a method for quantifying feature importance: each new feature value is the product of the feature's weight and its value. Experiments show that the improved model performs better on gray samples. Fourth, we improve the malware behavior report classification method and accomplish malware classification indirectly through report classification. Based on the malware behavior unit, we extract feature words from the behavior reports, define a mapping function that maps each behavior analysis report into vector space data, and finally train a multi-class SVM classifier for automatic malware classification. Compared with similar methods, experimental results show that our method effectively improves the accuracy and efficiency of malware classification.
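The report-to-vector mapping in the fourth contribution can be illustrated with a small sketch. It assumes that each behavior report has already been reduced to a list of feature words, and it uses a plain term-count mapping; the abstract does not specify the actual mapping function, the report format, or the feature words, so the names and example words below are hypothetical.

```python
from collections import Counter


def build_vocabulary(reports):
    """Assign a fixed index to every feature word seen in the training reports."""
    words = sorted({word for report in reports for word in report})
    return {word: index for index, word in enumerate(words)}


def map_report(report, vocabulary):
    """Map one report (a list of feature words) to a term-count vector.

    Words outside the vocabulary are ignored. Term counting is an
    assumption; the thesis's mapping function may differ.
    """
    counts = Counter(word for word in report if word in vocabulary)
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        vector[vocabulary[word]] = count
    return vector


# Hypothetical feature words standing in for extracted behavior units.
training_reports = [
    ["create_file", "write_registry"],
    ["create_file", "connect_network"],
]
vocabulary = build_vocabulary(training_reports)
new_vector = map_report(["create_file", "create_file", "open_mutex"], vocabulary)
```

Vectors produced this way, labeled with their malware family, would serve as training data for the multi-class SVM classifier.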
