Research of Reverse Analysis on Self-Modifying Code (自修改代码逆向分析方法研究)

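【Author】 Wang Xianggen (王祥根)

【Advisor】 Feng Dengguo (冯登国)

【Author information】 University of Science and Technology of China, Information Security, 2009, Ph.D.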

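【Abstract (translated from the Chinese)】 Malicious code (malware) has become a major threat to Internet security. As computers become ubiquitous and the Internet grows, the damage it causes is increasingly severe. To speed up emergency response to malware-driven network attacks, malware must be analyzed quickly and effectively. The goal of malware analysis is to extract the runtime behavior of malware and to work out its intent and implementation mechanism, providing a reference for detection and removal. To protect their code and make analysis harder, malware authors often hide its characteristics through encryption, metamorphism, packing, and similar techniques, which obstructs mechanism analysis and signature extraction and helps the code evade detection. Traditional analysis methods struggle with such protected malware. To meet the needs of malware analysis and software security evaluation, this thesis surveys the key software-protection techniques used by typical malware and recent advances in malware analysis, studies in depth how typical self-modifying code (SMC) is implemented, and on that basis proposes a hardware-emulator-based reverse analysis method for typical SMC. The main results are as follows.

(1) The implementation mechanism of typical SMC is analyzed in depth and a preliminary SMC model is established. Based on how dynamically generated code is produced, modified, and stored, SMC is classified and modeled for the first time, laying the groundwork for the analysis methods that follow.

(2) A hardware-emulator-based method is proposed for identifying and extracting dynamically generated code in executable files. The target executable is single-stepped inside the emulator. By intercepting the instructions executed by the virtual system and using shadow memory to monitor memory writes and control-transfer instructions, the method identifies and extracts code that is released into memory at run time and then executed, and collects data about the analysis target. Because the dynamic analysis runs inside a hardware emulator, data collection is done through emulated hardware rather than by running the malware on a real CPU, so the real system is not affected.

(3) A hardware-emulator-based method is proposed for identifying and extracting dynamically generated code in dynamic-link libraries (DLLs). A DLL loader program loads the target DLL inside the emulator, and a single-step flag is set so that only instructions belonging to the target DLL are single-stepped. The method triggers the execution of the entry-point function and other functions of the DLL, intercepts the instructions executed by the virtual system, and uses shadow memory to monitor memory writes and control-transfer instructions during execution of the DLL code. It thereby identifies and extracts code that is released into memory at run time and then executed, and collects data about the analysis target.

(4) An SMC analysis method based on binary reconstruction is proposed. For each type of SMC a corresponding reconstruction method is given: without changing the behavior of the code, the extracted dynamically generated code is restored into the original binary, producing a complete binary that can be statically analyzed or run directly. Analysts can then apply traditional analysis methods to it, which improves reverse analysis capability against SMC. The method is simple in principle, easy to implement, and fairly general: it applies to both executable files and DLLs.

(5) A multi-path analysis method based on code coverage is proposed. It focuses on the handling of loop code: by marking conditional decision nodes it reduces the number of times local paths are traversed repeatedly, improving the efficiency and code coverage of the analysis system while preserving the quality of the analysis.

(6) A prototype SMC reverse analysis system is designed and implemented, and SMC analysis experiments are carried out to evaluate the effectiveness, analysis efficiency, and performance of the binary-reconstruction-based reverse analysis method.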
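The shadow-memory idea in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the event format, the `ShadowMemory` class, and the `replay` helper are all hypothetical, and a real system would receive these events from the emulator rather than from a hand-written list. The sketch marks every written byte in a shadow map and flags a byte as dynamically generated code when it is later executed.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowMemory:
    """Hypothetical shadow memory: remembers written bytes and which of them ran."""
    written: dict = field(default_factory=dict)    # addr -> last byte value written
    generated: dict = field(default_factory=dict)  # addr -> byte value when executed

    def on_write(self, addr: int, data: bytes) -> None:
        # Record every byte the program writes at run time.
        for i, b in enumerate(data):
            self.written[addr + i] = b

    def on_execute(self, addr: int, size: int) -> None:
        # An executed byte that was written earlier is dynamically generated code.
        for a in range(addr, addr + size):
            if a in self.written:
                self.generated[a] = self.written[a]

    def generated_regions(self):
        """Return (start, bytes) for each contiguous run of written-then-executed bytes."""
        regions = []
        start, run, prev = None, bytearray(), None
        for a in sorted(self.generated):
            if prev is not None and a == prev + 1:
                run.append(self.generated[a])
            else:
                if start is not None:
                    regions.append((start, bytes(run)))
                start, run = a, bytearray([self.generated[a]])
            prev = a
        if start is not None:
            regions.append((start, bytes(run)))
        return regions


def replay(trace):
    """Feed a list of ("write" | "exec", addr, payload) events to a ShadowMemory."""
    shadow = ShadowMemory()
    for kind, addr, arg in trace:
        if kind == "write":
            shadow.on_write(addr, arg)
        else:
            shadow.on_execute(addr, arg)
    return shadow


# Illustrative trace (made-up addresses): an unpacking stub writes three bytes,
# the program then executes them; a second write is plain data and never runs.
trace = [
    ("write", 0x401000, b"\x90\x90\xc3"),
    ("exec", 0x400000, 2),   # original code, never written at run time
    ("exec", 0x401000, 1),
    ("exec", 0x401001, 1),
    ("exec", 0x401002, 1),
    ("write", 0x402000, b"\x41\x42"),
]
regions = replay(trace).generated_regions()
```

Here `regions` holds a single entry, the three bytes at `0x401000`. The data written at `0x402000` is excluded because it is never executed, which is the distinction between generated code and generated data that the abstract draws.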

【Abstract】 Malware (malicious code or malicious software) creeps into users' computers, collects their private information, and wreaks havoc on the Internet; it has become the centerpiece of most security threats on the Internet. As computers become more widespread and the Internet develops, the damage caused by malware grows more serious. To speed up emergency response to malware-driven network attacks, we must analyze malware rapidly and effectively. Malware analysis is an essential technology that extracts the runtime behavior of malware, supplies signatures to detection systems, and provides evidence for recovery and cleanup. To hinder analysis, malware writers usually armor their programs with anti-reverse-engineering techniques such as code encryption, metamorphism, and binary code packing. Existing techniques for detecting malware and analyzing unknown code samples are insufficient: they are either unable to handle novel malware samples or vulnerable to various evasion techniques.

To meet the needs of malware analysis and security evaluation, this thesis thoroughly analyzes the key anti-reverse-engineering techniques, related work on malware analysis, and the implementation mechanisms of typical self-modifying code (SMC). Building on that analysis, we propose an emulator-based reverse engineering approach for typical SMC, motivated by the question of how to combine static and dynamic analysis effectively. The main contributions are as follows.

First, the implementation mechanism of typical SMC is analyzed thoroughly and a preliminary model of SMC is proposed. We model and classify typical SMC according to the generation mode, modification mode, and storage mode of the dynamically generated code. This study of the mechanism provides a theoretical foundation and guidance for research on reverse engineering techniques against typical SMC.

Second, we present a fully dynamic approach for extracting the original hidden code (dynamically generated code) of packed executable binaries, together with additional information useful for further analysis. Because the extraction technique is fully dynamic, it does not depend on disassembling the program or on known signatures of packing techniques. We show that it can extract the original hidden code and data, and that it can identify the exact regions of memory where they reside: by tracking the memory areas newly written by the program, we distinguish code and data generated at run time from the packed executable binary and so obtain their exact regions.

Third, we present a fully dynamic approach for identifying and extracting dynamically generated code, and additional information useful for further analysis, from packed DLLs (dynamic-link libraries). The technique loads a DLL, then triggers and monitors the execution of its entry-point function and exported functions. By monitoring all memory operations and control-transfer instructions, it extracts the original hidden code that is written into memory at run time.

Fourth, we propose a technique for reconstructing an SMC binary for static analysis. Starting from the original SMC binary, the extracted hidden code, and the recorded control transfers, the technique patches the extracted hidden code into the packed binary and restores the control transfers. It modifies the original binary to produce equivalent static code without altering the original program behavior. The reconstructed binaries can be analyzed successfully by static analysis tools such as IDA Pro.

Fifth, we propose a malware analysis system that explores multiple execution paths based on code coverage. By labeling control-flow decision points (branching points), the method reduces the number of times some paths are explored, improving analysis efficiency and increasing code coverage of the malware.

Sixth, we design and implement an automated framework for extracting hidden code and reconstructing SMC binaries. Applying the techniques above, the framework automatically examines SMC binaries, extracts their original hidden code, and reconstructs a binary from the extracted code and additional information. Using the prototype, we carried out a series of experiments on typical SMC binaries. We also present evaluation results for the framework, demonstrating that it is applicable to the analysis of typical SMC binaries.
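The patching step of the fourth contribution can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the author's tool: the binary is treated as a flat byte image loaded at a single base address, the `reconstruct` function and its parameters are hypothetical, and restoring control transfers or fixing file-format headers is not attempted.

```python
def reconstruct(image: bytes, base: int, regions) -> bytes:
    """Patch extracted code regions into a copy of a flat memory image.

    image   -- bytes of the packed binary as loaded at address `base` (assumed flat)
    regions -- iterable of (start_address, code_bytes) produced by the extraction step
    """
    out = bytearray(image)
    for start, code in regions:
        off = start - base
        if off < 0:
            raise ValueError("region lies below the image base")
        end = off + len(code)
        if end > len(out):
            # Code generated outside the original image: grow the image with zero
            # padding. A real tool would more likely add a new section instead.
            out.extend(b"\x00" * (end - len(out)))
        out[off:end] = code  # overwrite the packed bytes with the unpacked code
    return bytes(out)


# Illustrative input: an 8-byte image at base 0x1000 and two extracted regions,
# the second of which extends one byte past the end of the original image.
patched = reconstruct(
    bytes(8),
    0x1000,
    [(0x1002, b"\xaa\xbb"), (0x1007, b"\xcc\xdd")],
)
```

The result is a 9-byte image with the extracted code at its run-time offsets, which is the form a static analysis tool can consume once the entry point and control transfers are also restored.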
