节点文献

代码迷惑及其语义研究

Research on Code Obfuscation and Its Semantics

【作者】 高鹰

【导师】 陈意云;

【作者基本信息】 中国科学技术大学 , 计算机软件与理论, 2007, 博士

【摘要】 移动代码是计算机网络和Internet技术中广泛应用的一种计算模式,而且也被应用在许多新的技术中,如移动代理、主动网络等。但是,移动代码也带来了单机环境中不存在的安全威胁。这种威胁来自两个方面:一方面,当移动代码被下载到主机,恶意主机将可能会逆向工程这些代码,从而窃取代码中包含的关键算法或数据。在一些特殊的场合,如在移动代理中,代表计算主体完成计算的移动代理程序及其数据状态对于执行这些程序的主机是开放的。如果该主机是恶意主机,它就有可能对移动代理程序及其数据状态进行篡改,以达到攻击其它主机的目的。另一方面,网络的发展使得我们可以越来越方便地下载并执行一段代码,但是如果这段代码是具有恶意行为的,就有可能会破坏主机系统的资源,威胁到主机机密信息的安全。这两方面问题就称为是恶意主机问题和恶意客户问题。代码迷惑是为解决恶意主机问题提出的一种自动程序变换技术,在恶意主机问题和恶意客户问题两个方面,代码迷惑扮演着不同的角色,这分别带来了需要研究的问题:在恶意主机问题中,代码迷惑是一种有效的软件保护机制,但是,代码迷惑以前的研究主要集中在如何构造代码迷惑算法,而对于构造的代码迷惑算法是否有效,这些研究都没有提供严格证明,代码迷惑算法的构造缺乏有效性证明的理论支持。在恶意客户问题中,代码迷惑被广泛地用来构造恶意软件变体,这给现存的恶意软件判定技术带来了新的问题。采用代码迷惑构造的多态病毒、变体病毒,使得现在已有的恶意软件判定器对主机的保护效果大大降低。本文研究的思路是建立代码迷惑与语义理论的联系,从而可以利用已有的形式语义方面的研究解决代码迷惑研究中现存的问题。针对上述代码迷惑应用于移动代码安全时分别存在和导致的问题,本文研究分为两个方面:(1)在恶意主机问题中,代码迷惑是一种保护客户代码的重要的安全性技术,但是目前缺乏代码迷惑作为安全性方法的理论基础。本文研究了从语义角度为代码迷惑建立统一的有效性比较方法:◇建立代码迷惑有效性比较框架,包括两个部分:形式化代码迷惑空间,形式地定义代码迷惑有效性度量。在该框架中,采用语义信息的变化来刻画有效性,允许定义更加细化的有效性度量,也能将传统的基于语法的有效性度量纳入这个比较系统。◇压平算法是一种重要的迷惑算法,本文给出了压平算法的形式描述,并基于代码迷惑有效性比较框架证明了其有效性。我们发现基于偏序的代码迷惑有效性比较框架在表达力方面具有一定的局限性,无法为各种代码迷惑算法都建立有效性比较。◇本文研究了概率化抽象解释框架,通过建立程序不确定性的量化比较关系,从而达到量化比较代码迷惑有效性的目的。针对目前概率化抽象解释只能适用于有限维状态空间的问题,提出使用紧空间来刻画无限维状态空间的方法,扩展了概率化抽象解释的应用范围。这样,将状态空间提升到可度量的矢量空间上,就可以通过在矢量空间中定义距离来作为迷惑算法的有效性度量。(2)在恶意客户问题中,代码迷惑技术是一种重要的攻击手段,为构造恶意软件变体提供了一种有效的方法,这使得传统病毒发现技术受到了挑战。本文从语义角度研究了恶意软件的变体判定技术:◇本文建立了一种新的基于语义的恶意软件判定框架(MOM)。代码迷惑是一种保语义的程序变换技术,可以使用这种程序之间的语义相关性来作为识别一个程序是否是某恶意程序变体的标准。对于属于同一变体的恶意程序,MOM无需更新病毒库就能实现对恶意程序的识别。◇改进并设计了基于验证的恶意软件判定方法。该方法将恶意软件判定问题转化为证明验证条件正确性问题,并且通过输入迷惑算法规范来指导验证条件的产生和证明。◇MOM的验证条件产生是采用符号执行技术,因此如何高效地求解符号状态也是需要研究的问题。本文提出了一种不同于传统符号执行的方式,针对传统符号执行中状态替换时效率低下的问题,以静态单赋值形式作为中间表示,以基于支配树作为程序符号状态的收集策略,从而实现了高效地收集符号状态。◇本文基于编译框架LLVM,实现了一个MOM的原型系统。实验过程说明了基于语义恶意软件判定框架的有效性。

【Abstract】 Mobile code is a kind of computing model that is widely used not only in computer network and Internet technique, but also in many new techniques, such as mobile agent and Ad hoc network. However, Mobile code poses threats to security, which had never happened in the stand-alone environment. The threats come from two aspects. In one aspect, when mobile code is downloaded on a host, a malicious host will reverse engineer the code to extract secret data and algorithm from it. In some special situations, such as in mobile agent, the mobile agent program and its data state which work as the computing entity to finish the computation are open to the host. But a malicious host might tamper the agents for the purpose of attacking the mobile code. In the other aspect, the widespread of Internet make it more convenient and easier to download and execute pieces of code. But if the code is malicious, it might ruin resource and security of the host. The two aspects are called as malicious host problem and malicious code problem.Code obfuscation is a technique of automatic program transformation for solving the malicious host problem. The different roles played by code obfuscation in these two aspects incur respective problems to be dealt with: In mobile host problem, code obfuscation is a useful method to guarantee the security in mobile agent and protect the program from reverse engineering. At present, the interests on code obfuscation mainly focus on the construction of code obfuscation algorithm. None of these algorithms has provided the efficiency proof. In mobile code problem, code obfuscation is a widely-used tool to construct malware variations, which is a hard problem to the research of malware detection. The metaphor and polymorphic virus constructed by code obfuscation have greatly reduced the protecting capability of current malware detectors.In this dissertation, the relationship is established between the code obfuscation and semantics. Thus, the existing formal semantics theory enlightens the problem which code obfuscation brings up. According to the two problems, our research consists of two parts:(1) As for malicious host problem, code obfuscation is an important method to protect client code, but the lack of a theoretical background is a major drawback of current research about code obfuscation. This dissertation proposes a general method for measuring the efficiency based on semantic foundation:The semantic-based comparable framework measuring obfuscation efficiency includes two parts: formalize the space of code obfuscation and formally define the metrics measuring obfuscation efficiency. In the framework, measuring obfuscation efficiency is based on semantic information, more refined metrics can be defined for measuring obfuscation efficiency and the syntax-based metrics also can be involved into it.The flattening algorithm is a very important code obfuscation method. This dissertation formally defines the flatting algorithm and proves its efficiency in the comparable framework.We find that the weak expressiveness is the limitation of the partial order based comparable framework. The comparable framework can’t compare all obfuscation algorithm based on partial order metrics.This dissertation studies the Probabilistic Abstract Interpretation (PAI) and establishes the method to measure program uncertainty. The method can be used for quantitatively measuring obfuscation efficiency. This dissertation also presents the Compact Operator space to extend the PAI in finite-dimension state space to infinite-dimension state space. All the obfuscation algorithms can lift the same concrete domain to a measurable vector space. Thus, the distance defined in the vector space can be used as the metrics to measure obfuscation efficiency.(2) As for malicious code problem, code obfuscation is an important attack tool for effective construction of malware variations, which brings a big challenge to current technique of malware detection. This dissertation studies the semantics-based malware detection.It presents a new semantics-based malware detector (MOM). Code obfuscation is a program transformation satisfying semantics-equivalence. The semantic relationship is the key to detecting whether a program is variation of a malware. The variations of one malware can be detected by the MOM without updating the virus database.It designs a malware detection method based on program verification. The method reduces the problem of malware detection into an equivalent problem of judging the validity of the verification conditions (VC). The specification of obfuscation algorithm can be input to guide the generation and proof of VC.The verification conditions generation of MOM is on the basis of the symbolic execution. So it’s a problem to efficiently collect the symbolic state. Different from the traditional symbolic execution, the method proposed in this dissertation tries to solve the low efficiency in state substitution problem, by utilizing the Static Single Assignment (SSA) as the intermediate representation and the dominator tree based symbolic state as collection strategy, and realizes high efficient collection of symbolic state.It provides a MOM prototype which implemented on the compiling framework LLVM. The experiment shows that the MOM is effective.

  • 【分类号】TP393.01
  • 【被引频次】10
  • 【下载频次】335
节点文献中: 

本文链接的文献网络图示:

本文的引文网络