节点文献

多重循环程序内存访问越界增量检测方法

Incremental Detection Method for Memory Access Overrun in Multi-loop Programs

【作者】 王嘉捷

【导师】 蒋凡; 杨寿保;

【作者基本信息】 中国科学技术大学 , 信息安全, 2009, 博士

【摘要】 在严重威胁软件系统可靠性和安全性的多类软件缺陷中,内存访问越界属于危害性很强且广泛存在的一类缺陷,可被黑客利用并转变成多种拒绝服务漏洞和著名的缓冲区溢出等高危险性的安全漏洞。现有的内存访问越界缺陷检测方法,对现实程序中内含复杂条件分支的多重循环,无法有效分析其内部频繁的指针运算和内存操作,因此检测效果不佳。特别是,循环执行次数无法预知而由外部输入动态确定、内存操作受循环内条件分支影响而动态变化等常见情况,对当前热门的符号执行类精确检测方法构成巨大挑战,导致该类方法在遍历程序路径实施检测时,会遇到严重的路径组合爆炸问题,即使消耗非常多时间,也常常无法正常终止。另外,现有方法被设计成对完整的程序代码实施全面检测,但现实软件系统常常会由于各种原因被更新和升级,若每次软件更新后都进行全面检测,代价过高。而且,现有方法也没有针对软件更新时容易引入新缺陷的代码区域进行专门检测,因此大部分检测时间被花费在无意义的重复检测之中。经分析,其问题根源是:现有符号执行类方法单纯追求代码覆盖率而缺陷针对性不强。因此,提出了一种面向多重循环程序的内存访问越界缺陷增量检测方法。针对内存访问越界缺陷与循环归纳变量运算紧密关联的特点,首次将递推链扩展代数与符号执行类缺陷检测方法相结合,设计了面向多重循环的缺陷检测指导信息,同时利用软件更新影响的局部性特点,实现了基于路径指导的定向检测和基于更新影响的增量检测。其主要算法如下:1)根据软件更新对源代码的文本修改,结合控制与数据依赖分析,识别受软件更新实际影响的语义变化区域。有针对性地在该区域中分析多重循环中指针运算和内存读写模式,初步识别出内存访问越界疑似缺陷,再根据控制与数据依赖关系标记出疑似缺陷所依赖的语句和变量,作为缺陷依赖区域,构造能充分检测软件更新引入缺陷的相关执行要素最小集合——精简检测流图,及早排除与缺陷检测无关的语句。2)基于精简检测流图进行多重循环分析,跟踪和分析多重循环中缺陷依赖变量的递推关系,将其统一表示为递推链扩展代数的形式,再根据该代数规则进行符号操作,从而推导出缺陷依赖变量的值或上下界的闭形式函数,构建循环摘要和函数摘要。接着,根据内存范围等安全性限制条件,利用循环摘要和函数摘要,构造缺陷触发条件并求解,从而判断缺陷触发的可能性,及时排除疑似缺陷集合中的误报,还可推断出用于预测和选取缺陷关联路径的缺陷检测指导信息。3)根据路径选取方向性、条件分支检测优先级、各个循环的有效迭代次数范围等缺陷检测指导信息,有指导地以按需符号执行方式实施路径敏感和位运算级精度的缺陷定向检测。每次经过疑似缺陷点时,主动检测缺陷触发条件,并结合在检测路径上收集的路径分支条件进行约束求解,通过判断触发路径的条件可满足性来进一步避免缺陷误报。而每次在路径分叉点,按需克隆执行环境以避免相同路径前缀的重复执行,并立即求解所选取分支的路径条件集,以及时剪除不可行的路径分支。最后,生成无误报的缺陷集合、能真实触发这些缺陷的程序输入集合以及相应的触发路径集合,作为缺陷并非误报的验证信息。基于上述算法,设计和实现了能面向多重循环程序实施内存访问越界缺陷增量检测的原型系统,并在国内首次将微软主流编译器的下一代工业级编译平台Phoenix用于软件缺陷检测,作为该系统的基础支撑环境。该系统已检测Filezilla Server、SpamAssassin、WGet和OpenSSL等多个知名开源软件,找到了真实的代码缺陷。实验结果表明,利用递推链扩展代数进行多重循环分析,有指导地以符号执行方式实施定向检测,能够克服多重循环程序对现有符号执行类方法造成的困难,包括循环次数由输入确定、内存操作受循环内条件分支影响等难题,既避免了盲目路径遍历,又保持了路径敏感和位运算级的检测精度,提高了检测效率和准确性。同时,利用软件更新影响的局部性特点,实现缺陷增量检测,提高了检测时效性和针对性。另外,该方法能够在检测和定位内存访问越界缺陷同时,生成相应的程序输入和触发路径等验证信息,这在软件安全性测试和信息对抗等领域有很大的应用价值。

【Abstract】 Memory overrun belongs to a kind of dangerous and universal defects among several kinds of defect which are able to compromise safety and security of software system seriously.By it,Hackers can cause many types of denial-of-service vulnerabilities and very dangerous security vulnerabilities including infamous buffer overflow.Current detection methods for memory overrun are unable to efficiently analyze frequent pointer arithmetic and memory operations in multi-loops with complicated conditional branches of real programs,thus their detection effects are not good enough.Such usual cases as unpredictable loop iteration numbers determined dynamically by input and memory operations controlled dynamically by conditional branches in multi-loops etc.,pose great challenges against popular precise detection methods based on symbolic execution.Therefor,these methods will encounter serious problem of path combination explosion when traversing program paths for full detection,so that they are unable to terminate normally even if very much time is spent.In addition,the current methods are designed to detect full source code,but real software systems are often updated for many reasons,it is over-expensive if source code is fully detected after each software update.Moreover,the current methods don’t detect specially the code region where new defects caused by updates occur probably;therefore most time is spent in meaningless repeat detections.The root cause of these problems is analyzed and concluded to be that the current symbolic execution methods try best to achieve high code coverage without enough pertinence as to defects.Thereby,the incremental detection method for memory overrun in multi-loop programs is presented.According to the characteristic of tight correlation between memory overrun and loop induction variable arithmetic,extended algebra for chains of recurrences is combined with detection methods based on symbolic execution,for the first time,to design defect detection guide information as to multi-loop,while locality characteristic of software updates and their impact is used.Thus,directed detection based on path guide and incremental detection based on update impact is implemented,whose main algorithms are as follows: 1) According to textual modification in source code caused by software update, after control/data-dependence analysis,semantic difference regions are identified.By analyzing pointer arithmetic and memory operations in multi-loop pertinently,suspect defects of memory overrun are primarily identify.Based on control/data-dependence relation,instructions and variables dependent by suspect defects are marked as defect dependent regions,thus FACE(Flow grAph for Condensed Examining) is built,which is minimum set of correlative execution elements capable of detecting defects caused by updates adequately,to preclude instructions irrespective of defect detection as soon as possible.2) Based on FACE,multi-loops are analyzed by a) tracing recurrent relations of defect dependent variables in multi-loop,b) expressing them uniformly as CR#(extended algebra for chains of recurrences) form,c) operating them symbolically based on CR# algebra rule,d) deduce closed-form functions of value or range limit of defect dependent variables,e) construct loop summary and function summary.Next,by combining safety restriction condition e.g.memory range with loop summary and function summary, defect trigger conditions are built and solved to estimate probabilities of defect triggering for precluding false positive in suspect defect set,and infer defect detection guide information for predicting defect relative paths.3) According to guide information such as path direction,branch priority, available range of loop iteration etc.,path-sensitive and bit-precise directed detection is applied,based on on-demand symbolic execution.At each suspect defect point,the defect trigger condition is actively detected and solved together with path conditions to determine condition satisfiabilities and preclude more false positive.At each path branch point,the executive context is on-demand cloned to avoid re-executing the same path prefix of several paths,and path conditions of selected branch are solved in time to prune infeasible paths.Lastly,defect set and corresponding set of defect-triggering input and execution paths are automatically generated as verification information.Accordingly,the incremental detection prototype system for memory overrun in multi-loop programs is designed and implemented.For the first time in China, Phoenix,next-generation industry-level compiler infrastructure of Microsoft mainstream compiler,is used for software defect detection,as basic underlay platform of the system.The system has been tested on several open source software such as Filezilla Server,SpamAssassin,WGet and OpenSSL,and found real defects.The experiment results show that by CR# analysis in multi-loop and defect directed detection,the algorithm can overcome the multi-loop problems encountered by current symbolic execution methods,including number of loop iteration determined by input,memory operations controlled by conditional branches in loop etc..Thus,it avoids blind path traversal while achieving path-sensitive and bit-precise detection,and improves efficiency and accuracy of detection.Moreover, incremental detection based on locality characteristic of software updates and their impact is applied to improve detection pertinence.In addition,it is of important application value in the field of software security testing and information counterwork that the algorithm can automatically generate sets of true defect, defect-triggering input and execution paths

节点文献中: 

本文链接的文献网络图示:

本文的引文网络