节点文献

基于区域的编译技术和栈寄存器优化

【作者】 刘旸

【导师】 张兆庆;

【作者基本信息】 中国科学院研究生院(计算技术研究所) , 计算机系统结构, 2003, 博士

【摘要】 为了提高指令级并行,编译器必须进行大量的优化。而复杂的编译优化算法需要耗费大量的编译时间和资源。为了减少编译的时间,并且给研究者提供一个灵活的研究平台,同时保证编译性能不受损害,本文提出了一种灵活的区域构造框架。在研究过程中,我们发现RSE(Register Stack Engine)开销在某些程序的运行过程中占有很大的比例,针对这个问题,提出了一种能够有效减少RSE开销提高代码性能的方法。文章的主要贡献有如下几点:1提出了一个新的区域构造框架,作为编译器各优化阶段的统一编译单位。这个区域构造框架是分层次的,其形状和大小可以根据优化阶段的需要灵活控制。控制流图中的所有基本块都至少被一个区域所包含。区域之间是以树状的关系组织在一起。2提出了一个可以参数化设定最大尾复制比率和最小出口概率限制的单入口多出口区域构造算法。实验表明该算法构造的区域对于适应不同的优化阶段以提供足够多的优化机会是非常有效的。3提出了区域属性的概念。区域的属性可以在编译器不同的优化阶段被设定,观察,维护和清除。这些属性对于维护区域前后优化阶段的效果是非常重要的。4提出了一个RSE开销的代价模型;基于这个代价模型,提出了过程间栈寄存器配额分配问题。5基于前述的量化代价模型,给出了一个过程间栈寄存器分配算法。该算法可以有效地减少总的内存访问开销,从而提高程序的执行性能。6提出了一种可重计值活跃区间的优化寄存器分配方法。对于某些可重计值的活跃区间,由于它的定义点的总执行频率大于它的引用点的总执行频率,则它没有被分配到寄存器时,溢出处理会从这些定义点移动到引用点的基本块里去执行,减少了实际执行时的代码执行时间。7上述算法均在ORC编译器中实现,并利用Spec2000Int程序对提出的算法进行了测试,对比了基于区域的编译和基于函数的编译时不同的优化效果。实验结果表明,区域的构造大大地减少了程序的编译时间,同时提高了程序的执行性能。过程间的寄存器分配算法对于RSE开销比较大的程序,性能的提高非常显著。而对于一些RSE开销不是很明显的程序,性能并没有明显的变化,或者略微有所提高。对可重计值活跃区间的优化则对Perlbmk的性能有明显提高。总体而言,程序的执行性能有了比较显著的提高。最后,对以上算法的扩展进行了讨论,提出了可以在未来继续深入研究的问题。随着处理器执行频率的加快,存储优化问题必将越来越受到研究者的重视,

【Abstract】 In order to improve instruction level parallelism,compilers tend to adopt more aggressive and complex optimization algorithms. But too complex and aggressive optimization algorithm will cost much compilation time and resources. In order to reduce compilation time and provide researchers a flexible research platform, on the meanwhile, prove the compilation performance,we proposed a flexible region formation infrastructure. In order to solve this problem that RSE cost is very serious for some programs, we proposed a effective algorithm to reduce RSE cost and improve overall execution performance.The main contributions of thesis are:1 We propose a new region formation infrastructure. This region formation infrastructure is hierarchical. Size and shape of regions could be controlled flexibly by setting specific parameters. Every basic block in the control flow is contained by at least one region. Regions are organized in a tree.2 We propose a Single Entry Multiple Exits region formation algorithm with max tail duplicate ratio and min exit probability constraints. Experiments show this algorithm is very effective in forming regions which could provide enough optimization opportunities for different optimization phases.3 We propose the attributes of region. Region attributes could be set,observed,maintained and deleted across different optimization phases. These attributes are very important for guarantee of the optimization effects.4 We propose a RSE cost model, and based on this cost model, we proposed the inter-procedural stacked register quota assignment problem.5 We propose a inter-procedural stacked register allocation algorithm,and it is implemented in ORC compiler.This algorithm could reduce the overall spill-to-memory access time so as to improve the program execution performance. This algorithm is based on the cost model proposed by us.6 We propose a method to decide whether allocating a stacked register to rematerializable live ranges. For a rematerializable live ranges, if the total execution frequency of the basic blocks where it is defined is greater then the total execution frequency of the basic blocks where it is used, then we need not allocate a register to it because spill could move those define to the basic blocks where it is used,therefore reduce the execution frequency of loads.

节点文献中: