节点文献
嵌入式处理器编译器关键技术研究
Research on Some Key Compiler Techniques for Embedded Processors
【作者】 吴圣宁;
【导师】 李思昆;
【作者基本信息】 国防科学技术大学 , 计算机科学与技术, 2007, 博士
【摘要】 嵌入式系统通常对性能、实时性、功耗等有着严格的要求,需要非常高效的机器代码。因此,嵌入式软件开发常采用汇编语言。但汇编语言编程费时、调试困难,而且代码难以移植。嵌入式系统的广泛应用和嵌入式软件规模的不断扩大,决定了用高级语言代替汇编语言进行嵌入式软件开发是必然趋势。但使用高级语言编译器生成代码的质量和手工编写的汇编代码相比还有很大差距,不能满足嵌入式系统的要求。因此,嵌入式处理器编译技术研究有着非常重要的现实意义。传统编译技术通常不能直接适用于嵌入式领域,需要进行扩展或者设计全新的算法。传统的编译技术一般基于规则的机器模型,并采用简单启发式方法以达到较快的编译速度。嵌入式处理器一般具有不规则的体系结构特征,且在嵌入式环境中,编译器生成高质量代码的重要性高于编译时间。本文主要研究了嵌入式处理器编译技术的三个问题:嵌入式处理器寄存器分配、多媒体扩展指令集的代码生成、双指令集处理器多目标代码选择。在传统图着色寄存器分配方法中,假设机器模型具有简单的寄存器文件。这些算法采用“degree<k”测试冲突图结点是否局部可着色,并通过特殊的技巧对此扩展,以适用于不规则的寄存器文件特征。经过多年研究,此问题取得了很大进展。但是,传统图着色寄存器分配算法在进行冲突图结点局部可着色测试时,无法知道邻居结点所分配的颜色,只能采取保守的估算,因而降低了编译生成的代码质量。这是它们难以克服的困难。传统树模式匹配和动态规划技术不能有效利用多媒体处理器的指令集并行,基于传统并行编译技术的扩展一般局限于循环级并行,且没有全局优化标量指令和SIMD指令的使用。传统编译技术一般仅优化单一的目标,例如性能、功耗。嵌入式系统常需同时优化多个目标。单一目标的优化对其它目标有何影响,如何在多个目标间权衡,需要进一步研究。嵌入式系统的复杂性决定了需要采用新的方法学来研究嵌入式编译技术。从元启发式算法以及编译器前后端有机结合这两个角度来研究嵌入式编译技术是本文工作的重要思想。近二十年来,元启发式(Meta-heuristics)算法得到广泛研究。这些算法来自于人们从自然界得到的启发,通过模拟物理过程、生物进化等自然现象,能够克服经典优化算法的局限性,系统地搜索问题的解空间,较好地解决多种优化问题。编译技术中同样存在各种优化问题,尤其在嵌入式环境中,这些问题更加复杂,常使得传统编译技术不能简单地适用。元启发式算法为求解复杂的编译优化问题提供了广阔空间。程序分析是编译优化的基础。可以把编译器前端程序分析的优势和编译器后端的机器相关信息结合起来,以适应复杂的嵌入式环境下编译技术研究的需要。本文主要研究成果和创新包括以下几个方面:1.针对嵌入式处理器不规则寄存器文件体系结构特征,本文提出了一种新的图着色寄存器分配混合演化算法HMA-GRA,突破了传统图着色寄存器分配算法在寄存器冲突图结点局部可着色测试问题上难以克服的障碍,为实现嵌入式处理器寄存器分配器提供了新途径。本文的HMA-GRA算法主要由遗传算法和禁忌搜索算法构成,利用Chaitin算法作为前端对冲突图进行预处理,对剩余冲突图中各结点按照其类型随机分配颜色,计算结点间的冲突数量,通过演化过程逐渐减少图的总体冲突,最终实现图成功着色。此算法不仅可以精确计算冲突图结点是否局部可着色,而且能够用规范的寄存器分配模型代替传统算法中的特殊技巧,以适应嵌入式处理器寄存器文件不规则特征带来的复杂性。2.面向多媒体指令集,本文提出了一种新的SIMD代码生成算法ICG-ME。ICG-ME算法扩展了传统的树模式匹配和动态规划技术,通过修改代码生成器的产生器iburg工具为数据流树结点的每个目标非终结符保留多个匹配规则,以识别构造SIMD指令的并行操作;在编译器前端对存储操作相对于向量寄存器的对齐状况进行数据流分析,并把分析结果传递给编译器后端,以辅助构造SIMD和数据重组指令;最终生成多媒体向量指令和非多媒体标量指令混合的汇编代码。通过结合编译器前端程序分析的优势和编译器后端的机器信息,ICG-ME算法不仅简化了SIMD代码的生成,而且能够同时利用SIMD指令和非SIMD指令,以生成高质量的机器代码。3.以ARM/Thumb双指令集处理器为例,本文提出了用元启发式算法求解编译技术中多目标优化问题的方法学。本文把双指令集处理器代码选择问题表示为权衡程序代码大小和运行时间的多目标优化问题,利用程序profiling技术为其建立数学模型,通过一种多目标蚁群算法结合子集选择的方法求解此问题。4.基于SUIF/MachineSUIF编译器研究框架,我们进一步完善了代码选择、寄存器分配和程序分析的实验环境。我们实现了多种传统的图着色寄存器分配算法和元启发式图着色寄存器分配算法HMA-GRA,实现了多媒体代码生成算法ICG-ME和部分程序分析模块,实现了针对多目标代码生成的蚁群优化算法MOARM-ANT-SS。嵌入式处理器编译技术需要扩展传统的编译技术或者设计全新的算法。本文分析了传统编译技术在嵌入式环境中遇到的关键问题,并以元启发式算法和编译器前后端有机结合的方为基础,研究了嵌入式编译技术的三个问题:嵌入式处理器寄存器分配、多媒体扩展指令集的代码生成、双指令集处理器多目标代码选择。实验结果表明了这些方法的有效性。本文把元启发式算法用于嵌入式编译技术的研究与实现尚处于初步阶段。元启发式算法从新的角度为研究嵌入式编译技术提供了广阔空间。
【Abstract】 To meet various requirements like performance, power, and real-time response to external events, embedded systems need highly efficient machine code. Thereby assembly language is always widely used in embedded programming. However, time-consuming programming, vexing debugging, and poor code portability of assembly programming are not neglectable. With the extension of application area of embedded systems and the ever-growing of the size of embedded software, assembly language will be inevitably replaced by high level language in embedded software development. But compared to manually written assembly code, the code generated by high level language compilers is often much worse and cannot meet the requirements. So it is very important to study the compiler techniques for embedded systems.Traditional compiler techniques are usually not applicable to the embedded domain directly, which need to be extended. Moreover, some completely distinct algorithms are expected sometimes. The reason is that traditional compiler techniques are usually based on regular machine models, and apply simple heuristics to get results rapidly; however, embedded processors often have irregular architectural characteristics. Additionally, in the embedded domain, the importance of high quality code outweighs that of short compilation time.This thesis mainly focuses on three problems of compiler techniques for embedded systems: register allocation for embedded processors, code generation for multimedia instruction set extension, and code selection for dual instruction set processors.Traditional graph-coloring register allocation algorithms are based on over-simplified register files of machine models, which apply the rule of "degree < k" to test the local colorability of interference graphs and extend this rule by ad hoc skills to irregular register file characteristics. Significant progress has been made on this problem through years of efforts of compiler researchers. But when testing the local colorability of interference graphs, traditional graph-coloring register allocation algorithms cannot get the colors of its neighbor nodes, consequently, they make conservative estimations of them.Since traditional tree pattern matching and dynamic programming cannot exploit efficiently instruction-level parallelism, the extension based on traditional parallel compiler techniques mainly is limited in loop-level parallelism, which cannot exploit scalar instructions and SIMD vector instructions optimally.Traditional compiler techniques usually improve code quality for a single objective, such as performance or power. However, embedded systems usually need to be optimized for multiple objectives. It is necessary to research how a compiler transformation for one objective influences the other objectives and how to make trade-offs among different objectives.The complexity of embedded systems makes it necessary to apply new methods to research compiler techniques for them. The core idea of this thesis is improving compiler techniques with the help of meta-heuristics and the integration of compiler front-end and back-end.In the last twenty years, there has been extensive research on meta-heuristic algorithms. Meta-heuristics, inspired by nature, can overcome the limitations of classic optimization algorithms and can search the solution space systematically to solve many optimization problems by simulating natural phenomena such as physical processes and evolutions. There are a variety of optimization problems in the compiler domain, which are more complex in the embedded domain. They bring difficulties to the application of standard compiler techniques. Meta-heuristics open up a broad way to solve complex compiler optimization problems.Program analysis is the foundation of compiler optimizations. To meet the requirements of compiler techniques for embedded systems, the advantages of compiler front-end program analysis and the machine-specific information of compiler back-end can be combined.The main contributions of the thesis are as follows.1. For irregular architectural characteristics of embedded processors, a new graph-coloring register allocation algorithm HMA-GRA is proposed, which is based on a hybrid evolutionary method. The algorithm provides a new direction for implementing register allocators while breaking through the difficulties in computing local colorability of interference graphs, which cannot be solved by traditional graph coloring register allocation algorithms based on simple heuristics.HMA-GRA algorithm combines a genetic algorithm with a tabu search subroutine. It uses Chaitin algorithm to pre-process the interference graph, assigns colors to nodes of the remaining interference graph randomly according to their classes, computes the number of conflicts, and decreases this number gradually until the graph can be colored successfully. This method can not only compute the local colorability of interference graphs accurately, but also replace the ad hoc skills of traditional methods by a regular register allocation model. It can thus deal with complex irregular characteristics of embedded processor register files.2. For multimedia instruction set, a new algorithm ICG-ME for the generation of SIMD code is proposed. ICG-ME algorithm extends standard tree pattern matching and dynamic programming techniques, modifies the iburg code generator generator in such a way that its generated code generators allow to retain multiple matching rules for each target nonterminal of a node of data flow trees in order to identify parallel operations of SIMD instructions. The data flow analysis that is to identify whether memory references are aligned relative to vector registers is performed in the compiler front-end. Then the analysis results are transfered to the compiler back-end to assist in constructing SIMD and permutation instructions. Finally, assembly code of multimedia vector instructions and non-multimedia scalar instructions is generated. Through the integration of compiler front-end analysis and the machine-specific information of compiler back-end, ICG-ME algorithm can not only simplify the generation of SIMD code, but also use both SIMD and non-SIMD instructions to improve code quality.3. In order to meet multi-objective requirements of embedded systems, a new methodology is proposed to deal with multi-objective optimization problems in the compiler domain using meta-heuristics. And the problem of code selection for ARM/Thumb processors with dual instruction sets is used to exemplify this idea.This thesis formulates the problem of code selection for processors with dual instruction sets as a multi-objective optimization problem that needs trade-off between the code size and the code execution time, creates a mathematical model for it using program profiling technique, and solves it with a basic multi-objective ant colony optimization algorithm MOARM-ANT-SS together with subset selection method.4. Based on the SUIF/MachineSUIF compiler research framework, we have extended the experimental environment for code selection, register allocation, and program analysis.Several traditional graph-coloring register allocation algorithms and a meta-heuristic graph-coloring register allocation algorithm HMA-GRA have been implemented. A code generation algorithm ICG-ME for multimedia processors and several compiler passes for program analysis have been implemented as well. Furthermore, an ACO algorithm MOARM-ANT-SS for multi-objective code generation is implemented.It is necessary to extend traditional compiler techniques or to design new ones in the embedded domain. This thesis analyzes some key problems of traditional compiler techniques in the embedded domain. Based on meta-heuristics and the integration of the compiler front-end and back-end, it mainly focuses on three problems of embedded systems: register allocation for embedded processors, code generation for multimedia instruction set extension, and code selection for dual instruction set processors. Experimental results prove the efficiency of our methods. The research and implementation in this thesis on applying meta-heuristics to compiler techniques for embedded systems are still elementary. From a new direction, meta-heuristics open up a broad way to research compiler techniques in the embedded domain.