SIMD编译优化技术研究

The Research on SIMD Compilation Optimization

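【Author】 Wang Di (王迪)

【Advisor】 Shi Ce (史册)

【Author information】 Zhejiang University, Information and Communication Engineering, 2008, Master's thesis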

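【Abstract (translated from the Chinese)】 Multimedia applications have been a research focus in computing in recent years, and their code often exhibits a high degree of parallelism. To obtain higher performance, almost all processor vendors have added multimedia extensions to their processors to make full use of the available computing power, providing instruction sets with single-instruction, multiple-data characteristics, known as SIMD instructions. A SIMD instruction performs a vector-style operation on a group of data: each group is divided into several subwords, and the same operation is applied to all of them. SIMD instructions can deliver higher execution efficiency, lower power consumption, and better resource utilization, so making full use of them can effectively improve application performance.

Compiler support for SIMD instructions, however, is not yet satisfactory. Programmers often have to use SIMD instructions explicitly, by hand-writing assembly, embedding inline assembly, or calling intrinsic functions known to the compiler. This requires a deep understanding of the SIMD instruction set and makes multimedia programs harder to develop. In addition, because SIMD instruction sets differ considerably between processors, code portability suffers. We therefore want the compiler to generate SIMD instructions automatically from a high-level language, which is called SIMD compilation optimization. This optimization closely resembles traditional automatic vectorization for vector processors, and is therefore also called SIMD vectorization. Differences in architecture and in main application domain nevertheless create considerable differences between multimedia extensions and vector processors: the two differ critically in vector length, instruction set characteristics, and memory operations. Three problems chiefly hinder SIMD compilation optimization: the complex code forms found in multimedia programs degrade vectorization; typical multimedia operations come in many variations and are hard to recognize; and weak memory operations limit the achievable performance gain.

This thesis proposes a target-independent SIMD vectorization method for multimedia applications written in C, which extends and improves existing vectorization methods. It addresses three problems in existing techniques (C programs that use pointers are hard to vectorize, typical multimedia operations cannot be recognized effectively, and the data in SIMD registers is poorly utilized) and can generate SIMD instructions effectively. The method operates mainly on the syntax-tree intermediate representation and does not change the structure of the compiler back end, which gives it good portability and extensibility. Its main improvements over traditional vectorization algorithms are: it analyzes and handles the pointer accesses that are widespread in multimedia C code; it uses pattern matching to recognize complex typical multimedia operations; and it uses loop distribution, loop interchange, and similar transformations to expose more potential parallelism in loops. The thesis also proposes a SIMD register allocation method that reduces redundant memory operations, lowering the memory-access overhead imposed by architectural constraints.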
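As an illustration of the loop distribution mentioned in the abstract, the following is a minimal sketch in C. It is not taken from the thesis: the function names `fused` and `distributed` are hypothetical, and the transformation shown is only valid under the stated assumption that the four arrays do not overlap, which is exactly the kind of fact a pointer analysis would have to establish first.

```c
#include <stddef.h>

/* Original loop. Statement S2 carries a dependence across iterations
 * (acc[i] needs acc[i-1]), so the loop body as a whole cannot be
 * executed in SIMD fashion. Assumption: a, b, c, acc do not alias. */
void fused(const int *a, const int *b, int *c, int *acc, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        c[i] = a[i] + b[i];          /* S1: independent per iteration */
        acc[i] = acc[i - 1] + c[i];  /* S2: loop-carried recurrence   */
    }
}

/* After loop distribution. S1 now sits alone in a loop with no
 * loop-carried dependence, so a vectorizer could map it onto packed
 * additions. S2 stays in a separate sequential loop. */
void distributed(const int *a, const int *b, int *c, int *acc, size_t n)
{
    for (size_t i = 1; i < n; i++)
        c[i] = a[i] + b[i];          /* vectorizable */

    for (size_t i = 1; i < n; i++)
        acc[i] = acc[i - 1] + c[i];  /* remains scalar */
}
```

The split is legal here because the only dependence between the two statements runs forward, from S1 to S2 within the same iteration, so executing all of S1 before any of S2 preserves every value that S2 reads.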

【Abstract】 Over the past decade, multimedia processing has become one of the most important computing workloads. To meet the ever-growing performance demands of multimedia workloads, multimedia extensions (MME) have been added to most existing processors. The multimedia extensions of most processors use single instruction, multiple data (SIMD) instructions, which provide a form of vectorization in which a large machine word is viewed as a vector of subwords and the same operation is performed on all subwords in parallel. SIMD instructions offer higher performance, lower energy consumption, and better resource utilization, and their systematic use can significantly improve program performance. However, compilers still lack good support for SIMD instructions, and code often has to be written manually in assembly language or with compiler known functions (CKFs), which leads to poor readability, portability problems, and high software development and maintenance costs. To utilize SIMD instructions fully, the compiler needs to translate high-level languages into the SIMD instructions of media processors automatically. This is called SIMD compilation optimization. Because multimedia extensions resemble vector processors, it is natural to consider applying traditional vectorization techniques to multimedia applications. However, satisfactory results have yet to be obtained for the vectorization of realistic multimedia programs on MME. The gap between MME vectorization and traditional vectorization results both from the architectural differences between multimedia extensions and traditional vector processors and from the differences between multimedia applications and numerical applications.
The obstacles to SIMD optimization lie in the following aspects: the complex code forms found in the source code of multimedia applications; the typical multimedia operations and their variations, which cannot be recognized easily; and the constraints on memory operations. This thesis investigates efficient SIMD optimization techniques and presents a SIMD compilation optimization approach that is integrated into the LCC compiler. The approach can generate SIMD instructions efficiently, cope with the use of pointers in C code, and identify the typical operations in multimedia applications. In addition, the author designed and implemented a SIMD register allocation approach that effectively improves the performance of register allocation.
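To make the notions of subwords and of a "typical multimedia operation" more concrete, here is a minimal sketch in portable C. It is an illustration under assumptions, not the thesis's implementation or any real instruction set: `sat_add_scalar`, `packed_sat_add_u8`, and `sat_add_packed` are hypothetical names, and a 32-bit word holding four 8-bit lanes stands in for a SIMD register. The scalar loop shows a pointer-based saturating add, the kind of clamp idiom a pattern matcher would have to recognize; the packed version shows the form a vectorizer would aim for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar form, written with pointers as is common in multimedia C code.
 * The clamp to 255 is the saturating-add idiom to be recognized. */
void sat_add_scalar(const uint8_t *src1, const uint8_t *src2,
                    uint8_t *dst, size_t n)
{
    while (n--) {
        unsigned sum = (unsigned)*src1++ + (unsigned)*src2++;
        *dst++ = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Software model of a packed saturating add: the 32-bit word is treated
 * as four independent 8-bit subwords, and the same operation is applied
 * to every subword. Real hardware would do this in one instruction. */
uint32_t packed_sat_add_u8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        uint32_t s = x + y;
        if (s > 255u)
            s = 255u;                 /* saturate instead of wrapping */
        result |= s << (8 * lane);
    }
    return result;
}

/* Vectorized shape of the loop: four elements per iteration, with a
 * scalar loop for the leftover elements. Assumes dst does not overlap
 * the inputs. memcpy is used for the word loads and stores so that
 * alignment does not matter. */
void sat_add_packed(const uint8_t *src1, const uint8_t *src2,
                    uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t a, b, r;
        memcpy(&a, src1 + i, 4);
        memcpy(&b, src2 + i, 4);
        r = packed_sat_add_u8(a, b);
        memcpy(dst + i, &r, 4);
    }
    sat_add_scalar(src1 + i, src2 + i, dst + i, n - i);
}
```

Because the operation is identical in every lane, the result does not depend on byte order: each byte loaded from memory is combined only with the byte at the same offset in the other input.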

【Key words】 SIMD; compilation optimization; vectorization; register
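  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2009, Issue 07
  • 【Classification number】 TP314
  • 【Times cited】 7
  • 【Downloads】 246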