
Study of Code Optimizing Method Based on Parallel Functional Units

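【Author】 Qiu Chunwu (邱春武)

【Advisor】 Yu Wen (余文)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2008, Master's thesis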

【Abstract】 Compared with traditional DSPs, modern DSPs exploit more instruction-level parallelism (ILP) to improve performance. The DSP discussed in this thesis uses a clustered VLIW architecture and can execute multiple operations in a single clock cycle. The thesis first discusses how to construct a code optimizer for this DSP and then presents a concrete implementation for the TI TMS320DM642.

The VLIW DSP code optimizer is built on the LCC compiler framework. LCC serves as the front end and produces intermediate code; template matching over the intermediate code then selects the corresponding target-machine instructions; finally, cluster assignment and scheduling are performed while registers and functional units are allocated, yielding optimized parallel assembly code. We wrote a machine specification and a machine description for the VLIW DSP, together with an iburg specification containing the code-generation rules, from which iburg automatically generates the instruction-selection part of the optimizer. This improves the retargetability of the VLIW DSP code optimizer.

A prominent feature of the VLIW DSP architecture is clustering. Correspondingly, an important phase of code generation is cluster assignment, which maps each operation and its operands to an appropriate cluster. Cluster assignment should make full use of the functional units in every cluster while reducing data transfers between clusters. The thesis discusses common cluster-assignment algorithms and the LIST scheduling algorithm, and then implements the Unified Assign and Schedule (UAS) algorithm for the VLIW DSP. In UAS, cluster assignment and scheduling are performed together: when an operation is scheduled, the operation and its operands are assigned to appropriate clusters at the same time. Experiments show that the code optimization method presented in this thesis achieves good optimization results on common DSP algorithms.
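The iburg-generated instruction selector mentioned above works by labeling each intermediate-code tree bottom-up with the cheapest rule for every nonterminal, then emitting code top-down from the chosen rules. The Python sketch below illustrates that cost-driven tree-pattern matching under stated assumptions: it is not iburg's generated C code, and the tiny grammar (nonterminals `reg` and `con`, rules named after the C6000 mnemonics `ADD`, `MPY` and `MVK`, unit costs, virtual registers `t0`, `t1`) is a hypothetical stand-in for the thesis's actual DM642 rules.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                   # "ADD", "MPY", "CNST" or "REG"
    kids: tuple = ()
    value: object = None                      # constant value or variable/register name
    best: dict = field(default_factory=dict)  # nonterminal -> (cost, rule)

# Hypothetical grammar: (lhs nonterminal, pattern, cost, mnemonic). A tuple pattern matches an
# operator and its children's nonterminals; a bare string is a chain rule (lhs from that nonterminal).
RULES = [
    ("con", ("CNST",), 0, None),
    ("reg", ("REG",), 0, None),
    ("reg", "con", 1, "MVK"),                 # materialize a constant in a register
    ("reg", ("ADD", "reg", "reg"), 1, "ADD"),
    ("reg", ("ADD", "reg", "con"), 1, "ADD"), # hypothetical immediate form
    ("reg", ("MPY", "reg", "reg"), 1, "MPY"),
]

def label(node):
    """Bottom-up pass: record the cheapest rule deriving each nonterminal at each node."""
    for kid in node.kids:
        label(kid)
    for rule in RULES:
        lhs, pat, cost, _ = rule
        if not isinstance(pat, tuple) or pat[0] != node.op or len(pat) - 1 != len(node.kids):
            continue
        if all(nt in kid.best for kid, nt in zip(node.kids, pat[1:])):
            total = cost + sum(kid.best[nt][0] for kid, nt in zip(node.kids, pat[1:]))
            if lhs not in node.best or total < node.best[lhs][0]:
                node.best[lhs] = (total, rule)
    changed = True
    while changed:                            # close over chain rules until nothing improves
        changed = False
        for rule in RULES:
            lhs, pat, cost, _ = rule
            if isinstance(pat, str) and pat in node.best:
                total = node.best[pat][0] + cost
                if lhs not in node.best or total < node.best[lhs][0]:
                    node.best[lhs] = (total, rule)
                    changed = True

def reduce(node, nt, out, temps):
    """Top-down pass: emit instructions for the rule chosen for `nt`; return the result operand."""
    _, (_, pat, _, mnemonic) = node.best[nt]
    if isinstance(pat, str):                  # chain rule: derive `pat` first, then convert
        srcs = [reduce(node, pat, out, temps)]
    elif mnemonic is None:                    # leaf rule: use the constant/register directly
        return str(node.value)
    else:
        srcs = [reduce(kid, knt, out, temps) for kid, knt in zip(node.kids, pat[1:])]
    dst = f"t{next(temps)}"                   # fresh virtual register
    out.append(f"{mnemonic} {', '.join(srcs)}, {dst}")
    return dst

def select(tree):
    """Label the tree, then reduce it to a register; return (total cost, instruction list)."""
    label(tree)
    out = []
    reduce(tree, "reg", out, itertools.count())
    return tree.best["reg"][0], out

# ADD(MPY(a, b), 3)
tree = Node("ADD", (Node("MPY", (Node("REG", value="a"), Node("REG", value="b"))),
                    Node("CNST", value=3)))
cost, code = select(tree)
print(cost, code)
```

For `ADD(MPY(a, b), 3)` the immediate form of `ADD` (total cost 2) wins over first loading `3` with `MVK` (total cost 3); expressing such alternatives as costed rules in a specification, rather than hand-coding the choice, is what makes the selector easy to retarget.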
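The UAS idea described above, choosing a cluster at the moment each operation is scheduled, can be sketched as a cycle-by-cycle list scheduler. This minimal Python version assumes a hypothetical machine (`NUM_CLUSTERS = 2`, `SLOTS_PER_CLUSTER = 2` issue slots per cluster, `BUS_PER_CYCLE = 1` inter-cluster copy per cycle, `COPY_LATENCY = 1`) rather than the DM642's real functional units and cross paths. It uses the longest latency-weighted path to a sink as the list-scheduling priority and tries clusters in order of how many operands they already hold, a simple ordering chosen for this sketch. When an operand lives in another cluster, a copy is placed in the earliest earlier cycle with a free bus slot; if none fits, the next cluster is tried.

```python
from collections import defaultdict

# Hypothetical machine parameters (not the DM642's real resources).
NUM_CLUSTERS = 2
SLOTS_PER_CLUSTER = 2   # operations each cluster may issue per cycle
BUS_PER_CYCLE = 1       # inter-cluster copies per cycle
COPY_LATENCY = 1        # cycles before a copied value is usable

def heights(ops):
    """Priority: longest latency-weighted path from each operation to a sink."""
    succs = defaultdict(list)
    for name, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(name)
    memo = {}
    def h(n):
        if n not in memo:
            memo[n] = ops[n][0] + max((h(s) for s in succs[n]), default=0)
        return memo[n]
    return {n: h(n) for n in ops}

def plan_copies(preds, c, cycle, ready_at, cluster, bus):
    """Copies (value, cycle) needed to issue on cluster c at `cycle`, or None if impossible."""
    plan, extra = [], defaultdict(int)
    for p in preds:
        if (p, c) in ready_at:                # value (or an earlier copy) lives in c
            if ready_at[(p, c)] > cycle:
                return None
            continue
        home_ready = ready_at[(p, cluster[p])]
        for t in range(home_ready, cycle - COPY_LATENCY + 1):
            if bus[t] + extra[t] < BUS_PER_CYCLE:   # free bus slot in an earlier cycle
                extra[t] += 1
                plan.append((p, t))
                break
        else:
            return None                       # copy cannot arrive in time
    return plan

def uas(ops):
    """ops: name -> (latency, [predecessor names]). Returns start cycles, clusters, copies."""
    prio = heights(ops)
    start, cluster, ready_at, copies = {}, {}, {}, []
    issued, bus = defaultdict(int), defaultdict(int)
    cycle = 0
    while len(start) < len(ops):
        ready = [n for n in ops if n not in start and all(p in start for p in ops[n][1])]
        for n in sorted(ready, key=lambda n: -prio[n]):
            preds = ops[n][1]
            # Assignment happens here, during scheduling: prefer clusters holding more operands.
            order = sorted(range(NUM_CLUSTERS),
                           key=lambda c: -sum((p, c) in ready_at for p in preds))
            for c in order:
                if issued[(cycle, c)] >= SLOTS_PER_CLUSTER:
                    continue
                plan = plan_copies(preds, c, cycle, ready_at, cluster, bus)
                if plan is None:
                    continue
                for p, t in plan:             # commit the copies this placement needs
                    bus[t] += 1
                    ready_at[(p, c)] = t + COPY_LATENCY
                    copies.append((p, cluster[p], c, t))
                start[n], cluster[n] = cycle, c
                issued[(cycle, c)] += 1
                ready_at[(n, c)] = cycle + ops[n][0]
                break
        cycle += 1
    return start, cluster, copies

OPS = {
    "a": (1, []), "b": (1, []), "c": (1, []), "d": (1, []),
    "e": (2, ["a", "b"]),   # e.g. a two-cycle multiply
    "f": (1, ["c", "d"]),
    "g": (1, ["e", "f"]),
}
start, cluster, copies = uas(OPS)
print(start, cluster, copies)
```

In the sample graph, `e` and `f` land in different clusters, so `g` issues in cycle 3, once `e` (latency 2) is ready, and receives `f` through a single copy placed in cycle 2. A separate assign-then-schedule pass would have to fix these clusters before knowing when the bus is free, which is the coupling UAS avoids.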

  • 【CLC Number】 TP332
  • 【Times Cited】 4
  • 【Downloads】 132