节点文献

面向多级SPM存储的并行程序优化

Multi-level SPM Based Parallel Programme Optimization

【作者】 任小广

【导师】 唐玉华;

【作者基本信息】 国防科学技术大学 , 计算机科学与技术, 2010, 硕士

【副题名】任务与数据的关联调度

【摘要】 以Cache为代表的硬件管理存储层次作为解决“存储墙”问题的重要手段,在出现之初便获得了巨大成功,但随着技术和工艺的发展,逐渐暴露出诸多问题和缺陷,具体表现在性能、功耗、片上面积等方面。针对这些问题和缺陷,人们提出了软件管理存储层次方案,并进行了广泛研究。便笺存储器(SPM)作为一种软件管理片上存储器,在性能、功耗、片上面积等方面都比硬件管理的Cache有着更多的优势。具有多级SPM存储层次的片上多核体系结构对于片上多核系统的性能提升有着重要意义。本文针对多级SPM存储层次下片上多核体系结构中的并行程序优化方法进行了研究,提出了三种启发式任务与数据关联调度算法。首先是局部贪婪LGA关联调度算法,该算法在对任务进行表调度的基础上,对数据采用局部贪婪策略进行分配。针对LGA算法中局部贪婪策略带来的缺陷,本文又提出了全局预测GPA关联调度算法。同时,为开发循环迭代间的并行性,本文基于循环依赖重组技术,提出了旋转调度分配RPSA关联调度算法。针对上述三种算法,本文进行了编译实现,为面向多级SPM的并行程序优化提供了自动处理平台。同时,体系结构软件模拟技术能够为多级SPM存储层次的片上多核体系结构研究提供重要的模拟平台,也是本文任务与数据关联调度算法研究的必要验证手段。为此,本文基于SimpleScalar模拟器开发了多级SPM存储层次的片上多核系统模拟平台Sim-SPM。在Sim-SPM模拟器的设计和实现过程中,成功解决了多核环境、虚拟SPM空间、软件存储管理指令扩展等诸多问题。对现有指令集的兼容性设计使得Sim-SPM模拟器更具实用性。针对模拟器的模拟速度和模拟精度问题,本文进行了详细测试验证,实验结果表明Sim-SPM模拟器能够满足多级SPM存储层次下片上多核系统的模拟验证需求。最后,基于Sim-SPM模拟平台,选取多个典型测试程序对本文提出的三个算法从多个角度进行了详细测试和验证,包括初步性能对比、与近似最优方案比较、目标体系结构对算法性能影响等。实验结果表明,本文提出的一系列任务与数据关联调度算法对于多级SPM片上多核系统下的并行程序优化研究是具有指导意义的。

【Abstract】 The hardware-managed memory hierarchy, which is represented by cache memory hierarchy, has gain great success as a solution to the "Memory Wall" problem. However, with the development of technics and technology, lots of limitations and defects of the problem emerge gradually, specifically in the aspects of performance, power consumption, chip area and so on. To solve these problems, the software-managed memory hierarchy has been proposed, which has attracted many researchers’attention. Scratchpad Memory (SPM), as a software-managed on-chip memory, shows more advantages than hardware management based cache in performance, power consumption, chip area and other aspects.Multi-level scratchpad memory hierarchy has great significance for the performance of on-chip multi-processor. The paper mostly focuses on the study of parallel optimization of the on-chip multi-processor system with multi-level scratchpad memory hierarchy, and proposed three algorithms with heuristic policy for associated scheduling of task and data. The first algorithm is a local greedy scheduling algorithm (LGA). The algorithm gets the data partition based on the list scheduling of tasks with local greedy policy. To remedy the defects caused by the local greedy policy in LGA, a second global prediction scheduling algorithm (GPA) was introduced. The paper also researched a rotation scheduling algorithm for the development of inter-iteration parallelism, which is based on the dependent retiming technology. The author implemented the algorithm in a compiler, which automatically carries out the parallel optimization method.Meanwhile, the software architectural simulation technology is important to the research of multi-level scratchpad memory hierarchy, which serves as a means of validation of the algorithms proposed in this paper. The author developed Sim-SPM, a simulator for the research of multi-processor on chip with multi-level scratchpad memory hierarchy, which is based on the SimpleScalar simulator. During the design and development process, the author successfully resolved several obstacles of implementing the Sim-SPM simulator, such as the multi-processor environment simulation, extensions of memory management instructions and the virtual memory space simulation of scratchpad memory. The compatibility of the existing instruction set in Sim-SPM makes the simulator quite practicable. To attain simulation speed and accuracy of the simulator, the author carried out a detailed evaluation test and validated that the Sim-SPM simulator can meet very well the verification needs of the on-chip multi-processor with multi-level scratchpad memory hierarchy.Finally, the author tested the three algorithms proposed in this paper with several typical benchmarks on the Sim-SPM simulator. Several aspects had been verified through the experiment, including a preliminary performance comparison, comparion with the approximate optimal solution, the influence of target architecture on algorithm performance and so on. The experimental results show that the proposed algorithms for task scheduling and data partition have a guiding significance for the software management of multi-scratchpad memory hierarchy on the multi-processor system on-chip.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络