Research on the Efficient Thread-level Speculation and Transactional Execution Model on Multi-core Platform

【Author】 Liu Yuan (刘圆)

【Advisors】 Yang Shoubao (杨寿保); An Hong (安虹)

【Author Information】 University of Science and Technology of China, Computer Architecture, 2007, Ph.D.

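【Abstract (Chinese version)】 On-chip multi-core is the mainstream technology in today's processor design, and it needs multithreaded applications to deliver its full performance. Speculative multithreading can simplify parallel programming. It lets programmers or compilers try aggressive optimizations without fully guaranteeing correctness, so that they can discover and exploit more program parallelism. One difficulty in implementing this approach is the local buffering of memory accesses. The speculative multithreading schemes proposed so far all use very complex buffering mechanisms, which increase hardware design complexity and also, to some extent, hurt the efficiency of application development. Another difficulty is how to effectively reduce the unpredictable impact of mis-speculation on parallel performance. To address these two problems, this dissertation applies transactional memory and dynamic profiling. Its aim is a hardware/software co-designed solution that can efficiently parallelize applications speculatively on multi-core platforms. The dissertation presents an in-depth, systematic study of thread-level speculation based on transactional memory. It covers the architecture model, the programming and execution models, and dynamic optimization methods. The main results are as follows.

(1) The dissertation first proposes SPoTM (Speculative Parallelization on Transactional Memory), a speculative multithreading architecture model based on transactional memory. SPoTM uses transactional memory to isolate reads and writes between threads. It provides out-of-order thread execution, in-order commit, conflict detection, and rollback after failed speculation.

(2) The dissertation also designs a loop-parallelism-based speculative multithreading programming model for SPoTM, together with the speculative thread system library and the instruction set extensions needed to implement it. The SPoTM programming model is simple to implement and requires very few code changes for parallelization, so it noticeably simplifies multithreaded programming.

(3) Several representative programs from SPEC CPU 2000 are evaluated in detail on fastTM and sim-SPoTM, the simulation platforms developed for the SPoTM architecture. The evaluation quantifies how various hardware mechanisms affect speculative execution performance, in order to find cost-effective implementation choices. The dissertation also analyzes in depth how cache locality changes under speculative execution, and proposes and validates several methods for improving locality.

(4) The offline profiling commonly used in current speculative multithreading optimization is limited by its training input sets. To address this, the dissertation proposes and implements a dynamic optimization method that automatically transforms speculative multithreaded programs at run time according to online profiling results. The method profiles and optimizes at run time, so it needs neither a separate profiling run nor a general training input set. It also suits programs whose run-time behavior changes in phases. Experiments show that, for guiding transaction partitioning and selecting loops to parallelize, the dynamic optimization method achieves results close to those of offline optimization.

While designing and evaluating the SPoTM architecture model and developing the dynamic software optimization system, we reached several qualitative conclusions about how to use speculative multithreading effectively. First, to improve speculative execution performance, we believe more effort should go into software optimization rather than aggressive changes to the hardware structure and execution mechanisms. Second, speculative multithreading cannot make automatic parallelization completely replace manual parallelization; it can instead serve as an auxiliary tool for manual parallelization. Finally, whether parallelization is manual or automatic, a gradual process of parallel code transformation is needed, and profile-guided optimization plays a critical role in this process.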
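The read/write isolation, out-of-order execution, in-order commit, conflict detection and rollback attributed to SPoTM above can be illustrated with a small software model. This is a minimal sketch, not SPoTM's hardware design, ISA or library. It makes two assumptions:

* Each iteration buffers its stores privately, as a transaction would (lazy versioning).
* Conflicts are detected when an iteration commits.

The names `SpecTx`, `run_speculative_loop` and the `window` parameter are hypothetical. All iterations in a window run against the same memory snapshot, which stands in for concurrent execution on different cores. They then commit in loop order. An iteration is squashed and re-executed if its read set overlaps the writes of an earlier iteration that has already committed.

```python
from typing import Callable, Dict, List, Set

Memory = Dict[str, int]


class SpecTx:
    """One speculative loop iteration: private write buffer plus a read set."""

    def __init__(self, memory: Memory) -> None:
        self.memory = memory              # state this iteration reads from
        self.read_set: Set[str] = set()   # addresses read from shared state
        self.write_buf: Memory = {}       # stores stay private until commit

    def load(self, addr: str) -> int:
        if addr in self.write_buf:        # re-reading its own earlier store
            return self.write_buf[addr]
        self.read_set.add(addr)
        return self.memory.get(addr, 0)

    def store(self, addr: str, value: int) -> None:
        self.write_buf[addr] = value


def run_speculative_loop(n: int, body: Callable[[SpecTx, int], None],
                         memory: Memory, window: int = 4) -> int:
    """Run iterations 0..n-1 speculatively, one window at a time, committing
    them in loop order. Returns how many iterations were squashed and redone."""
    squashed = 0
    for start in range(0, n, window):
        snapshot = dict(memory)
        # On real hardware these iterations would run concurrently on
        # different cores; here they run one after another on one snapshot.
        txs: List[SpecTx] = []
        for i in range(start, min(start + window, n)):
            tx = SpecTx(snapshot)
            body(tx, i)
            txs.append(tx)
        committed_writes: Set[str] = set()
        for i, tx in enumerate(txs, start=start):
            if tx.read_set & committed_writes:
                # Violation: an earlier iteration in this window wrote an
                # address this one read, so its result is stale. Squash it
                # and re-execute on committed state (all predecessors are done).
                squashed += 1
                tx = SpecTx(memory)
                body(tx, i)
            memory.update(tx.write_buf)       # in-order commit
            committed_writes.update(tx.write_buf)
    return squashed


def double_body(tx: SpecTx, i: int) -> None:
    """a[i] = 2 * a[i]: iterations touch disjoint addresses, so none conflict."""
    key = f"a{i}"
    tx.store(key, tx.load(key) * 2)


def sum_body(tx: SpecTx, i: int) -> None:
    """sum += a[i]: every iteration reads and writes "sum", a true dependence."""
    tx.store("sum", tx.load("sum") + tx.load(f"a{i}"))


if __name__ == "__main__":
    mem = {f"a{i}": i for i in range(8)}
    mem["sum"] = 0
    print(run_speculative_loop(8, double_body, mem))            # 0
    print(run_speculative_loop(8, sum_body, mem), mem["sum"])   # 6 56
```

The results come out as follows:

* **Independent loop:** it commits with no squashes.
* **Reduction loop:** every iteration except the first in each window reads a value that its predecessor wrote. The model therefore squashes six of the eight iterations, although the final sum still equals the sequential result.

This is the kind of mis-speculation cost that the abstract's profiling work aims to expose and reduce.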

【Abstract】 Multi-core architecture has become the mainstream of processor design, but making full use of the parallel computing resources of multi-core platforms requires multithreaded applications. The speculative parallel threading technique has been proposed to simplify parallel programming. Its distinctive feature is that it relaxes the sequential-semantics constraints between threads. This allows programmers or compilers to attempt aggressive optimizations even when the validity of a transformation cannot be guaranteed at static compile time. One difficulty in implementing speculative multithreading is buffering memory accesses. Existing speculative multithreading projects use overly complicated buffering mechanisms, which increase the complexity of hardware design and reduce the efficiency of multithreaded application development. The other problem is how to reduce the uncertainty that mis-speculation introduces into parallel performance gains. This dissertation therefore uses transactional memory and dynamic profiling to address these two problems. Its goal is an efficient hardware/software co-designed solution for speculative multithreading on multi-core platforms.

This dissertation focuses on implementing speculation on top of transactional memory. It covers the architecture model, the programming model, the threaded execution model, and dynamic optimization methods. The detailed work includes the following aspects. First, a speculative multithreading architecture based on transactional memory, named SPoTM (Speculative Parallelization on Transactional Memory), is proposed. SPoTM uses transactional memory to isolate the load/store operations of different threads. It supports out-of-order execution, in-order commit, violation detection, and recovery from speculation failure.
Second, a simple programming model targeting loop parallelization is designed, together with the accompanying speculative system library and ISA extensions. Parallelizing a sequential program with this model requires very few modifications, so the model significantly simplifies parallel programming. Third, we developed two simulation tools for verification and experiments:

* fastTM, which performs function-level simulation.
* sim-SPoTM, which supports cycle-accurate simulation.

To evaluate how various software and hardware factors affect speculative execution performance, we explore a number of design choices. We use applications from the SPEC CPU 2000 benchmark suite as test cases on the SPoTM simulation platform. We also analyze how cache locality changes under speculative multithreading and propose a few methods to improve it. Finally, we propose an online profile-guided dynamic optimization framework on the SPoTM platform. It is the core component of a continuous, gradual, profile-guided software parallelization system for speculative execution. Offline profiling cannot guide program optimization effectively and accurately without a representative training input, and in most cases no such input exists. We therefore adopt online profiling to extend the use of profiles in speculative optimization. The evaluation shows that this approach is comparable to the traditional offline implementation in two respects: identifying the loops suitable for speculative parallelization, and optimizing transaction partitioning.
We therefore believe that this approach can serve as a standalone guide for speculatively parallelizing applications when traditional offline profiling is unavailable for lack of general training inputs.

In implementing and evaluating the SPoTM architecture, we drew some conclusions about the speculative parallel threading technique itself. First, more effort should be devoted to software optimization than to complicated hardware design, because we found that even very aggressive hardware mechanisms achieved only limited performance gains. Second, it is impractical to improve the performance of most applications through speculation-based automatic parallelization. Speculative multithreading is better regarded as an aid to sophisticated manual parallelization. Finally, profiling plays a key role in gradual speculative multithreading optimization, whether it is done offline or online. In the future the online approach will become increasingly important because of the demand for dynamic optimization.
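Online, profile-guided selection of loops to parallelize can be pictured as a per-loop feedback policy. The sketch below is hypothetical and is not the dissertation's actual heuristic. It assumes the runtime reports, for each speculative iteration, whether that iteration was squashed. It applies a squash-ratio threshold and periodically re-profiles loops that it has switched to serial execution. The class and parameter names (`OnlineLoopSelector`, `max_violation_ratio`, `sample_size`, `backoff_calls`) and their default values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LoopProfile:
    """Online counters for one candidate loop."""
    samples: int = 0        # speculative iterations seen in this sample window
    violations: int = 0     # how many of those were squashed
    speculate: bool = True  # current decision for this loop
    backoff: int = 0        # countdown of loop entries until speculation is retried


class OnlineLoopSelector:
    """Per-loop run-time decision: keep speculating, or fall back to serial.

    Hypothetical policy: disable speculation when the squash ratio over a
    sample window exceeds max_violation_ratio, then re-enable it on the
    backoff_calls-th loop entry so a later program phase gets re-profiled.
    """

    def __init__(self, max_violation_ratio: float = 0.3,
                 sample_size: int = 100, backoff_calls: int = 50) -> None:
        self.max_violation_ratio = max_violation_ratio
        self.sample_size = sample_size
        self.backoff_calls = backoff_calls
        self.loops: Dict[str, LoopProfile] = {}

    def record(self, loop_id: str, violated: bool) -> None:
        """Report one speculative iteration of loop_id and whether it was squashed."""
        p = self.loops.setdefault(loop_id, LoopProfile())
        p.samples += 1
        p.violations += int(violated)
        if p.samples >= self.sample_size:
            if p.violations / p.samples > self.max_violation_ratio:
                p.speculate = False
                p.backoff = self.backoff_calls
            p.samples = p.violations = 0      # start a new sample window

    def should_speculate(self, loop_id: str) -> bool:
        """Query at loop entry: run this loop speculatively this time?"""
        p = self.loops.setdefault(loop_id, LoopProfile())
        if not p.speculate:
            p.backoff -= 1
            if p.backoff <= 0:
                p.speculate = True            # retry to catch a phase change
        return p.speculate


if __name__ == "__main__":
    sel = OnlineLoopSelector(sample_size=10, backoff_calls=5)
    for k in range(10):
        sel.record("hot_loop", violated=(k % 2 == 0))   # 50% squash ratio
    print([sel.should_speculate("hot_loop") for _ in range(5)])
    # [False, False, False, False, True]
```

A runtime using this policy would call `should_speculate` at each loop entry and `record` after each speculative iteration. In the demo, the loop squashes half of its iterations. It therefore runs serially for four entries and is speculated on again, and re-profiled, on the fifth.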

  • 【CLC Number】TP332
  • 【Times Cited】5
  • 【Downloads】272