
Research on Acceleration Strategies for Processor Micro-architecture Simulation

Research on the Acceleration Policy of Software Based Micro-architecture Simulation

【Author】 Yu Zhibin

【Advisor】 Jin Hai

【Author information】 Huazhong University of Science and Technology, Computer Application Technology, 2008, PhD

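【Abstract (translated from the Chinese)】 When designing a new processor, architects must search a very large design space for the best design. The quality of each candidate design is judged by evaluating it on a cycle-accurate micro-architecture simulator. However, the speed of existing simulators has long been a bottleneck that severely limits how many design alternatives architects can explore, and the bottleneck is even more severe in the design of multi-core and many-core processors. Existing simulation acceleration methods mainly simulate in detail, and warm up for, only part of a benchmark's dynamic instruction stream. This speeds up simulation but leaves two key problems: (1) how to select the dynamic instruction segments simply yet reasonably; (2) how to perform as little "warm-up" as possible. New acceleration strategies for processor micro-architecture simulation are therefore needed. The two-stage systematic sampling strategy and the Cantor strategy provide effective support for selecting dynamic instruction segments reasonably, and the functional warm-up acceleration strategy effectively reduces the warm-up length.

The program behavior of performance benchmarks is not random: it is periodic, and the period differs from program to program. Traditional sampling simulation methods either ignore this periodicity, simulating many redundant instruction executions in detail and so reducing simulation speed, or try to account for it strictly but lack an effective means of capturing it. To address this, two-stage systematic sampling simulation splits the selection of instructions for detailed simulation into two stages. The first stage divides the benchmark's dynamic instruction stream into equal, relatively long instruction segments and selects a given number of them at equal intervals as candidate segments for detailed simulation. The second stage divides each segment selected in the first stage into equal, shorter segments and again selects a given number of them at equal intervals as the final segments for detailed simulation. Compared with traditional sampling simulation methods, this method reduces redundancy and also provides an effective means of taking the periodicity of program behavior into account; by setting its parameters it can be turned into several other sampling strategies. The TSSS simulator is a prototype of the two-stage systematic sampling strategy and can also serve as a tool for program behavior analysis. Experiments with the TSSS simulator show that the two-stage systematic sampling strategy achieves a 15% speedup over SMARTS, currently the fastest strategy, with comparable accuracy.

The Cantor simulation acceleration strategy solves the instruction selection problem in an unconventional way. It applies fractal theory to micro-architecture simulation, using the construction process of the ternary (middle-thirds) Cantor set to select the instruction segments for detailed simulation. The periodicity of benchmark program behavior shows up as a kind of "clustering", a property the Cantor set shares, so the construction process of the Cantor set can approximate the behavior characteristics of a program. The strategy uses this clustering to build a CPI (Cycles Per Instruction) prediction model; with this model the user needs to set only one parameter, the number of divisions, before simulating, which makes the strategy simple and straightforward. The CantorSim simulator is the prototype of the Cantor simulation acceleration strategy. Experiments with CantorSim show that the Cantor strategy is 23% faster than SMARTS with an average relative CPI error of 3.2%, which is still fairly accurate.

The functional warm-up acceleration strategy addresses the warm-up problem. In techniques where only some instructions are simulated in detail, the functional simulation that precedes detailed simulation does not model micro-architectural state, so the results are inevitably distorted when detailed simulation begins. The functional warm-up strategy based on equal-interval (systematic) sampling divides the functional warm-up instruction segments of a sampling strategy into many short segments of equal length, selects some of them at equal intervals for functional warm-up, and executes the remaining segments by fast functional simulation, which raises overall simulation speed while preserving accuracy. The strategy comes with an empirical model for optimizing its parameters, which makes it easier to use. Experiments show that the strategy improves simulation speed by 27.8% without a significant loss of accuracy, and some metrics even become more accurate.

With the emergence of multi-core processor architectures, the speed of micro-architecture simulation faces even greater challenges. How to simulate the performance of multi-core processors comprehensively, correctly, and quickly has not yet been fundamentally solved. The multi-core simulation acceleration strategy proposes a simulation approach for multi-core processors, gives a theoretical model of the number of simulations required for complete multi-core simulation, and proposes acceleration methods.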
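The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the TSSS simulator's code: the function names, the parameter names, and the choice to start each stage at the first segment are assumptions made here for illustration, and the record does not state the segment lengths or counts the thesis actually uses.

```python
def systematic_sample(start, end, segment_len, count):
    """Split [start, end) into equal segments of `segment_len` instructions
    and pick `count` of them at equal intervals (hypothetical helper)."""
    total = (end - start) // segment_len      # number of whole segments
    if total == 0 or count <= 0:
        return []
    count = min(count, total)
    stride = total // count                   # distance between picked segments
    picked = []
    for i in range(count):
        lo = start + i * stride * segment_len
        picked.append((lo, lo + segment_len))
    return picked


def two_stage_systematic_sample(n_instructions, large_len, n_large,
                                small_len, n_small):
    """Stage 1 picks large candidate segments; stage 2 picks small segments
    inside each candidate. Returns (begin, end) instruction ranges that
    would be simulated in detail."""
    selected = []
    for lo, hi in systematic_sample(0, n_instructions, large_len, n_large):
        selected.extend(systematic_sample(lo, hi, small_len, n_small))
    return selected


if __name__ == "__main__":
    # Illustrative numbers only: 1M instructions, 5 large segments of 100k,
    # then 10 small segments of 1k inside each large segment.
    segments = two_stage_systematic_sample(1_000_000, 100_000, 5, 1_000, 10)
    detailed = sum(hi - lo for lo, hi in segments)
    print(len(segments), "segments,", detailed, "instructions in detail")
```

With these illustrative parameters only 5% of the instruction stream is marked for detailed simulation. Setting `n_large` to the total number of large segments reduces the scheme to ordinary single-stage systematic sampling within every large segment, which is one way to read the abstract's remark that the method can be turned into other sampling strategies through its parameters.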

【Abstract】 Micro-architects explore a vast design space to identify the best processor designs. To evaluate design alternatives, they rely on cycle-level micro-architecture simulators. However, existing simulators are extremely slow, which makes simulation speed a bottleneck in design exploration. Worse, simulating multi-core and many-core architectures is becoming more and more complicated, which further exacerbates the problem. Acceleration strategies for micro-architecture simulation are therefore urgently needed.

An effective way to speed up simulation is to simulate in detail only part of a benchmark's full dynamic instruction stream. This raises two key challenges: (1) how to select the instructions for detailed simulation, called the instruction selection problem; (2) how to warm up micro-architectural state before detailed simulation, called the warm-up problem. Both serve a common goal: to accelerate simulation as much as possible at a given simulation accuracy.

The Two-Stage Systematic Sampling simulation strategy is proposed to solve the instruction selection problem. Traditional sampling methods either treat every dynamic instruction segment alike, and so simulate redundant instructions in detail, or try to capture periodic behavior strictly but struggle to achieve the desired result. The Two-Stage Systematic Sampling approach selects the instructions for detailed simulation in two steps: (1) divide the full dynamic instruction stream of a benchmark into large segments of equal length and apply systematic sampling to select a number of them as candidates for detailed simulation; (2) divide every segment selected in step 1 into smaller segments and apply the same sampling strategy to select a number of them as the final instructions for detailed simulation. This approach reduces the redundancy caused by treating every instruction segment alike, and it can be turned into several other sampling simulation approaches, such as systematic sampling and stratified sampling simulation. The simulator that implements the Two-Stage Systematic Sampling (TSSS) approach can also be used as a program behavior analysis tool. Experiments show that this approach achieves a 15% speedup over the well-known SMARTS approach at the same level of precision.

The Cantor simulation strategy takes an unconventional approach to the instruction selection problem. It applies fractal theory to micro-architecture simulation, selecting the instructions for detailed simulation according to the construction procedure of the ternary Cantor set. Because the Cantor set has a clustering property, it can be used to model the same clustering property in program behavior. This thesis constructs a model, based on experimental observations, for determining the number of divisions. Once this single parameter is set, users can simulate benchmarks directly. The Cantor simulation approach achieves a 23% speedup over the SMARTS approach with a slight loss of precision. It can be used as a complement to other simulation methods.

The functional warm-up acceleration strategy aims to resolve the warm-up problem. Because micro-architectural state is not recorded before detailed simulation, the simulation results may not reflect the real behavior of the hardware. This thesis proposes systematic-sampling functional warming, which is based on the fact that the state of large hardware structures often has a long history. The thesis also provides an empirical model for determining the sampling parameters. Experimental results show that the proposed strategy speeds up simulation by 27.8% while reducing accuracy only marginally.

Multi-core and many-core architectures are becoming a new trend in microprocessors, yet simulating them is more challenging than simulating single-core processors. The strategies studied in this thesis can be applied to the simulation of multi-core architectures. The thesis also studies possible models of multi-core architectures and how to simulate them.
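The Cantor-set selection mentioned in the abstract can be pictured with a short Python sketch of the standard middle-thirds construction applied to an instruction range. This is an illustration under stated assumptions, not CantorSim's code: the function name is hypothetical, and the abstract does not say whether the retained or the removed thirds are the ones simulated in detail, so the sketch assumes the retained intervals are.

```python
def cantor_segments(start, end, divisions):
    """Apply `divisions` rounds of the middle-thirds Cantor construction to
    the instruction range [start, end) and return the retained intervals.
    Assumption: retained intervals are the detailed-simulation segments."""
    segments = [(start, end)]
    for _ in range(divisions):
        next_segments = []
        for lo, hi in segments:
            third = (hi - lo) // 3
            if third == 0:
                # Interval too short to split further; keep it unchanged.
                next_segments.append((lo, hi))
                continue
            next_segments.append((lo, lo + third))   # keep the left third
            next_segments.append((hi - third, hi))   # keep the right third
        segments = next_segments
    return segments


if __name__ == "__main__":
    for lo, hi in cantor_segments(0, 81, 2):
        print(lo, hi)
```

When the range length is divisible by 3^k, k divisions leave 2^k intervals covering a fraction (2/3)^k of the stream, so the number of divisions is the only parameter controlling how much is selected. This matches the abstract's description of a single-parameter strategy; the model the thesis uses to choose that number is not given in this record.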
