基于动态二进制翻译的龙芯虚拟机中数据预取优化研究

Data Prefetching Optimization Research in Dynamic Binary Translation Based Loongson Virtual Machine

【Author】 Luo Qiongcheng (罗琼程)

【Advisor】 Wu Qiang (吴强)

【Author Information】 Hunan University, Computer Science and Technology, 2009, Master's thesis

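【Abstract (translated from the Chinese)】 Binary translation is a software technique for code migration: it translates an executable binary built for one architecture, without access to its source code, into binary code that runs on another architecture. Dynamic binary translation translates while executing and performs dynamic optimization during translation. With advances in microprocessor and compiler technology, binary translation has become an active research area and has attracted wide attention in virtualization, distributed computing, and information security. Processor clock frequencies keep rising while memory frequencies improve slowly, so the performance gap keeps widening and memory access has long been a bottleneck for program performance. Data prefetching, an important memory access optimization, loads data that will be used soon into the cache ahead of time, which effectively hides memory access latency and improves program performance. This thesis studies data prefetching optimization in a dynamic binary translation system. First, drawing on the hardware features of the Loongson processor, software instrumentation collects the execution cycles and stride variation of the application's memory access instructions in order to identify the delinquent instructions that incur cache misses and to classify them. Next, a data prefetching optimization unit, the SuperBlock, is constructed from the program's hot code, and a basic SuperBlock data prefetching scheme is implemented on top of it. Finally, using the register definition-use relations obtained from data flow analysis of the SuperBlock, several prefetching strategies are proposed, including one based on the list of address computation components of memory access instructions. Experiments on the Loongson-3 virtual machine show that SuperBlock construction improves the average performance of the translated SPEC2000 integer benchmarks by 10% at an overhead of less than 1%. Data prefetching has no significant effect on the integer benchmarks, but it speeds up the translated floating-point benchmarks by 3.3%, while the overhead of SuperBlock construction and prefetch analysis is well below 0.5%.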

【Abstract】 As a software means of code migration, binary translation converts executable binaries from one instruction set architecture to another without access to their source code. Dynamic binary translation translates code just in time and can optimize it dynamically as it does so. With progress in processor and compiler technology, binary translation has become an active research direction and has received extensive attention in virtualization, distributed computing, and information security, among other areas. Microprocessor speed is currently improving faster than Dynamic Random Access Memory speed, so the growing processor-memory performance gap is now the primary obstacle to better computer system performance. Dynamic optimization is an important research subject in binary translation systems. As a memory optimization technique, data prefetching reads data that will be needed soon into high-speed caches ahead of time to hide memory access latency, which can improve application performance. In this thesis, data prefetching optimization is studied for the first time in a binary translation system. The approach draws on the Loongson processor's hardware features and uses software instrumentation to collect the program's memory access latency information. First, delinquent loads and their types are identified; SuperBlocks, the units of data prefetching optimization, are then constructed from frequently executed code blocks; and basic prefetching optimization is implemented on them. Second, data flow analysis is performed on each SuperBlock to generate the RDUG (register define and use graph), and several data prefetching schemes are proposed based on the address computation components of load instructions. Experiments on the dynamic binary translation based Loongson-3 virtual machine show that SuperBlock construction achieves a 10% average improvement for translated SPEC2000 integer programs with an overhead of less than 1%. Although data prefetching optimization has little effect on the SPEC2000 integer programs, it improves the performance of the SPEC2000 floating-point programs by 3.3% on average, with an analysis overhead far below 0.5%.
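The abstract describes identifying delinquent loads from instrumented latency and stride information, then classifying them. A minimal Python sketch of that idea follows. It is not the thesis's implementation: the thresholds, the category names, the `LoadProfile` type, and the `prefetch_address` helper are all hypothetical, chosen only to make the steps concrete.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds; the abstract gives no concrete values.
MISS_CYCLES = 50          # average latency at or above this suggests cache misses
STRIDE_CONFIDENCE = 0.75  # share of samples that must agree on one stride


@dataclass
class LoadProfile:
    """Samples that instrumentation might gather for one load instruction."""
    addresses: list = field(default_factory=list)
    cycles: list = field(default_factory=list)

    def record(self, address, cycles):
        self.addresses.append(address)
        self.cycles.append(cycles)

    def strides(self):
        # Differences between consecutive accessed addresses.
        return [b - a for a, b in zip(self.addresses, self.addresses[1:])]


def classify(profile):
    """Return 'not-delinquent', 'constant-stride', or 'irregular' (assumed labels)."""
    if not profile.cycles:
        return "not-delinquent"
    average = sum(profile.cycles) / len(profile.cycles)
    if average < MISS_CYCLES:
        return "not-delinquent"
    strides = profile.strides()
    if not strides:
        return "irregular"
    stride, count = Counter(strides).most_common(1)[0]
    if stride != 0 and count / len(strides) >= STRIDE_CONFIDENCE:
        return "constant-stride"
    return "irregular"


def prefetch_address(profile, distance=4):
    """Address `distance` strides ahead of the last access of a constant-stride load."""
    stride, _ = Counter(profile.strides()).most_common(1)[0]
    return profile.addresses[-1] + distance * stride
```

In a translator, a load classified as constant-stride would be a candidate for a prefetch instruction inserted into the SuperBlock, while irregular loads would need the address-component analysis the abstract mentions.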

  • 【Online Publication Contributor】 Hunan University
  • 【Online Publication Issue】 2010, Issue 01