
Adaptive Dynamic Optimization in the Java Virtual Machine

【Author】 邹琼 (Zou Qiong)

【Supervisor】 胡伟武 (Hu Weiwu)

【Author Information】 University of Science and Technology of China, Computer System Architecture, 2008, Ph.D. dissertation

【Abstract】 The Java language is widely used for software development in many domains because of its software-engineering advantages. Java programs run on the Java virtual machine, a dynamic environment that, compared with statically compiled binary code, offers portability, type safety, automatic memory and thread management, dynamic class loading, and other conveniences that greatly improve programmer productivity. These dynamic features, however, make many traditional static compilation techniques inapplicable, so researchers keep exploring new compilation techniques that can deliver better performance on the virtual machine. Lacking runtime information, static compilation relies on complex global analysis and still cannot achieve satisfactory results. Because the Java virtual machine performs compilation and optimization while the program runs, industry has long pursued adaptive optimization, which uses dynamic runtime information to decide which optimizations to apply. Focusing on the locality of Java programs and its impact on application performance, this dissertation systematically studies adaptive optimization in the Java virtual machine. The main contributions are as follows.

First, a low-overhead adaptive dynamic optimization framework is designed and implemented. The framework collects fine-grained information through instrumentation; the instrumentation is adjusted adaptively according to runtime feedback, and the number of instrumentation points is further reduced by exploiting the characteristics of Java programs. Unlike earlier static analyses, the framework works at runtime and is therefore insensitive to changes in the data set; unlike existing dynamic analyses, it is the first adaptive dynamic optimization framework implemented inside a Java virtual machine, complementing the virtual machine's existing dynamic compilation techniques. A series of Java-specific optimizations, covering the framework design and the instrumentation of object accesses, keeps the cost low: experiments show an overhead of at most 2.5% and 1.7% on average. The framework provides the foundation for the locality optimizations that follow.

Second, a fast sliding mark-compact garbage collection algorithm is proposed. Sliding compaction preserves allocation order, which is favorable for locality, but traditional designs traverse the heap repeatedly and are too expensive. The new algorithm records a mark bitmap and a live-block pool during the mark phase and computes per-block offset tables during the compact phase, turning heap traversal into offset-table lookups and greatly reducing its cost; the live-block pool also makes the algorithm easy to apply in parallel collectors. Experiments on the industry-standard benchmarks SpecJBB2005, SpecJVM98, and DaCapo (on a Pentium 4) show speedups of up to 8.9%, better locality than linear mark-compact, and, compared with depth-first traversal order, up to 11% fewer DTLB (Data Translation Lookaside Buffer) misses and up to 13.6% fewer L2 cache misses.

Third, a prefetch optimization built on the adaptive framework is proposed to improve program locality. The just-in-time compiler inserts instrumentation while compiling the program to profile the addresses of accessed objects, and stride patterns are detected periodically at runtime by a lightweight detector: a sliding window filters the profiled trace, and a stride frequency array covers strides between -64 and 64. When a stride pattern among related object accesses is detected, the prefetch controller injects the corresponding prefetch instructions. The key design issue is the trade-off between prefetch accuracy and runtime overhead: instrumentation preserves accuracy, while controlling prefetch injection, removing redundant instrumentation, and disabling useless instrumentation keep the cost low. Experiments show speedups of up to 18.1% (7.15% on average) on SpecJBB2005, SpecJVM98, and DaCapo, with a runtime overhead below 4% and negligible memory overhead.

Fourth, a garbage collection algorithm based on object affinity is described. A hardware performance analyzer locates the objects that frequently cause cache misses, an affinity graph is built from the relations among those objects, and the graph is combined with garbage collection so that objects with high affinity are placed adjacently in the heap. Because an access to one such object is very likely to be followed by an access to its neighbor, co-locating them improves locality. Experiments show that the affinity-based collector improves SpecJBB2005, SpecJVM98, and DaCapo by up to 4.9% (3.4% on average), while the hardware-based profiling costs only 0.47% on average. Finally, combining this collector with the adaptive prefetch optimization does not degrade the performance of most programs and even improves some of them.
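
To make the first contribution concrete, the following is a minimal Java sketch of feedback-driven instrumentation, not the dissertation's actual code: the class name, thresholds, and the overhead metric are illustrative assumptions. It shows the basic loop the abstract describes, in which a sampling period decides how often an instrumented object access is recorded and is widened or narrowed according to the measured profiling cost.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of feedback-driven instrumentation.
 * An instrumented object access calls onObjectAccess(); only every
 * N-th event is recorded, and N grows when the measured profiling
 * overhead (time spent recording vs. total elapsed time) exceeds a budget.
 */
public final class AdaptiveInstrumentation {
    private static final double OVERHEAD_BUDGET = 0.02;   // target: at most ~2% profiling cost
    private volatile int samplePeriod = 16;               // record 1 out of samplePeriod events
    private final AtomicLong eventCount = new AtomicLong();
    private final AtomicLong profilingNanos = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Called from instrumented code at each profiled object access. */
    public void onObjectAccess(Object obj, long address) {
        long n = eventCount.incrementAndGet();
        if (n % samplePeriod != 0) {
            return;                                        // cheap fast path: skip this event
        }
        long t0 = System.nanoTime();
        record(obj, address);                              // slow path: log the access
        profilingNanos.addAndGet(System.nanoTime() - t0);
        adjustSamplingRate();
    }

    private void record(Object obj, long address) {
        // In a real framework this would append (site, address) to a trace buffer.
    }

    /** Feedback loop: back off (or re-arm) sampling based on measured overhead. */
    private void adjustSamplingRate() {
        double elapsed = System.nanoTime() - startNanos;
        double overhead = profilingNanos.get() / elapsed;
        if (overhead > OVERHEAD_BUDGET && samplePeriod < (1 << 16)) {
            samplePeriod <<= 1;                            // too expensive: sample less often
        } else if (overhead < OVERHEAD_BUDGET / 4 && samplePeriod > 1) {
            samplePeriod >>= 1;                            // cheap enough: sample more often
        }
    }
}
```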
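
The second contribution replaces object-by-object heap traversal with bitmap arithmetic. The sketch below is a simplified illustration under stated assumptions (the heap is modelled as an array of fixed-size words with one mark bit per word; block size and class names are invented): each block's entry in the offset table records where the block's live data will start after sliding compaction, and the new address of a live word is that entry plus the number of live words that precede it inside its block.

```java
/**
 * Simplified sketch of the address arithmetic behind a sliding
 * mark-compact collector: a mark bitmap plus a per-block offset table
 * let the collector compute post-compaction addresses without
 * walking the heap object by object.
 */
public final class SlidingCompactSketch {
    private static final int BLOCK_WORDS = 256;     // words per block (illustrative)

    private final boolean[] markBits;               // one mark bit per heap word
    private final int[] offsetTable;                // new start (in words) of each block's live data

    public SlidingCompactSketch(boolean[] markBits) {
        this.markBits = markBits;
        int blocks = (markBits.length + BLOCK_WORDS - 1) / BLOCK_WORDS;
        this.offsetTable = new int[blocks];
    }

    /** Compact phase, step 1: one pass over the bitmap fills the offset table. */
    public void buildOffsetTable() {
        int compactedWords = 0;                     // live words seen so far = next free slot
        for (int b = 0; b < offsetTable.length; b++) {
            offsetTable[b] = compactedWords;
            int start = b * BLOCK_WORDS;
            int end = Math.min(start + BLOCK_WORDS, markBits.length);
            for (int w = start; w < end; w++) {
                if (markBits[w]) {
                    compactedWords++;
                }
            }
        }
    }

    /** New address of a live word = block's new start + live words before it in the block. */
    public int newAddress(int oldWordIndex) {
        int block = oldWordIndex / BLOCK_WORDS;
        int liveBefore = 0;
        for (int w = block * BLOCK_WORDS; w < oldWordIndex; w++) {
            if (markBits[w]) {
                liveBefore++;
            }
        }
        return offsetTable[block] + liveBefore;
    }
}
```

A production collector would pack the mark bits into machine words and count them with popcount, and would consult the live-block pool to skip blocks that contain no live data at all; this sketch only shows why compaction order and new addresses can be derived from the bitmap alone.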
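
For the third contribution, the stride detector can be pictured as follows. This is a hypothetical sketch of the mechanism named in the abstract, a sliding window over profiled load addresses plus a frequency array for strides between -64 and 64, not the dissertation's implementation; the window size, threshold, and the use of raw address deltas as the stride unit are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of lightweight stride detection for software prefetching:
 * profiled load addresses from one access site pass through a small
 * sliding window, consecutive differences are bucketed into a frequency
 * array covering strides in [-64, 64], and a dominant stride triggers
 * a prefetch decision.
 */
public final class StrideDetector {
    private static final int MAX_STRIDE = 64;              // strides outside [-64, 64] are ignored
    private static final int WINDOW = 32;                  // sliding window of recent strides
    private static final int HOT_THRESHOLD = 24;           // window hits needed to report a stride

    private final Deque<Long> window = new ArrayDeque<>();
    private final int[] strideFreq = new int[2 * MAX_STRIDE + 1];
    private long lastAddress;
    private boolean hasLast;

    /** Feed one profiled load address; returns the detected stride, or 0 if none yet. */
    public int observe(long address) {
        if (hasLast) {
            long stride = address - lastAddress;
            if (stride >= -MAX_STRIDE && stride <= MAX_STRIDE && stride != 0) {
                strideFreq[(int) stride + MAX_STRIDE]++;
                window.addLast(stride);
                if (window.size() > WINDOW) {
                    long evicted = window.removeFirst();   // keep counts consistent with the window
                    strideFreq[(int) evicted + MAX_STRIDE]--;
                }
            }
        }
        lastAddress = address;
        hasLast = true;
        return dominantStride();
    }

    /** A stride is dominant when it accounts for most of the window. */
    private int dominantStride() {
        for (int i = 0; i < strideFreq.length; i++) {
            if (strideFreq[i] >= HOT_THRESHOLD) {
                return i - MAX_STRIDE;
            }
        }
        return 0;
    }
}
```

In the framework described in the abstract, a non-zero result would lead the prefetch controller to inject a prefetch for the predicted next address and then disable the now-useless instrumentation at that site.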
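
The fourth contribution can be illustrated with a small affinity-graph sketch. It is only one plausible rendering of the idea: in the dissertation the miss samples come from a hardware performance analyzer rather than the placeholder recordMiss() call used here, and the greedy heaviest-edge ordering stands in for whatever co-location policy the collector actually uses. Consecutive cache-missing accesses add weight to an edge between two objects, and at collection time a walk over heavy edges yields a copy order that places high-affinity objects next to each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of affinity-guided object placement: edge weights record how
 * often two objects miss the cache back-to-back, and the collector asks
 * for an ordering that keeps heavy edges adjacent when objects are moved.
 * Object identity is reduced to an int id for simplicity.
 */
public final class AffinityGraph {
    private final Map<Long, Integer> edges = new HashMap<>();   // packed (a,b) pair -> weight
    private final Set<Integer> objects = new HashSet<>();
    private int lastMissObject = -1;

    /** Called for every sampled cache miss on object `id` (samples arrive in order). */
    public void recordMiss(int id) {
        objects.add(id);
        if (lastMissObject >= 0 && lastMissObject != id) {
            edges.merge(pack(lastMissObject, id), 1, Integer::sum);
        }
        lastMissObject = id;
    }

    /** Greedy copy order: repeatedly follow the heaviest edge out of the current object. */
    public List<Integer> copyOrder() {
        List<Integer> order = new ArrayList<>();
        Set<Integer> placed = new HashSet<>();
        for (int seed : objects) {
            int current = seed;
            while (current >= 0 && placed.add(current)) {
                order.add(current);              // the collector would copy `current` here
                current = heaviestNeighbor(current, placed);
            }
        }
        return order;
    }

    private int heaviestNeighbor(int id, Set<Integer> placed) {
        int best = -1, bestWeight = 0;
        for (int other : objects) {
            if (placed.contains(other)) continue;
            int w = edges.getOrDefault(pack(id, other), 0);
            if (w > bestWeight) {
                bestWeight = w;
                best = other;
            }
        }
        return best;
    }

    private static long pack(int a, int b) {     // undirected edge: order the endpoints
        int lo = Math.min(a, b), hi = Math.max(a, b);
        return (((long) lo) << 32) | (hi & 0xFFFFFFFFL);
    }
}
```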

  • 【CLC Number】 TP312.1
  • 【Cited By】 8
  • 【Downloads】 862