深穿透粒子输运蒙特卡罗模拟的CPU/GPU协同算法研究

Research on CPU/GPU Synergetic Algorithm for Monte Carlo Deep Penetration Particle Transport

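【Author】 Yang Bo (杨博)

【Advisor】 Hu Qingfeng (胡庆丰)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2011, Master's thesis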

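【Abstract, translated from the Chinese】 In recent years, as GPUs have improved greatly in both performance and programmability, general-purpose GPU computing has drawn growing attention for its high performance-to-cost ratio. Many researchers have applied GPUs in their own fields, and GPU applications have expanded from graphics alone to general-purpose computing, scientific computing in particular. Particle transport simulation has important applications in national economic construction and in large-scale scientific and engineering computing. For certain complex particle transport problems, the Monte Carlo (MC) method has clear advantages over deterministic methods, but it usually requires an enormous amount of computation. The emergence of heterogeneous CPU/GPU hybrid systems brings both opportunities and challenges for solving this problem. Building on existing MC particle transport simulation algorithms and targeting the characteristics of CPU/GPU hybrid heterogeneous architectures, this thesis proposes a CPU/GPU synergetic algorithm for MC simulation of deep penetration particle transport on large-scale heterogeneous hybrid systems, and integrates the algorithm with the MCNP code. The main contributions are as follows:

1) A GPU-based MCNP pseudo-random number generator is proposed. It uses a linear congruential generator (LCG) with the same parameters as the existing MCNP pseudo-random number generator. It first uses the skip-ahead (jump) method to quickly generate a seed for each thread, and then uses GPU multithreading to generate multiple random number subsequences in parallel. Relative to the MCNP pseudo-random number generator running on an Intel X5670, the GPU-based generator achieves an 11-fold speedup on an NVIDIA M2050.

2) A fine-grained data-parallel GPU algorithm for MC simulation of deep penetration particle transport is proposed. Building on the MC particle transport algorithm in MCNP and targeting the compute and memory-access characteristics of the GPU, it introduces a task partitioning method based on particle count, an efficient parallel data structure, and a reduction method. Several methods for eliminating branches and optimizing memory usage are given, which effectively improve the algorithm's performance on the GPU. Compared with MCNP running on the X5670, MCNP-GPU (MCNP integrated with this GPU algorithm) achieves a speedup of 3.4.

3) A CPU/GPU synergetic algorithm for MC simulation of deep penetration particle transport on CPU/GPU hybrid heterogeneous systems is given. The algorithm includes a heuristic method for partitioning tasks between the CPU and the GPU within a heterogeneous node. On this basis, a multi-level task partitioning method for large-scale heterogeneous systems is given, together with a matching multi-level pseudo-random number generator and a hierarchical reduction algorithm. Using the MPI environment and the CUDA programming model, the improved GPU-based MCNP pseudo-random number generator and the CPU/GPU synergetic algorithm are integrated with MCNP to form the MCNP-Hybrid code. MCNP-Hybrid was tested on 64 nodes of TianHe-1A, and the results show that the algorithm has good performance and scalability.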
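
The seeding scheme in item 1) can be illustrated with a minimal sketch. It is not the thesis code: it is written in Python rather than CUDA, the function names are hypothetical, and the constants are an assumption (the commonly cited default MCNP LCG with multiplier 5^19, increment 0, modulus 2^48 and stride 152917; the abstract does not list them). The idea is that k steps of an LCG compose into a single affine map, so each thread's starting seed can be computed in O(log k) operations instead of stepping through the sequence.

```python
# Assumed parameters: the commonly cited default MCNP LCG.
# The abstract only says "the same parameters as MCNP"; these values are
# an assumption made for illustration.
M48 = 1 << 48          # modulus 2^48
G = 5 ** 19            # multiplier
C = 0                  # increment
STRIDE = 152917        # random numbers reserved per particle history


def lcg_next(x, g=G, c=C, m=M48):
    """One ordinary LCG step: x -> (g*x + c) mod m."""
    return (g * x + c) % m


def lcg_skip(x, k, g=G, c=C, m=M48):
    """Advance the LCG by k steps in O(log k) time.

    k applications of x -> g*x + c form another affine map; it is built
    by repeated squaring, in the same way as fast modular exponentiation.
    """
    acc_g, acc_c = 1, 0              # identity map
    cur_g, cur_c = g % m, c % m      # map for 2^0 steps
    while k > 0:
        if k & 1:
            # compose: apply `cur` after `acc`
            acc_g, acc_c = (cur_g * acc_g) % m, (cur_g * acc_c + cur_c) % m
        # square: `cur` applied twice
        cur_g, cur_c = (cur_g * cur_g) % m, (cur_g * cur_c + cur_c) % m
        k >>= 1
    return (acc_g * x + acc_c) % m


def thread_seeds(seed, n_threads, stride=STRIDE):
    """Hypothetical helper: starting seed of each thread's subsequence.

    Thread i starts i*stride steps into the global sequence, so the
    subsequences do not overlap as long as each thread draws fewer than
    `stride` numbers.
    """
    return [lcg_skip(seed, i * stride) for i in range(n_threads)]
```

On a GPU, each thread would compute its own entry of `thread_seeds` from its thread index and then call the equivalent of `lcg_next` independently, which is what allows the subsequences to be generated in parallel.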

【Abstract】 Over the last decade, the performance and programmability of GPUs have improved greatly, and general-purpose GPU computing has attracted growing attention for its cost-effectiveness. Many researchers now apply GPUs in their own fields, and GPUs have evolved from specialty hardware into massively parallel general-purpose computing devices. Simulation of particle transport plays an important role in national economic construction and in large-scale scientific and engineering computing. Monte Carlo (MC) simulation of particle transport has a great advantage over deterministic methods for some complex types of particle transport problems; however, its computational cost is very high. CPU/GPU hybrid systems bring both opportunities and challenges for solving this problem. On the basis of existing MC particle transport algorithms, this thesis presents an algorithm for MC deep penetration particle transport on large-scale hybrid systems, which is designed to fit the characteristics of the hybrid system and is integrated with MCNP. The main work is as follows:

1) A GPU-based MCNP pseudo-random number generator is proposed. The generator uses the LCG method with the same parameters as the MCNP pseudo-random number generator. It first generates the random number seed of every thread quickly through the jump method, and then generates several random number subsequences in parallel on GPU threads. Compared with the MCNP pseudo-random number generator on a six-core Intel X5670 CPU, the proposed GPU-based generator achieves an 11-fold speedup on an NVIDIA M2050 GPU.

2) A GPU-based algorithm for MC simulation of deep penetration particle transport is proposed. On the basis of the MCNP particle transport algorithm, it introduces a task decomposition method based on particle number, an efficient parallel data structure, and a reduction method. The thesis also presents several methods to eliminate branches and optimize the use of GPU memory, which effectively improve the performance of the algorithm. Compared with MCNP running on the X5670, MCNP-GPU (MCNP integrated with the GPU-based algorithm) achieves up to a 3.4-fold speedup on the M2050.

3) A CPU/GPU synergetic algorithm for MC simulation of deep penetration particle transport on hybrid systems is proposed. The algorithm includes a heuristic task decomposition method for the CPU and GPU within a hybrid node. On this basis, a multi-level task decomposition designed to fit the characteristics of the hybrid system is presented, together with a multi-level pseudo-random number generator and a reduction method adapted to the multi-level task decomposition. Using MPI and CUDA, the multi-level pseudo-random number generator and the CPU/GPU synergetic algorithm are integrated with MCNP to form MCNP-Hybrid. Tests on a subsystem of TianHe-1A show that the synergetic algorithm has good performance and scalability.
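
The multi-level task decomposition and hierarchical reduction in item 3) can be sketched as follows. The abstract does not describe the heuristic itself, so this sketch substitutes a simple assumed rule: histories are divided evenly across nodes, then split inside each node in proportion to hypothetical measured CPU and GPU throughputs. All function and parameter names are illustrative, and the code is plain Python rather than MPI/CUDA.

```python
def split_counts(total, weights):
    """Split `total` histories into integer shares proportional to integer
    `weights`. Cumulative integer bounds guarantee the shares sum to `total`."""
    wsum = sum(weights)
    shares, cum, prev = [], 0, 0
    for w in weights:
        cum += w
        bound = total * cum // wsum
        shares.append(bound - prev)
        prev = bound
    return shares


def plan(total, n_nodes, cpu_rate, gpu_rate):
    """Two-level plan: even split across nodes, then a CPU/GPU split inside
    each node proportional to assumed throughputs (histories per second).

    `first` is the global index of the node's first history; a multi-level
    random number generator could derive the node's starting seed from it.
    """
    out, first = [], 0
    for n in split_counts(total, [1] * n_nodes):
        cpu_n, gpu_n = split_counts(n, [cpu_rate, gpu_rate])
        out.append({"first": first, "cpu": cpu_n, "gpu": gpu_n})
        first += n
    return out


def hierarchical_sum(tallies_by_node):
    """Reduce in two stages: device partial tallies are summed within each
    node first, then the node totals are summed globally."""
    node_totals = [sum(device_partials) for device_partials in tallies_by_node]
    return sum(node_totals)
```

For example, `plan(1000, 3, 1, 3)` assigns 333, 333 and 334 histories to the three nodes and gives the GPU roughly three quarters of each node's share. In an MPI/CUDA setting, the inner sum of `hierarchical_sum` would correspond to combining CPU and GPU tallies on one node, and the outer sum to a reduction across nodes.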

  • 【Classification No.】TP332; TP391.41
  • 【Times cited】1
  • 【Downloads】216