Parallel Implementation of TOUGHREACT Based on GPU

【Author】 Zhu Tong

【Supervisors】 Wei Xiaohui; Zhang Meng

【Author Information】 Jilin University, Network and Information Security, 2012, Master's thesis

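【Abstract (Chinese)】 High-performance parallel computing has developed rapidly in recent years. Using new multi-core, many-core and GPU platforms to efficiently simulate numerical models of physical and chemical states under complex geological conditions has become a research topic of growing interest to geoscientists. With the emergence and rapid development of general-purpose GPU computing, more and more researchers use GPUs to accelerate subsurface multiphase flow simulators in order to meet the demands of large-scale, high-accuracy applications. TOUGHREACT, developed by Lawrence Berkeley National Laboratory, is currently the most widely used program for simulating the coupled processes and mechanisms of subsurface multiphase fluid flow and geochemical reactive transport. However, TOUGHREACT runs inefficiently when simulating complex geological problems that require large scales and high accuracy, such as geological storage of carbon dioxide. Accelerating TOUGHREACT simulations with GPU parallel computing therefore has considerable engineering significance and research value. To this end, this thesis parallelizes TOUGHREACT on a CPU-GPU heterogeneous platform.

First, the basic simulation workflow of the software was studied through the relevant domain background, and its modular structure was analyzed in detail with reference to existing work. Comparing how the multiphase flow module and the geochemical reactive transport module differ during solution, and considering both the size of the linear systems and the concurrency of the iterative solution within each time step, the multiphase flow part was judged better suited to parallelization on the GPU.

Partial differential equations are widely used as numerical models of mass and energy conservation when solving practical problems in the natural and social sciences, and solving sparse linear systems is one of the main computational steps once those equations are discretized. For some field-scale problems in particular, solving the sparse linear systems can take more than 80% of the run time. This thesis therefore compared the execution times of TOUGHREACT's modules and focused the parallelization effort on the linear solver. Because the coefficient matrices arising in multiphase flow problems are nonsymmetric and not positive definite, several biconjugate gradient methods from the family of Krylov subspace methods are used. To avoid sacrificing solver efficiency, the preconditioner was not ported to the GPU; instead, the CUDA implementation targets the two most time-consuming parts of the solve: sparse matrix-vector multiplication (SpMV) and vector inner products.

Once the mapping of each kernel was fixed, developing the CUDA program was not difficult, but several optimizations significantly improve its performance. This work:
  • chose a suitable sparse matrix storage format to reduce memory use and host-device data transfer;
  • optimized memory access by using shared memory and page-locked (pinned) host memory, and by merging sequentially executed kernels to reduce global memory accesses;
  • optimized the instruction stream by avoiding unnecessary synchronization and by unrolling loops;
  • implemented multiple versions of each kernel and built a thread-scale decision tree that chooses a thread organization according to problem size, making full use of the GPU's processors and balancing the load.

Finally, the parallel preconditioned conjugate-gradient-type solver was integrated into TOUGHREACT. Real problems of different sizes were simulated on a CPU-GPU platform to benchmark the parallel BiCG and parallel BiCGSTAB implementations. Experiments show that the parallel linear solver achieves a speedup of up to 3.4 times over the serial CPU code, and the overall multiphase flow simulation a speedup of up to 2.8 times. These results support the parallelization strategy adopted here and provide a solid foundation, and valuable experience, for future GPU porting of the geochemical reactive transport module.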
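The abstract names SpMV and the vector inner product as the two kernels moved to CUDA, but it does not say which sparse storage format was chosen. The sketch below assumes the compressed sparse row (CSR) format and a "one GPU thread per row" mapping; it is plain Python in which each outer-loop pass stands for the work of one thread, and the dot product imitates a shared-memory tree reduction per thread block followed by a second stage that adds the block partial sums. It illustrates the technique only and is not the thesis's code; `csr_from_dense`, `csr_spmv` and `block_dot` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def csr_from_dense(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Build CSR arrays (values, col_idx, row_ptr), keeping only non-zero entries."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x. Each pass of the outer loop is the work of one CUDA thread
    in a one-thread-per-row kernel (row = blockIdx.x * blockDim.x + threadIdx.x)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def block_dot(x, y, block_size=256):
    """x . y computed the way a shared-memory CUDA kernel would: each block
    reduces its slice by a binary tree, then the per-block partial sums are added."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    n = len(x)
    partials = []
    for start in range(0, n, block_size):
        # one element per thread in the block's shared buffer; padding threads contribute 0
        shared = [x[i] * y[i] if i < n else 0.0 for i in range(start, start + block_size)]
        stride = block_size // 2
        while stride > 0:
            for t in range(stride):      # threads t < stride are active in this step
                shared[t] += shared[t + stride]
            stride //= 2                 # a __syncthreads() would separate steps on the GPU
        partials.append(shared[0])       # thread 0 writes the block result to global memory
    return sum(partials)                 # second stage: host (or a second kernel) adds partials
```

CSR stores only the non-zeros plus one offset per row, which is why a compact format reduces both device memory use and the volume of host-device transfers mentioned in the abstract.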
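The abstract lists merging sequentially executed kernels as a way to cut global memory accesses, without saying which kernels were merged. As an assumed example, a Krylov iteration often updates a vector and then immediately needs its norm (for instance the BiCGSTAB residual update r = s - ωt followed by ||r||). The hypothetical functions below contrast two separate passes with one fused pass; in real CUDA code the fused kernel's per-thread partial sums would still be combined by a block reduction, which is omitted here.

```python
def axpy_then_dot(alpha, x, y):
    """Two separate kernels: z = alpha*x + y, then rho = z . z.
    Global-memory traffic: read x and y, write z, then read z again (about 4n words)."""
    z = [alpha * xi + yi for xi, yi in zip(x, y)]   # kernel 1
    rho = sum(zi * zi for zi in z)                   # kernel 2 re-reads z
    return z, rho


def fused_axpy_dot(alpha, x, y):
    """One merged kernel: each thread computes z_i, stores it, and adds z_i**2 to
    its partial sum while z_i is still in a register (about 3n words of traffic)."""
    z = [0.0] * len(x)
    rho = 0.0
    for i in range(len(x)):
        zi = alpha * x[i] + y[i]
        z[i] = zi
        rho += zi * zi
    return z, rho
```

Both functions return the same result; the fused version simply avoids re-reading z from global memory and saves one kernel launch.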
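The thread-scale decision tree is described only as choosing a thread organization according to problem size. A minimal sketch of the idea, assuming a one-thread-per-element kernel and a GPU with 14 streaming multiprocessors (SMs): prefer large blocks, but shrink them when the problem is too small to give every SM at least two blocks. `LaunchConfig`, `choose_launch`, the thresholds and the SM count are all illustrative assumptions, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    threads_per_block: int
    blocks: int


def choose_launch(n: int, num_sms: int = 14) -> LaunchConfig:
    """Hypothetical thread-scale decision tree for a one-thread-per-element kernel.

    Each branch picks the largest block size that still yields at least
    two blocks per SM; the thresholds are illustrative only.
    """
    if n <= 0:
        raise ValueError("problem size must be positive")
    if n >= 2 * num_sms * 256:
        threads = 256
    elif n >= 2 * num_sms * 128:
        threads = 128
    elif n >= 2 * num_sms * 64:
        threads = 64
    else:
        threads = 32                          # never go below one warp
    blocks = (n + threads - 1) // threads     # ceil(n / threads)
    return LaunchConfig(threads, blocks)
```

For example, `choose_launch(100000)` selects 256 threads per block, while a 1000-row problem falls through to 32-thread blocks so that its work is spread over 32 blocks instead of 4.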

【Abstract】 With the rapid development of high-performance parallel computing, it is increasingly important for geoscientists to use new multi-core and GPU platforms to compute physical and chemical states under complex geological conditions by numerical simulation. General-purpose GPU computing combined with subsurface multiphase flow simulation may be an effective tool for a variety of hydrogeological and environmental-geology problems. TOUGHREACT, developed by Lawrence Berkeley National Laboratory, is currently the most widely used program for simulating coupled subsurface multiphase fluid flow and geochemical reactive transport. For large-scale, high-accuracy and highly complex problems (such as underground disposal of nuclear waste or underground storage of CO2), however, TOUGHREACT runs inefficiently. Using GPU parallel computing to accelerate TOUGHREACT simulations therefore has considerable engineering significance and research value. To this end, this thesis parallelizes the simulator on a CPU-GPU platform.

First, I studied the relevant domain background to understand the program's basic simulation process and, with reference to existing research, analyzed the software's modular structure in detail. Comparing the multiphase flow module with the geochemical reactive transport module, I found that the multiphase flow simulation is better suited to GPU parallelization, given the size of its linear systems and the concurrency of its iterative solution process. Partial differential equations are often used as mathematical models, and solving sparse linear systems plays a central role in solving them numerically.

Then, after comparing the execution times of the program's modules, I decided to parallelize the linear equation solver, which accounts for more than 80% of the simulation time. Because the coefficient matrices in multiphase flow problems are nonsymmetric and not positive definite, the thesis uses several biconjugate gradient methods from the Krylov subspace family, and analyzes the preconditioned conjugate gradient method used in the solver. To avoid sacrificing solver efficiency, the preconditioning step is not accelerated on the GPU. The CUDA implementation focuses on sparse matrix-vector multiplication (SpMV) and vector inner products, the two most time-consuming parts of the solution. Developing the CUDA program itself is not difficult; the main task is optimizing it. This work includes: selecting a sparse matrix storage format; reducing host-device data traffic; splitting the basic matrix-vector multiplication algorithm into two variants so that the computation is more efficient; restructuring kernels to reduce kernel-switching overhead; using shared memory and page-locked memory to optimize memory access; designing multiple thread organizations for each kernel; and building a thread-scale tree that selects among them according to problem size, making full use of the GPU's processors. These optimizations greatly improved the program's performance.

Finally, the parallel preconditioned conjugate gradient solver package was integrated into TOUGHREACT. To evaluate performance, the parallel BiCG and parallel BiCGSTAB algorithms were used to simulate practical problems of different sizes on the CPU-GPU platform. Experiments show that the linear solver is up to 3.4 times faster in double precision. Because the main part of the program was parallelized, the overall simulation is also accelerated, by up to 2.8 times. This result confirms the correctness of the parallelization strategy and lays a good foundation, and provides valuable experience, for parallelizing the geochemical reactive transport module on the GPU.
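The abstract reports benchmarks of parallel BiCG and BiCGSTAB, with SpMV and inner products on the GPU and preconditioning left on the CPU. Below is a sketch of textbook right-preconditioned BiCGSTAB that makes that split visible: `matvec` and `dot` are the two operations that would be GPU kernels, and `precond` is a plain callable standing in for the CPU-side preconditioner. The abstract does not say which preconditioner TOUGHREACT uses; the identity default and the Jacobi (diagonal) example in the usage note are assumptions, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product: one of the two operations offloaded to the GPU."""
    return sum(ai * bi for ai, bi in zip(a, b))


def bicgstab(matvec: Callable[[Vector], Vector], b: Vector,
             x0: Optional[Vector] = None,
             precond: Optional[Callable[[Vector], Vector]] = None,
             tol: float = 1e-10, max_iter: int = 200) -> Tuple[Vector, int]:
    """Right-preconditioned BiCGSTAB for a nonsymmetric system A x = b.

    matvec(v) returns A v (the SpMV kernel); precond(v) applies M^-1 and defaults
    to the identity. Returns (x, iterations); stops when ||r|| / ||b|| < tol.
    """
    n = len(b)
    apply_m = precond if precond is not None else (lambda v: list(v))
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    r_hat = list(r)                                  # fixed "shadow" residual
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) / b_norm < tol:
        return x, 0
    rho_prev = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        if rho == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: rho == 0")
        beta = (rho / rho_prev) * (alpha / omega)
        # on the first pass p and v are zero, so p = r
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        p_hat = apply_m(p)
        v = matvec(p_hat)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) / b_norm < tol:     # converged at the half step
            return [xi + alpha * phi for xi, phi in zip(x, p_hat)], it
        s_hat = apply_m(s)
        t = matvec(s_hat)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * phi + omega * shi for xi, phi, shi in zip(x, p_hat, s_hat)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        if omega == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: omega == 0")
        rho_prev = rho
    raise RuntimeError("BiCGSTAB did not converge within max_iter iterations")
```

Usage: pass any function that computes A v, for example a dense product `lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]`, and optionally a Jacobi preconditioner `lambda v: [vi / d for vi, d in zip(v, diag)]`. Each iteration performs two `matvec` calls and several `dot` calls, which is why those two operations dominate the solver's run time.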
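The abstract gives three numbers: the linear solver takes more than 80% of the run time, the solver is sped up by up to 3.4 times, and the whole flow simulation by up to 2.8 times. Amdahl's law relates them. The worked arithmetic below assumes that the non-solver part is not accelerated and that both maxima come from the same run, neither of which the abstract states; f denotes the fraction of flow-simulation time spent in the linear solver.

```latex
S_{\text{total}} = \frac{1}{(1-f) + f/S_{\text{solver}}}

% f = 0.8, S_solver = 3.4:
S_{\text{total}} = \frac{1}{0.2 + 0.8/3.4} = \frac{1}{0.2 + 0.2353} \approx 2.30

% Solving 1/S_total = (1 - f) + f/S_solver for f, with S_total = 2.8 and S_solver = 3.4:
f = \frac{1 - 1/S_{\text{total}}}{1 - 1/S_{\text{solver}}}
  = \frac{1 - 1/2.8}{1 - 1/3.4} = \frac{153}{168} \approx 0.91
```

Under these assumptions, an overall speedup of 2.8 times implies that the solver accounted for roughly 91% of the flow-simulation time in that case, which is consistent with the "more than 80%" figure.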

  • 【Online Publication Submitter】 Jilin University
  • 【Online Publication Issue】 2012, Issue 09