
Lattice Boltzmann Simulations of Flow Field and Its GPU-CUDA Massive Parallel Computing

【Author】 李承功

【Supervisor】 康海贵

【Author Information】 Dalian University of Technology; Port, Coastal and Offshore Engineering; 2013; Doctoral dissertation

【Abstract (Chinese)】 The lattice Boltzmann method (LBM) is an effective numerical method for simulating complex fluid flows and has been successfully applied to multiphase flows, flows in porous media, turbulence, and other problems. In contrast to conventional numerical methods that solve the macroscopic equations (the Euler and Navier-Stokes equations), the LBM starts from mesoscopic kinetic theory: the fluid is abstracted as a large number of mesoscopic particles that stream and collide on a simple lattice, and the macroscopic flow variables are obtained from the space-time evolution of the statistical function describing the particle distribution. The LBM therefore offers easy treatment of boundary conditions, short codes, simple programming, and natural suitability for parallel computing. With the growing attention of researchers and the continuing development and application of the method in recent years, the LBM has become an important research topic in computational fluid dynamics.

To solve incompressible flows at high Reynolds numbers, this thesis extends the Smagorinsky eddy-viscosity model to the two-dimensional nine-velocity (D2Q9) and three-dimensional nineteen-velocity (D3Q19) multiple-relaxation-time (MRT) lattice Boltzmann models. In view of the low computational efficiency of the LBM for three-dimensional flows at high Reynolds numbers, its natural suitability for parallelization, and the limitations of parallel computing on the central processing unit (CPU), the Compute Unified Device Architecture (CUDA) parallel programming model on the graphics processing unit (GPU) is adopted to accelerate the established models, which are then used for numerical studies of lid-driven cavity flows and wind-driven currents. The main work of this thesis is as follows.

First, the development history, basic theory and models, boundary conditions, unit conversion, and numerical implementation of the LBM are introduced. To simulate incompressible flows at high Reynolds numbers, the Smagorinsky eddy-viscosity model is extended to the D2Q9 and D3Q19 MRT lattice Boltzmann models, using the transformation matrix to map the second-order moments of the particle distribution functions from velocity space to moment space when computing the eddy viscosity, which yields the D2Q9 and D3Q19 MRT-SMAG models.

Second, in view of the low computational efficiency of the LBM for high-Reynolds-number three-dimensional flows, its suitability for parallelization, and the many limitations of CPU-based parallel computing, the extended MRT-SMAG models are accelerated with the CUDA parallel programming model on the GPU. This part introduces the GPU-CUDA parallel programming model and presents the implementation of the MRT-SMAG model with GPU-CUDA. Analysis of the performance of the GPU-based parallel program shows that its efficiency can be improved by allocating a reasonable number of threads per thread block, reducing if-statements inside kernel functions, and using the fast on-chip shared memory as much as possible. To verify the parallel code, a three-dimensional one-sided lid-driven cavity flow at Re = 10000 with a length:width:height ratio of 1:3:1 is simulated; in this case the GPU parallel program is up to 145 times faster than the serial program running on a single CPU, and because double precision is used on both the GPU and the CPU there is no difference in accuracy between the two.

Third, to further verify the GPU parallel model, assess its ability to resolve turbulent flows, and analyze the flow features in multi-sided lid-driven cavities, the established GPU-CUDA D2Q9 and D3Q19 MRT-SMAG models are applied, in view of open problems in cavity-flow research, to two- and three-dimensional one-sided lid-driven cavity flows and to multi-sided lid-driven cavity flows. For the two-dimensional cavity flow, the transition Reynolds number at which the flow in the cavity changes from laminar to turbulent is analyzed, and the effects of the lattice grid system, the Smagorinsky constant, and the initial development and time-averaging periods on the time-averaged quantities of high-Reynolds-number two-dimensional cavity flow (Reynolds numbers from 5×10⁴ to 10⁷) are discussed. For the three-dimensional one-sided lid-driven cavity flow, the initial laminar stage of the flow field is computed, the effect of side-wall friction on the turbulent flow in the cavity is analyzed, and the second-order statistics characterizing the turbulence intensity are discussed. For the three-dimensional four-sided lid-driven cavity flow, the effect of the transverse aspect ratio on the flow features is analyzed, multiple stable laminar solutions (flow bifurcation) are computed, and the effect of the aspect ratio on these multiple stable solutions is discussed; the computational efficiency of the GPU parallel program is also evaluated for each case.

Fourth, the GPU-based parallel model is used for a preliminary numerical study of three-dimensional wind-driven currents. The time-averaged horizontal velocity profiles at different positions on the central symmetry plane and the velocity distributions in the near-wall regions at the surface and the bottom are analyzed, the time-averaged streamlines and velocity vectors on the central symmetry plane are given, and the numerical results are compared with known experimental results for validation. The results show that the MRT-SMAG model can solve three-dimensional wind-driven currents and that the CUDA parallel programming model on the GPU greatly improves its computational efficiency, by a factor of about 178.
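As a point of reference for the eddy-viscosity extension described above, the following is a minimal sketch of the Smagorinsky closure as it is commonly written for lattice Boltzmann large-eddy simulation, given here in the single-relaxation-time (BGK) form and in lattice units (δx = δt = 1, c_s² = 1/3, filter width Δ = δx); the thesis instead evaluates the second-order moments in MRT moment space via the transformation matrix, but the idea is the same. Symbols follow common LBM usage: f_i and f_i^eq are the particle and equilibrium distributions, τ_0 the molecular relaxation time, ρ the density, C_s the Smagorinsky constant.

% Non-equilibrium second moment and its magnitude:
\[
\Pi^{\mathrm{neq}}_{\alpha\beta} = \sum_i e_{i\alpha} e_{i\beta}\bigl(f_i - f_i^{\mathrm{eq}}\bigr),
\qquad
Q = \sqrt{2\,\Pi^{\mathrm{neq}}_{\alpha\beta}\Pi^{\mathrm{neq}}_{\alpha\beta}} .
\]
% Smagorinsky eddy viscosity and the total relaxation time it implies:
\[
\nu_t = (C_s\Delta)^2\,\lvert\bar S\rvert,
\qquad
\lvert\bar S\rvert = \frac{Q}{2\rho c_s^2\,\tau_{\mathrm{tot}}},
\qquad
\nu_0 + \nu_t = c_s^2\bigl(\tau_{\mathrm{tot}} - \tfrac12\bigr),
\]
\[
\tau_{\mathrm{tot}} = \frac{1}{2}\left(\tau_0 + \sqrt{\tau_0^2 + \frac{18\,(C_s\Delta)^2\,Q}{\rho}}\right).
\]

The last relation simply solves the quadratic equation obtained by requiring the total viscosity ν_0 + ν_t to correspond to the local relaxation time τ_tot, so the collision step only needs the locally computed Q at each node.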

【Abstract】 The lattice Boltzmann method (LBM) has proved to be an efficient numerical method for simulating many complex fluid flows, such as multiphase flows, flows in porous media, and turbulent flows. In contrast to conventional numerical solutions of the macroscopic equations, i.e., the Euler and Navier-Stokes equations, the LBM abstracts the fluid as many mesoscopic particles which collide and stream on a simple lattice in terms of mesoscopic kinetic theory, and the macroscopic flow variables are computed from the time-space evolution of the statistical function that represents the particle distribution. The main advantages of the LBM therefore include easy implementation of boundary conditions, short codes, simple programming, and natural parallelism.

In this study, the Smagorinsky eddy-viscosity model is extended to the LBM for solving incompressible flows at high Reynolds numbers. Because of the low computational efficiency of the LBM for three-dimensional (3-D) problems at high Reynolds numbers, the natural parallelism of the LBM, and the limitations of CPU-based parallel computing, the Compute Unified Device Architecture (CUDA) parallel programming model on the graphics processing unit (GPU) is adopted to accelerate the parallel computation of the LBM, and cavity flows and wind-driven currents are then computed with the established parallel model. The main work of this study is as follows.

First, the development history, basic theory and models, boundary conditions, unit conversion, and numerical implementation of the LBM are introduced. For simulating incompressible flows at high Reynolds numbers, the Smagorinsky eddy-viscosity model is extended to the two-dimensional nine-velocity (D2Q9) and three-dimensional nineteen-velocity (D3Q19) multiple-relaxation-time (MRT) lattice Boltzmann models on the basis of previous work, yielding the D2Q9 and D3Q19 MRT-SMAG models.

Second, owing to the low computational efficiency of the LBM for 3-D problems at high Reynolds numbers, its natural parallelism, and the limitations of CPU-based parallel computing, the extended MRT-SMAG model is accelerated with the CUDA parallel programming model on the GPU. This part first introduces the GPU-CUDA massively parallel programming model and then the implementation details of the LBM with GPU-CUDA. Analysis of the performance of the GPU-based parallel program shows that its computational efficiency can be improved by a reasonable choice of the number of threads per thread block, by reducing if-statements in the kernel functions, and by using the fast on-chip shared memory as much as possible. To validate the code, a numerical experiment of a 3-D one-sided lid-driven cavity flow at Re = 10000 with a length:width:height ratio of 1:3:1 was performed; the speedup over the CPU-only code reaches 145 times. Since the same double precision is used in the GPU and CPU codes, there is no accuracy mismatch.

Third, in view of open problems in cavity-flow research, numerical simulations of high-Reynolds-number two-dimensional (2-D) and 3-D one-sided lid-driven cavity flows and of 3-D four-sided lid-driven cavity flows are carried out with the established GPU-CUDA D2Q9 and D3Q19 MRT-SMAG models, in order to further validate the parallel model, assess its capability for simulating turbulent flows, and analyze the flow features in multi-sided lid-driven cavities. For the 2-D cavity flow, the transition Reynolds number marking the change from laminar to turbulent flow in the cavity is analyzed, and the effects of the lattice grid system, the Smagorinsky constant, and the initial running and time-averaging periods on the mean macroscopic variables of high-Reynolds-number 2-D cavity flow are discussed. For the 3-D one-sided lid-driven cavity flow, it is shown that the MRT-SMAG model can resolve the initial stage of the 3-D flow field, the frictional effects of the side walls on the flow pattern in the cavity are analyzed, and the second-order statistics of the turbulence intensity are discussed. For the 3-D four-sided lid-driven cavity flow, the effects of the transverse aspect ratio on the flow features are discussed, the multiple steady solutions (flow bifurcation) are computed, and the effects of the transverse aspect ratio on the multiple steady solutions produced when the flow becomes unstable are reported. In addition, the computational efficiency of the GPU-based parallel program is investigated for these examples.

Fourth, a preliminary study of wind-driven currents is carried out with the GPU-based parallel model. The time-averaged horizontal velocity profiles at different locations on the symmetry plane are analyzed, the velocity distributions in inner-law coordinates relative to the sheared surface and the bottom are presented, and the time-averaged streamlines and velocity vectors on the symmetry plane are given. The numerical results are validated against laboratory experimental data. It is shown that the MRT-SMAG model can simulate 3-D wind-driven currents and that its computational efficiency can be greatly improved, by a factor of about 178, with the CUDA parallel programming model on the GPU.
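For concreteness, the sketch below shows how an LBM update of this kind is typically mapped onto CUDA: one thread per lattice node, a pull-style fused stream-and-collide kernel with no branching in the interior, and a thread-block size that is a multiple of the warp size, reflecting the tuning points mentioned above. It is a minimal illustration only, not the thesis code: the kernel uses a simple D2Q9 BGK collision with periodic boundaries instead of the MRT-SMAG model, the shared-memory optimisation is omitted, and all names (collideAndStream, NX, NY, f_in, f_out, omega) are assumptions.

// Illustrative D2Q9 pull-style stream-and-collide kernel (not the thesis code).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define NX 256
#define NY 256
#define Q  9

__constant__ int   cx[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
__constant__ int   cy[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
__constant__ float w [Q] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };

__global__ void collideAndStream(const float* __restrict__ f_in,
                                 float* __restrict__ f_out, float omega)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per lattice node
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= NX || y >= NY) return;                  // guard only; never taken for matched sizes

    float f[Q], rho = 0.f, ux = 0.f, uy = 0.f;

    // Pull step: gather post-collision populations from upstream neighbours.
    // Periodic wrap keeps the interior indexing branch-free.
    for (int i = 0; i < Q; ++i) {
        int xs = (x - cx[i] + NX) % NX;
        int ys = (y - cy[i] + NY) % NY;
        f[i] = f_in[(i * NY + ys) * NX + xs];
        rho += f[i];
        ux  += f[i] * cx[i];
        uy  += f[i] * cy[i];
    }
    ux /= rho;  uy /= rho;

    // BGK relaxation towards the local equilibrium (the thesis uses an MRT
    // collision with a Smagorinsky eddy viscosity instead).
    float usq = 1.5f * (ux * ux + uy * uy);
    for (int i = 0; i < Q; ++i) {
        float cu  = 3.f * (cx[i] * ux + cy[i] * uy);
        float feq = w[i] * rho * (1.f + cu + 0.5f * cu * cu - usq);
        f_out[(i * NY + y) * NX + x] = f[i] - omega * (f[i] - feq);
    }
}

int main()
{
    size_t bytes = (size_t)Q * NX * NY * sizeof(float);
    float *f_a, *f_b;
    cudaMalloc(&f_a, bytes);
    cudaMalloc(&f_b, bytes);

    // Initialise all populations to the rest equilibrium (rho = 1, u = 0).
    float* h = (float*)malloc(bytes);
    const float w_h[Q] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                           1.f/36, 1.f/36, 1.f/36, 1.f/36 };
    for (int i = 0; i < Q; ++i)
        for (int n = 0; n < NX * NY; ++n) h[i * NX * NY + n] = w_h[i];
    cudaMemcpy(f_a, h, bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);                               // multiple of the warp size (32)
    dim3 grid((NX + block.x - 1) / block.x, (NY + block.y - 1) / block.y);
    float omega = 1.0f;                              // relaxation rate (tau = 1)

    for (int step = 0; step < 100; ++step) {         // ping-pong the two buffers
        collideAndStream<<<grid, block>>>(f_a, f_b, omega);
        float* tmp = f_a; f_a = f_b; f_b = tmp;
    }
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(f_a); cudaFree(f_b); free(h);
    return 0;
}

In this layout each distribution component is stored as a contiguous NX×NY plane, so neighbouring threads in a warp read and write neighbouring addresses (coalesced access); block-size choices such as 32×8 or 64×4 are typical starting points that would then be tuned as described in the abstract.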
