基于多GPU的FDTD并行算法及其在电磁仿真中的应用

Multiple GPUs Based FDTD Parallel Algorithm and Its Applications in Electromagnetic Simulation

【Author】 Du Liuge (杜刘革)

【Supervisor】 Li Kang (李康)

【Author information】 Shandong University, Radio Physics, 2011, PhD

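【Abstract (from the Chinese original)】 Combining theory, experiment, and computation has become the basic mode of scientific research. In electromagnetic science and engineering, the finite-difference time-domain (FDTD) algorithm has become an important method for electromagnetic field analysis. FDTD is a time-domain method for solving Maxwell's equations: it discretizes the electromagnetic field directly on a Yee grid and approximates the partial derivatives in Maxwell's curl equations by central differences in space and time, which yields an alternating time-domain update of the electric and magnetic fields. The method is simple to implement and easy to understand, and it adapts readily to media of many shapes and materials. Because FDTD solves Maxwell's equations directly, all electromagnetic phenomena are implicit in the formulation, so it is suitable for radiation, transmission, and scattering problems. Since Yee proposed it in 1966, FDTD has developed continuously and has been widely applied to electromagnetic simulation in all frequency bands.

As a finite-difference method, FDTD is subject to numerical dispersion and numerical stability, so maintaining accuracy places fairly strict limits on the mesh. The spatial step should generally be smaller than 1/10 of the wavelength, and more complex structures need still more spatial samples to model the object faithfully. The time step must satisfy the Courant stability condition, which ties it to the spatial step. FDTD is therefore often very time-consuming for electrically large problems or finely detailed structures.

FDTD is naturally parallelizable, so parallel computing can effectively reduce computation time and speed up simulation and design. Parallel FDTD work has concentrated on network-based parallelism, such as supercomputers and personal computer clusters, but cost and network speed make this approach poor value for money. Parallel FDTD on programmable devices has also attracted some researchers, but the complexity of such devices and the pace of their development have kept it from wide use. In recent years, driven by demand from the games market, graphics processing units (GPUs) have developed faster than Moore's law, and their floating-point capability far exceeds that of contemporary CPUs, so the use of GPUs in general scientific computing has attracted growing attention. With the rapid development of general-purpose GPU (GPGPU) technology, GPUs are now widely used for general algorithms and scientific computing in many fields, and their application to electromagnetic computation, FDTD in particular, has drawn wide interest. The compute unified device architecture (CUDA) model made GPGPU program development faster and more efficient; it was welcomed by researchers and quickly adopted across disciplines.

The topic of this dissertation comes from a project of the National Key Basic Research Program of China, "Local coupling effects in metal/dielectric nano-heterostructures and their applications in photoelectric conversion devices". The work reported here is the part of that project concerned with a GPU-based parallel simulation system for light-emitting diodes (LEDs). It studies GPU-based parallel FDTD algorithms and implements hybrid parallel FDTD computation on a multi-GPU platform, which greatly increases the speed of FDTD electromagnetic simulation. The system has been applied to LED simulation and design in a study of LED emission enhancement.

The dissertation consists of the following parts. First, it introduces the foundations of the research, including electromagnetic computation and parallel computing, and states the significance and main content of the work. It then studies parallel computing techniques, analyzes the characteristics of the various parallel approaches, discusses in depth the development and application of GPUs and GPGPU technology, examines the hardware and software basis and the programming model of CUDA, and selects CUDA as the basis for the parallel FDTD algorithms.

Second, it reviews the basic FDTD algorithm and related topics such as numerical dispersion, boundary conditions, and excitation sources, and then discusses the state of parallel FDTD computing, which leads to the specific content studied here. The dissertation proposes an implementation of 2D and 3D parallel FDTD under the CUDA architecture. It implements the uniaxial (anisotropic) perfectly matched layer (UPML) absorbing boundary condition for 2D FDTD, and both UPML and the convolutional perfectly matched layer (CPML) for 3D FDTD. The implemented sources are a 2D line current source, a 3D dipole source, and a plane-wave source; adding the plane-wave source also involved a parallel 1D FDTD with a Mur absorbing boundary condition. For 2D problems, a 2D thread organization controls the field updates, and several memory-access optimizations are proposed, including two ways of accessing shared memory and the use of texture memory. For 3D problems, two thread-organization schemes are proposed, implemented, optimized, and compared for speed; both achieve speedups of more than 10 times over a conventional serial CPU algorithm. Because UPML and CPML have different characteristics, they are handled differently, by extending the PML and by separate computation, each with its own optimizations. With accuracy preserved, both reach high speed: generally more than 20 times faster than the serial algorithm, with a maximum speedup of 58 times.

Building on single-GPU parallelism, the dissertation extends the algorithm to multi-GPU platforms. Multi-GPU FDTD is implemented with FDTD domain decomposition, a suitable boundary-exchange scheme, and synchronous data transfer between GPU and CPU memory. To reduce the impact of data transfer, an asynchronous transfer scheme for multi-GPU FDTD is proposed and verified to improve multi-GPU parallel efficiency. The dissertation reports the first hybrid parallel FDTD computation that combines parallelism within each GPU, parallelism across GPUs, and task parallelism between data transfer and computation. In performance tests of the multi-GPU algorithm, FDTD with 10 CPML layers reached more than 4000 Mcells/s on a platform of eight GTX295 cards.

The GPU platform is used to study how the CPML parameters affect absorption in 3D FDTD and to simulate a microstrip antenna and a filter. A method for computing the radiated power of a dipole with FDTD is proposed and verified on the multi-GPU platform. The method is used to compute the radiated optical power of an LED model, and a top photonic crystal is used to increase that power.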
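
The alternating field update and the Courant limit described in the abstract can be illustrated in one dimension. The following is a minimal sketch in plain Python, not the dissertation's CUDA implementation: the function name `fdtd_1d`, the normalized field units, the Gaussian source, and the fixed (perfectly conducting) end walls are all assumptions made for the example.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """1D Yee leapfrog update in vacuum with normalized fields (illustrative).

    courant = c*dt/dx. In one dimension the scheme is stable for courant <= 1.
    """
    ez = [0.0] * n_cells          # E samples at integer grid points
    hy = [0.0] * (n_cells - 1)    # H samples at the half-points in between
    for n in range(n_steps):
        # H update: central difference of E in space
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update: central difference of H in space.
        # The two end points are never updated, so they act as fixed walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source (assumed shape): a Gaussian pulse in time added at one cell
        ez[src] += math.exp(-((n - 30) / 10.0) ** 2)
    return ez

if __name__ == "__main__":
    stable = fdtd_1d(courant=0.5)
    unstable = fdtd_1d(courant=1.1)
    print("peak |Ez|, courant=0.5:", max(abs(v) for v in stable))
    print("peak |Ez|, courant=1.1:", max(abs(v) for v in unstable))
```

With the default `courant=0.5` the pulse stays bounded, while a value above 1 makes the same loop grow without bound, which is the stability constraint on the time step. Each cell update depends only on its immediate neighbors from the previous half step, which is why the algorithm maps naturally onto one GPU thread per cell.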

【Abstract】 Combining theory, experiment, and computation has become the basic pattern of scientific research. In electromagnetic science and engineering, the finite-difference time-domain (FDTD) method has become an important tool for electromagnetic analysis. FDTD is a time-domain method for solving Maxwell's equations. The electromagnetic fields are discretized on Yee cells, and Maxwell's curl equations are converted to difference equations using central differences in both space and time, so that the electric and magnetic fields can be updated alternately in time. The method is simple to implement and to understand, and most dielectric and geometrically complex objects can be modeled easily. Because all propagation phenomena are implicit in its formulation, it can be used to solve radiation, transmission, and scattering problems. Since Yee proposed it in 1966, FDTD has developed continuously and has been widely applied to electromagnetic simulation across the spectrum.

As a finite-difference method, FDTD is constrained by numerical dispersion and stability, so the space and time steps must be small enough to guarantee accuracy. The space step should generally be less than 1/10 of the wavelength, and for more complex geometry the number of samples per wavelength should be increased to represent the object as closely as possible. The time step must satisfy the Courant stability condition, which ties it to the space step. Simulating electrically large or finely detailed structures with FDTD therefore takes a long time.

Because FDTD is an inherently data-parallel algorithm, parallel computing is an efficient way to reduce computation time and speed up simulation. Most parallel FDTD algorithms are based on networked computers, including supercomputer systems and personal computer clusters.
However, this approach is not cost-effective because of the expensive equipment of supercomputers and the limited network speed of clusters. Programmable devices are another option, but hardware programming languages are too complex for FDTD computing and such devices develop more slowly than personal computers, so this approach has not been widely used.

In recent years, driven by demand from the games market, graphics processing units (GPUs) have developed faster than Moore's Law, and their floating-point performance is much higher than that of contemporary CPUs. The use of GPUs in general scientific computation is attracting increasing attention, and with the development of general-purpose computation on GPUs (GPGPU), more and more algorithms in many scientific fields are being run on GPUs. The compute unified device architecture (CUDA) model has made GPU programming fast and efficient; it has become popular with researchers and has been rapidly applied in many fields.

This dissertation is part of a National Basic Research Program of China project, "Effect of localization coupling in metal/dielectric nano heterogeneity structure and its applications in photoelectric conversion devices". Its purpose is to investigate a parallel computation system for simulating light-emitting diodes (LEDs) using GPGPU technology. The dissertation studies parallel FDTD algorithms and implements hybrid parallel FDTD computing on multi-GPU platforms, which greatly improves the speed of FDTD simulation. The resulting parallel system is used for LED simulation, such as enhancing light emission power with a top photonic crystal.
This dissertation is divided into the following parts.

First, the background and related knowledge are presented, including electromagnetic computing and the basics of parallel computing, and the significance and content of the study are introduced. Parallel computing technology is then studied: the various computing methods are described and compared, the development and application of GPU and GPGPU technology are discussed, and the software and hardware environment of CUDA is investigated. The CUDA model is chosen as the basis for implementing parallel FDTD.

Second, the basic FDTD algorithm and related topics are introduced, such as numerical dispersion, boundary conditions, and sources. The state of parallel FDTD computing is discussed, which leads to the content of this dissertation.

Two-dimensional (2D) and three-dimensional (3D) parallel FDTD implementations based on the CUDA model are proposed. 2D FDTD with a uniaxial perfectly matched layer (UPML), and 3D FDTD with UPML and convolutional PML (CPML), are implemented on the GPU. A line current source in 2D and dipole and plane-wave sources in 3D are implemented. One-dimensional FDTD with the Mur absorbing boundary condition is implemented for the 3D plane-wave source.

For 2D problems, a 2D thread arrangement is used to control the field updates. Several memory-access optimizations are proposed to increase speed, such as two ways of accessing shared memory and the use of texture memory. For 3D problems, two thread-arrangement schemes are proposed and implemented; both are optimized and their speeds compared, showing speedups of more than 10 times in almost every case. PML parameter expansion and separate computation are used to handle UPML and CPML respectively, with corresponding optimizations for each PML.
PML-FDTD computation is generally accelerated by more than 20 times while computational accuracy is preserved.

The parallel FDTD algorithm is then extended to multiple GPUs (multi-GPU). Domain decomposition and an appropriate boundary data exchange are used in the multi-GPU system, with a synchronous memory copy scheme for exchanging data between GPU and CPU memory. To hide the memory transfer time, an asynchronous memory copy scheme is used, which proves efficient for multi-GPU parallel computing. Hybrid parallel computing that combines parallelism within a single GPU, parallelism across multiple GPUs, and task parallelism between computation and data exchange is implemented for the first time. The performance of these schemes is evaluated on a multi-GPU system containing 8 GTX295 graphics cards. A speed above 4000 Mcells/s is obtained for 3D FDTD with a 10-layer CPML.

The effect of the CPML parameters on absorption is tested on the GPU platform. A microstrip antenna and a filter are simulated with 3D parallel FDTD. A method for calculating the radiated power of a dipole with FDTD is proposed and verified on the multi-GPU system. A light-emitting diode (LED) model is computed and its radiated power is calculated with this method. A photonic crystal is used to enhance emission.
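
The domain decomposition and boundary exchange used for the multi-GPU extension can be illustrated with a small sketch. The code below splits a 1D grid into two subdomains that each hold one ghost value from the neighbor and exchange it every time step. Python lists stand in for the memory of two GPUs, and the names `run_single`, `run_split`, and `cut` are hypothetical. The abstract does not describe the dissertation's actual exchange layout or its asynchronous copy scheme, so neither is reproduced here.

```python
import math

def source(n):
    # Gaussian pulse in time (an illustrative choice of excitation)
    return math.exp(-((n - 30) / 10.0) ** 2)

def run_single(n_cells, n_steps, s, src):
    """Reference: one undivided 1D domain; s is the Courant number."""
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for n in range(n_steps):
        for i in range(n_cells - 1):
            hy[i] += s * (ez[i + 1] - ez[i])
        for i in range(1, n_cells - 1):
            ez[i] += s * (hy[i] - hy[i - 1])
        ez[src] += source(n)
    return ez

def run_split(n_cells, n_steps, s, src, cut):
    """Same update on two subdomains split at cell index `cut`.

    Left owns E[0..cut-1] and H[0..cut-1]; right owns E[cut..] and H[cut..].
    Requires 1 <= cut <= n_cells - 2.
    """
    ez_l = [0.0] * (cut + 1)           # last entry is a ghost copy of right's first E
    hy_l = [0.0] * cut
    ez_r = [0.0] * (n_cells - cut)
    hy_r = [0.0] * (n_cells - cut - 1)
    for n in range(n_steps):
        # Exchange 1: right sends its first E value to the left ghost cell
        ez_l[cut] = ez_r[0]
        for i in range(cut):
            hy_l[i] += s * (ez_l[i + 1] - ez_l[i])
        for i in range(len(hy_r)):
            hy_r[i] += s * (ez_r[i + 1] - ez_r[i])
        # Exchange 2: left sends its last H value to the right
        hy_ghost = hy_l[cut - 1]
        for i in range(1, cut):
            ez_l[i] += s * (hy_l[i] - hy_l[i - 1])
        ez_r[0] += s * (hy_r[0] - hy_ghost)
        for i in range(1, len(ez_r) - 1):
            ez_r[i] += s * (hy_r[i] - hy_r[i - 1])
        # Inject the source in whichever subdomain owns the source cell
        if src < cut:
            ez_l[src] += source(n)
        else:
            ez_r[src - cut] += source(n)
    return ez_l[:cut] + ez_r           # drop the ghost cell when gathering

if __name__ == "__main__":
    same = run_split(120, 100, 0.5, 40, 60) == run_single(120, 100, 0.5, 40)
    print("split result matches single-domain result:", same)
```

The split run reproduces the single-domain field because the two ghost values are the only data each half needs from the other at every step. On real hardware those two copies correspond to the transfers that an asynchronous scheme would aim to overlap with the interior updates.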

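  • 【Online publication contributor】 Shandong University
  • 【Online publication year and issue】 2011, Issue 11
  • 【Classification number】 TN011;TP391.41
  • 【Citation count】 7
  • 【Download count】 948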