基于GPU计算平台的电磁散射计算并行加速技术
Parallel Acceleration Technique for Electromagnetic Scattering Problems Based on GPU Computing Platforms
【Author】 Gao Pengcheng (高鹏程);
【Advisor】 Lin Hai (林海);
【Author information】 Zhejiang University, Computer Science and Technology, 2013, PhD
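【Abstract (translated from the Chinese)】 Computing the electromagnetic scattering of targets, in particular predicting the radar cross section (RCS) of electrically large targets and inverse synthetic aperture radar (ISAR) imaging, is of great importance to national defense and has long been an active research topic in computational electromagnetics. In practice, however, analyzing the high-frequency scattering characteristics of real targets such as aircraft and ships often runs into an enormous computational load and insufficient hardware computing power. To compute target scattering characteristics quickly, this thesis borrows fast ray tracing and related techniques from computer graphics, exploits the parallel numerical computing power of the graphics processing unit (GPU), and accelerates frequency-domain electromagnetic methods on three computing platforms: the GPU, the heterogeneous CPU-GPU architecture, and the GPU cluster.

The thesis proposes a multiresolution shooting and bouncing ray (SBR) method based on the Compute Unified Device Architecture (CUDA), which combines two classes of SBR acceleration algorithms. First, multiresolution ray tubes effectively reduce the total number of ray tubes involved in the computation. Second, a stackless kd-tree traversal algorithm augmented with ropes (links between neighboring nodes) greatly reduces unnecessary traversal of interior nodes and speeds up the intersection of each ray with the target. Also on the GPU, the thesis accelerates the method of moments (MoM) with CUDA. During impedance matrix filling, singular and non-singular elements are computed by separate kernels, which avoids the efficiency loss caused by CUDA serializing divergent branches. In addition, a biconjugate gradient stabilized (BiCGSTAB) solver built on CUBLAS, the basic linear algebra library supplied with CUDA, improves the efficiency of solving the matrix equation.

The thesis maps the SBR method and the truncated-wedge incremental length diffraction coefficients (TW-ILDC) onto the heterogeneous CPU-GPU architecture so that all available computing resources are used efficiently. The GPU's strong single-precision floating-point performance accelerates the SBR method, while the TW-ILDC, which is comparatively sensitive to numerical precision, is implemented on the CPU in double precision. Because the workload and computation time at adjacent angles are almost the same, a dynamic load-balancing algorithm uses the computation time at the previous angle to adjust the load distribution at the current angle, keeping the CPU and GPU balanced. This approach improves the accuracy and efficiency of high-frequency methods in applications such as target imaging.

Finally, the thesis proposes a parallel SBR method for GPU clusters. It adopts a parallel strategy based on partitioning the virtual aperture, which overcomes the limitation that an angle-based load distribution scheme is constrained by the number of GPUs. To balance the load among GPU nodes, the method does not assume that all compute nodes have the same computing power. Instead, it dynamically adjusts the partitioning of the virtual aperture at the current angle according to each node's computation time at the previous angle, so it also suits heterogeneous GPU clusters equipped with different GPUs.

By combining fast ray tracing techniques from computer graphics with three computing platforms (the GPU, the heterogeneous CPU-GPU architecture, and the GPU cluster), the thesis accelerates several frequency-domain methods and effectively improves the accuracy and efficiency of electromagnetic scattering analysis for electrically large targets.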
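The abstract mentions a BiCGSTAB solver built on CUBLAS for the MoM matrix equation. As a rough illustration of what such a solver iterates over, the following minimal sketch implements unpreconditioned BiCGSTAB in pure Python. It is not the thesis implementation: plain Python lists stand in for GPU vectors, and the helper names (`dot`, `matvec`, `axpy`, `norm`, `bicgstab`) are hypothetical. Each helper corresponds to the kind of operation a BLAS library provides (dot product, matrix-vector product, scaled vector addition).

```python
def dot(a, b):
    # Conjugated inner product <a, b>, valid for real or complex entries.
    return sum(x.conjugate() * y for x, y in zip(a, b))

def matvec(A, x):
    # Dense matrix-vector product; A is a list of rows.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def axpy(alpha, x, y):
    # Returns alpha * x + y.
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def norm(a):
    return abs(dot(a, a)) ** 0.5

def bicgstab(A, b, tol=1e-10, max_iter=200):
    # Unpreconditioned BiCGSTAB with a zero initial guess (an assumption
    # of this sketch), so the initial residual equals b.
    n = len(b)
    x = [0j] * n
    r = list(b)
    r_hat = list(r)                      # fixed shadow residual
    rho = alpha = omega = 1 + 0j
    v = [0j] * n
    p = [0j] * n
    b_norm = norm(b) or 1.0
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        if rho_new == 0:                 # breakdown: give up with current x
            break
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta * (p - omega * v)
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                  # s = r - alpha * v
        if norm(s) <= tol * b_norm:             # converged after the half step
            x = axpy(alpha, p, x)
            break
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(omega, s, axpy(alpha, p, x))   # x += alpha * p + omega * s
        r = axpy(-omega, t, s)                  # r = s - omega * t
        if norm(r) <= tol * b_norm:
            break
        rho = rho_new
    return x

# Small usage example: solve [[4, 1], [2, 3]] x = [1, 2].
x = bicgstab([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

Each iteration needs two matrix-vector products plus a handful of dot products and vector updates, which is why delegating these primitives to a GPU BLAS library is a natural fit for a dense impedance matrix.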
【Abstract】 The calculation of electromagnetic scattering from targets, especially radar cross section (RCS) prediction and inverse synthetic aperture radar (ISAR) imaging, is of great importance to national defense and is an active research topic in computational electromagnetics. However, analyzing the high-frequency electromagnetic scattering characteristics of realistic targets (e.g., airplanes and ships) is very time-consuming because of the extensive computation required and the limited processing power available.

To solve electromagnetic scattering problems quickly, this thesis adopts real-time ray tracing algorithms from computer graphics and exploits the powerful parallel computing capability of the GPU to accelerate frequency-domain methods on three platforms: the GPU, the heterogeneous CPU-GPU architecture, and the GPU cluster.

The proposed CUDA-based multiresolution shooting and bouncing ray (MSBR) method, with a kd-tree acceleration structure, is implemented entirely on the GPU to accelerate the SBR method. The multiresolution grid algorithm greatly reduces the total number of ray tubes by adaptively adjusting the ray tube density according to the structural complexity of each region, while the kd-tree acceleration structure substantially decreases the number of ray-patch intersection tests. We also present a CUDA-based method of moments (MoM), which calculates the singular and non-singular elements of the impedance matrix separately to avoid the performance degradation caused by branch divergence. Additionally, the CUBLAS library provided with CUDA is used to implement the biconjugate gradient stabilized (BiCGSTAB) method, which solves the matrix equation efficiently.

The SBR method and the truncated wedge incremental length diffraction coefficients (TW-ILDC) are combined and implemented on the heterogeneous CPU-GPU architecture to make full use of all available resources. The SBR computation runs on the GPU because its numerous independent ray tubes can fully occupy the GPU's massively parallel resources, while the TW-ILDC is implemented on the CPU because it requires complex, high-precision numerical calculation to produce accurate results. Because the workload and computation time at neighboring aspect angles are similar, a dynamic load adjustment method is presented to balance the load between the CPU and the GPU. The proposed method provides higher accuracy and efficiency for ISAR imaging of electrically large, complex targets.

Finally, an efficient parallel SBR method for the GPU cluster is introduced. It applies a virtual aperture partitioning scheme to overcome the drawback of the angle distribution scheme. The method does not assume that all GPUs have the same performance. Instead, it uses the computation time at the previous angle to dynamically adjust the partitioning at the current angle. This strategy not only achieves excellent load balance but also allows the method to work well on heterogeneous GPU clusters.

This thesis combines real-time ray tracing algorithms from computer graphics with three parallel computing platforms (the GPU, the heterogeneous CPU-GPU architecture, and the GPU cluster) to improve several frequency-domain approaches. The numerical results show that these methods improve the accuracy, efficiency, and scale of scattering analysis for electrically large targets.
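The dynamic partitioning described for the GPU cluster can be sketched as follows. The abstract does not give the update rule, so this is only one plausible reading under stated assumptions: the virtual aperture is divided into contiguous bands of ray tube rows, each node's throughput is estimated as its previous share divided by its previous time, and the new shares are proportional to those throughputs. The function names `rebalance` and `split_rows` are hypothetical.

```python
def rebalance(shares, times):
    # shares: fraction of the aperture each node handled at the previous angle.
    # times:  measured computation time of each node at that angle (must be > 0).
    # Assumption: work is proportional to aperture share, so share / time
    # estimates a node's throughput.
    speeds = [w / t for w, t in zip(shares, times)]
    total = sum(speeds)
    return [s / total for s in speeds]

def split_rows(n_rows, shares):
    # Turn fractional shares into contiguous [start, end) row ranges that
    # cover all n_rows, using largest-remainder rounding.
    raw = [n_rows * s for s in shares]
    counts = [int(r) for r in raw]
    leftover = n_rows - sum(counts)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    bounds, start = [], 0
    for c in counts:
        bounds.append((start, start + c))
        start += c
    return bounds

# Two nodes started with equal shares; node 0 took twice as long as node 1,
# so at the next angle it receives about one third of the aperture rows.
new_shares = rebalance([0.5, 0.5], [2.0, 1.0])
row_ranges = split_rows(300, new_shares)
```

Because the rule uses only measured times, it makes no assumption about the nodes being identical, which matches the abstract's point that the scheme also suits clusters with different GPUs.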