节点文献

复杂目标电磁散射特性精确计算方法的并行化和实现

The Parallellization and Implementation of Mlfma for the Accurate Solution of Electromagnetic Scattering by Large Complex Objects

【作者】 王辛刚

【导师】 童维勤;

【作者基本信息】 上海大学 , 高可信计算与信息处理, 2011, 博士

【摘要】 复杂目标电磁散射特性的快速分析与计算在雷达系统设计与雷达目标识别、目标隐身和反隐身、现代电子系统电磁兼容性分析等领域中有广泛应用。利用计算机进行精确电磁仿真具有相对高效、准确的特点。特别是多层快速多极子算法(MLFMA)在电磁领域的成功开发,极大地提高了电磁仿真能力。但是受到单台计算机资源的限制,串行MLFMA算法对电大尺寸散射体分析常常无能为力。随着计算机集群技术的发展和并行计算技术的成熟,采用并行MLFMA算法计算复杂目标电磁散射特性,可以有效扩展仿真规模,提高仿真效率。对于具有复杂边界的电磁问题,实现高性能并行MLFMA算法的关键是如何在处理器间平均分配计算资源(即各处理器负载均衡)、减少处理器间的数据交换,减少算法实现环节中所占用的内存和计算量。这些问题的解决对最终计算效率和计算能力有直接影响。为解决上述问题,本文将MLFMA算法的计算域分解为数据域和角谱域。针对数据域和角谱域的不同并行性能,提出一种更为高效的并行方案,给出具体实现手段并分析其数值性能。论文的主要创新点如下:提出了分布树的负载均衡算法。并行MLFMA的负载均衡性主要体现在数据域上对分布树的划分上。传统的划分方法通常是在分布树的最底层将组均分到每个计算节点,每个计算节点采用自底向上的方式构建分布树。这对于外形完全对称的目标来说,具有非常好的负载均衡性。然而,实际的目标都具有复杂的外形。为此,我们修正了分布树的分割方法,通过均分某一层(非最细层)的方式来构建分布树,从而达到负载均衡的目的。提出了对(反)插值过程进行块划分的角谱域并行算法。在角谱域上并行MLFMA算法,具有非常好的负载均衡性。但是由于(反)插值操作的存在,节点间的通信是无法避免的。角谱域分解的主要问题是如何减少节点间通信量,增加共享层的层数。对(反)插值矩阵采用条划分的方法,尽管能够将通信限制在左右相邻的节点间进行,但同时也限制了程序可并行节点数的规模。为解决这个问题,我们提出了基于块划分的并行计算方法。这种方法能够将通信限制在上下左右四个相邻节点间进行,扩大了可并行节点数的规模。与条划分相比,大大减少了每次的通信量。进行了不变项的并行优化计算。本文针对分布树、(反)插值矩阵、转移矩阵和近邻阻抗矩阵等数据结构在计算时间和计算空间上的特点,从缩减内存和计算量的角度出发,提出了相应的计算和存储方法。对(反)插值矩阵提出了先串行计算再并行分布的方法。针对转移矩阵,计算过程采用了压缩存储的并行计算方式,分布过程中采用非压缩的存储方式。相对于传统的“边对”划分方法,提出了按组对划分近邻阻抗矩阵,利用数据局部性原理实现近邻阻抗矩阵的填充及近场的叠加。提出了转移和(反)插值操作的通信拓扑结构。利用转移通信列表确定远亲组之间的通信关系,通过建立转移通信的发送队列和接收队列,一次性交换数据,提高转移通信的通信效率。根据(反)插值矩阵的特点,利用虚拟进程拓扑,建立了(反)插值操作的通信拓扑结构。通过调整行和列的通信次序,实现了与四周相邻节点间的通信。针对各种复杂目标,本文围绕并行算法的精度和效率进行了大量实验。实验结果验证了所提出并行MLFMA算法的可行性,能够有效提高并行效率。该算法能够实现未知量在千万量级的复杂目标电磁散射,并在实际工程中获得应用。

【Abstract】 The fast analysis and calculation of electrically large complex electromagnetic scattering problems is widely used in the radar system design and radar target identification, the radar stealth and anti-stealth technology, and the modern electric systems’Electro Magnetic Compatibility (EMC) analysis. It is efficient and accurate for the electromagnetic simulation on the computer. Particularly, the successful development of the Multilevel Fast Multipole Algorithm (MLFMA) on electromagnetic fields immensely increases the ability of the electromagnetic simulation. Meanwhile, accurate solutions of the large scale electromagnetic scattering problems require discretisations with millions of unknowns, which cannot be solved easily by the sequential implementations of MLFMA running on a single processor. With the development of the computer cluster technology and the parallel computation technology, the parallelization of MLFMA for solving the electromagnetic scattering is an effective way to enlarge the simulation scale and improve the simulation efficiency.For electromagnetic scattering problems with the complicated boundary, the key of parallelizing MLFMA is distributing the computer resources equally, reducing data communication among processors, optimizing the space complexity and the computing complexity in every link of the software design, which will influence the parallel efficiency and the computing ability of the algorithm. In order to solve the above problems, the computing domain in MLFMA is divided into the data domain and the angular spectrum domain. Toward the different characteristics of two domains, a high-performance parallel MLFMA is proposed and implemented and the numerical performance of which is analyzed. The main innovation points of the thesis are as follows:The load balancing algorithm of the distributed tree is proposed. In the data domain, the load balancing of the parallel MLFMA is mainly embodied by the division of the distributed tree. For the traditional method, the boxes are usually partitioned equally at the finest level of the distributed tree among processors, and then the distributed tree is constructed with bottom-up. To the symmetrical object, this tree has a good load balancing, whereas to the complex objects in the real life which have an unsymmetrical shape, this method made the load balancing very badly. Therefore, a new division algorithm of the distributed tree is applied to improve the load balancing, which partitions some layers (non-final layer) equally to construct the distributed tree.The angular spectrum domain decomposition algorithm with the block method of the interpolation and anterpolation operation is proposed to reduce the data communication among processors. Generally, the load balancing at the angular spectrum domain is better than that at the data domain, but the communication of the interpolation and anterpolation operation between shared levels cannot be avoided. The key of the angular spectrum domain decomposition is to reduce the communications and increase the number of the shared levels. Although the traditional strip scheme can make the communications between the right and left processors, it can also restrict the workable processors. So the block method of the interpolation and anterpolation operation is adopted to reduce communication burden and enlarge the parallel scales, which makes the communications available among the adjacent processors.In the preprocess phase of the parallel-MLFMA, the correlative data structure which includes the interpolation (or anterpolation) matrix, the translation matrix, the near-neighbor impedance matrix and so on are considered for their computing complexities and storage requirements. The interpolation (or anterpolation) matrix is first computed in the sequential algorithm, and then distributed in the parallel algorithm. The translation matrix is stored using some compressed scheme during the computation process and uncompressed during the distribution process. The distribution of the near-neighbor impedance matrix is different from that of the far field, which is computed with the local priciple of data.The topology structures of communications for the translation operation and the interpolation and anterpolation operation are also established in order to improve the communication efficiency during its iterative process. During the communications of the translation operation, the outgoing queue and the recipient queue are set up to improve the communication efficiency, which can exchange data for one time among processors. According to the theory of the interpolation and anterpolation operation, this topology structure of communications is created by the Cartesian virtual topology. A clever communication order of columns and rows is designed to realize communications among the adjacent processors.Many experiments have been done on the efficiency, accuracy and capability of the parallel MLFMA with all kinds of complex objects. Experimental results manifest the significant performance improvement of the parallel-MLFMA. This high-performance parallel MLFMA has successfully computered scattering with over ten millions unknowns and has been applied in the practical engineering.

  • 【网络出版投稿人】 上海大学
  • 【网络出版年期】2012年 07期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络