节点文献

可扩展并行代数多重网格算法研究

Research on Scalable Parallel Algebraic Multigrid Algorithms

【作者】 徐小文

【导师】 莫则尧;

【作者基本信息】 中国工程物理研究院 , 计算数学, 2007, 博士

【摘要】 随着高性能计算机系统的日益普及和性能的大幅度提升,复杂物理系统的精细数值模拟成为可能。此时,偏微分方程组隐式离散后,通常需要求解大规模稀疏线性代数方程组。当前,迭代法是求解该类方程组唯一可行的方法。但是,由于物理系统的复杂性和系数矩阵的大规模,普通的迭代方法难以适应。这种不适应主要表现在两个方面:第一,迭代法不具备良好的算法可扩展能力,即算法的收敛迭代次数严重地依赖于系数矩阵的规模和性质;第二,迭代法不具备良好的并行可扩展能力,即单次迭代的并行效率低。一个高效的并行迭代法,应该同时具备良好的算法可扩展能力和并行可扩展能力。因此,针对复杂物理系统,设计适应数百乃至数千甚至数万个处理器的高效迭代法是当前高性能科学计算的一个前沿课题,具有重要意义。代数多重网格方法(AMG)是具备良好算法可扩展性能的一类迭代法,在复杂物理系统的串行数值模拟中得到广泛应用。在AMG算法中,网格粗化策略是影响算法可扩展能力的重要因素,一个最优的粗化策略应该具有尽可能低的算子复杂度,并保持良好的迭代收敛性质。遗憾的是,这种策略通常是内在串行的,不具备并行度。因此,长期以来,网格粗化策略的串行本性一直是阻碍其应用于大规模并行计算的关键因素。近年来,大量的研究工作通过牺牲算法可扩展能力获取并行度,提出了一系列并行粗化策略,力争在算法可扩展能力和并行可扩展能力之间寻求最佳的权衡点,以便AMG算法适应大规模并行计算。本文围绕这一前沿课题开展研究,具有重要的理论和实际应用价值。本文的主要贡献如下:1.针对迭代法,发展了一套具有普适性的、衡量算法可扩展能力和并行可扩展能力的分析方法。该方法从算法可扩展和并行可扩展的两个角度,衡量它们对并行计算时间的贡献。如果固定每个处理器的计算规模,增加处理器个数时,相对于单机情形的执行时间,并行计算时间将被放大。此时,算法可扩展能力和并行可扩展能力可分别由各自对放大倍数的贡献因子来定量描述。运用该方法,本文剖析了并行AMG算法的性能瓶颈,获得了若干有启示性的结论,明确了课题的研究方向。2.提出了两个松弛型并行粗化策略:RRS和RCLJP。它们是当前保持良好算法可扩展能力的前提下,在数百个处理器上,获得最佳并行迭代性能的并行粗化策略。该类策略的主要思想是:针对完全并行的RSO和CLJP策略严重破坏算法可扩展能力的缺陷,通过设置同步条件并在同步时交换拟边界层网格粗化信息,各个处理器协同地完成并行粗化。3.基于稀疏线性代数方程组中未知量之间依赖关系的强弱程度,从代数角度,提出单尺度矩阵和多尺度矩阵的概念,并根据这些概念,提出网格结块技术。该技术基于图搜索算法,将网格划分成互不重叠的若干单尺度块。在每块的内部,未知量之间的依赖程度是近似相同的,于是,高度并行的粗化策略和简单的点松弛光滑算子可以直接运用,它们不会降低算法可扩展能力。但是,块的边界是单尺度向多尺度的过渡区域,其中称之为代数界面的子区域,需要认真对待。多尺度矩阵概念和网格结块技术为并行AMG算法的设计提供了新的启示性思路。4.针对多尺度稀疏线性代数方程组,基于网格结块技术,提出了两类AMG算法:小尺度优先松弛和粗化的AMG算法(SPRC-AMG),以及界面优先松弛和粗化的AMG算法(IPRC-AMG)。前者通过不断地粗化小尺度块内的网格结点,配合块内的点松弛,逐步消除或消弱矩阵的多尺度性,使得所构造的粗网格方程组易于求解;后者认为代数界面决定AMG算法的收敛性质,因此,首先应该粗化代数界面的网格点,然后粗化单尺度块内的网格点,从而在保持低算子复杂度的前提下,获得良好的收敛因子。对于SPRC-AMG算法,在对称正定的情形下,本文给出了收敛性证明,并估计了收敛因子。5.将界面优先粗化的启示性思想应用于松弛型并行粗化中,得到了界面优先松弛型并行粗化策略(RIPC)。该策略视界面优先粗化为松弛型并行粗化中的同步条件,需要分别在界面和非界面区域执行并行粗化。更进一步,采用当前国际上最具并行可扩展能力的PMIS方法,可得到RIPC-PMIS并行粗化策略。基于该策略的并行AMG算法继承了PMIS的低算子复杂度和高并行度,同时,保持了IPCR良好的收敛性质,是一个高效的并行粗化策略。百个处理器上的并行数值实验验证了算法的有效性。6.针对二维三温能量方程,提出了基于物理量的网格粗化策略和一个两层迭代算法。该算法基于物理量粗化策略和相应的C/F块松弛光滑算子,将求解三个温度耦合的稀疏线性代数方程组化为若干个单温方程组的求解。在此基础上,将IPRC-AMG算法应用于求解这些单温方程组。模型问题和实际应用问题的数值实验均表明了两层迭代算法和IRPC-AMG算法的有效性。

【Abstract】 With the development and application of advanced parallel computers, the high-resolution numerical simulation for complicated multi-physics system is possible. Solving large scale sparse linear systems arising from the discretization of Partial Differential Equations(PDEs) are required in many numerical simulations, and iterative method is the only feasible way to solve such type of equations due to the storage and time complexity constraint. However, due to the complexity of the physical model and large scale of the systems, scalable iterative methods are required. Many factors contribute to the scalability of the iterative method, mainly including two aspects: algorithmic scalability and parallelization scalability. The former describes how the total computational works grow with the size of problem and the number of processors. The later describes the parallel efficiency of a single iteration on a parallel computer. Most iterative methods used today, however, lack the scalability mentioned above. So, it is urgent to design scalable iterative methods to address the difficulties encountered in the large-scale complicated physics simulations using hundreds or thousands of processors.Algebraic multigrid (AMG) is a well known iterative method with optimal algorithmic scalability for many complicated real applications. However, the parallelization of AMG is challenging because of the component of grid coarsening is sequential. In recent years. Many parallel coarsening strategies have been presented by balancing the algorithmic scalability and the parallelization scalability. This dissertation mainly focus on this topic and presents some new methods and techniques.The main contributions of this dissertation are summarized as follows:1. Firstly, in order to measure the algorithmic scalability and the parallelization scalability of a iterative method, a suit of scalability analysis approach is presented. We introduce the concept of scalability factor to describe how the parallel execution time will be enlarged while compared to the serial execution. So, the algorithmic scalability and parallelization scalability are described by their contributions on the scalability factor. Secondly, we analyze the scalability of parallel AMG algorithms using this method. Some important results are got for the design of scalable algorithms.2. Two new parallel coarsening strategies are proposed: Relaxed-type RS and CLJP (RRS & RCLJP). The main idea of these strategies is to smartly synchronize processors during the coarsening process of well-known RS0 or CLJP strategies. Qualitative analyses and numerical experiments show that our new strategies perform better using hundreds of processors, not only in faster convergence but also in smaller operator complexity. They show better scalability if we compare them to well-known parallel coarsening strategies such as RS3, Falgout and CLJP.3. The concepts of single-scale and multi-scale for sparse matrix are presented according to the strength of connection among grid points from the algebraic point of view. These concepts discover the difficulties on the solution of the linear systems. A blocking technique is presented to find all single-scale blocks partitioning the grid. In each block, connections have the same scale of strength, so, inherent parallel coarsening strategies and simple point relaxation scheme can get good algorithmic scalability. However, in the regions linking different scales of blocks, we call them the algebraic interfaces, the coarsening and relaxation should be especially designed. Based on these observations, new heuristic approaches for effective AMG algorithms are presented.4. Based on the blocking technique, two new AMG algorithms are presented. One is the SPRC-AMG method using the Smallest-scale Primary Relaxation and Coarsening strategies, another is the IPRC-AMG method using the Interface Primary Relaxation and Coarsening strategies. The main idea of the former is to decrease or eliminate the multi-scale property of the coarse operator through relaxation and coarsening in the local blocks with smallest scale property. The later method firstly selects coarse points within the algebraic interfaces, and second selects other coarse points in the interior of blocks. Numerical experiments show the effectiveness of our methods. The convergence analysis of the SPRC-AMG method is also presented.5. By integrating the interface primary coarsening into the relaxed-type parallel coarsening strategies, interface primary relaxed-type parallel coarsening strategies (RIPC) is introduced. In this strategy, interface primary coarsening is regarded as the synchronization condition of the relaxed-type method. Specially, we use the well-known parallel coarsening strategy of PMIS in both the algebraic interfaces and the blocks. So, RIPC-PMIS coarsening strategy is formed. Numerical results on hundreds of processors show the effectiveness.6. A novel two-level iterative method is presented for solving 2D radiative diffusion equations with photon, electron, ion temperatures. The main idea of this method is to decouple one temperature from the other two by a special coarsening strategy, where all the variables related to electron temperature are forced selected coarse points, other variables (photon and ion temperatures) are all forced to be fine points. So, only several single temperature equations need to be solved instead of the coupled linear systems. Because of the multiscale property of the single temperature equations, we use the IPRC-AMG method presented in this dissertation to solve those single temperature equations. Numerical results show the effectiveness of our method.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络