节点文献

分布式存储环境下矩阵广义特征值问题的并行计算

Parallel Computing the Matrice Generalized Eigenvalue under Distributed Memory Environments

【作者】 魏立峰

【导师】 李晓梅;

【作者基本信息】 中国人民解放军国防科学技术大学 , 计算机科学与技术, 2002, 博士

【摘要】 矩阵广义特征值问题是当前迅速发展的计算机科学和数值代数中的一个非常活跃的研究课题。它在很多应用中扮演非常重要的角色,从数学角度来看,矩阵特征值问题的应用大多来自数学物理方程、差分方程、Markov过程等。目的是为了计算固体、流体、电磁、微观粒子、系统控制等重大问题。在实际的科学研究与工程应用中,比如在建筑工程、航空航天研究、生物科学、计算物理以及石油勘探中,都要涉及到大规模矩阵广义特征值问题的计算。 大规模并行处理系统(MPP)和PC机群为并行求解矩阵广义特征值问题提供了分布式存储环境。本文从理论和实验两方面深入研究了分布式环境下实矩阵广义特征值问题Ax=λBx的并行计算,提出了一些新的算法。本文的主要研究成果概括如下: (1)关于实对称定三对角矩阵广义特征值问题 ⅰ) 提出了一种基于等规模矩阵划分策略的分治算法,子问题的规模之和等于原问题的规模。该算法利用割线法迭代计算广义特征值,具有计算量少,计算速度快的优点,并且算法具有内在并行性。理论分析和数值实验均证明该算法比K.Y.Li算法快,在矩阵规模较大时,计算时间约可以减少10%-20%。当n→∞时,加速比S_p→p,在分布式环境下可以获得很好的线性加速比和稳定的效率。 ⅱ) 提出了一种具有全新矩阵划分策略的分治算法。在这个划分策略中,子问题的规模之和等于原问题的规模减1,文中给出了特征值分割定理及其证明。该算法所提供的特征区间要优于第一种算法,因而在采用相同的迭代方式时,算法具有计算量少、计算时间短的优点。在矩阵规模较大时,计算时间约可以减少30%甚至更多。同时,算法也具有良好的内在并行性,当n→∞时,加速比S_p→p,在分布式环境下可以获得很好的线性加速比和稳定的效率。 (2)关于实对称带状矩阵广义特征值问题 ⅰ) 提出了一种结合多分法的并行分治算法,给出了特征值分割定理及其证明。在该算法中,子问题的规模之和等于原问题的规模。算法利用二分法结合广义Rayleigh商迭代,计算包含在特征区间内的每个特征对。在计算全部特征值和特征向量的情况下,算法的计算复杂性为O(n~2r~2),当r<<n时,优于坛一一一~一续攀遨粼望廷二生掣夔 LApACK的计算复杂性O(n’)。在并行实现时,算法结合并行性好的多分法, 在分布式环境下获得了很高的并行加速比和效率。 ii)提出了另一种结合多分法的并行分治算法,给出了特征值分割定理及其证明。 在该算法中,子问题的规模之和等于原问题的规模减;,其中;为半带宽。在 计算全部特征值和特征向量的情况下,算法的计算复杂性为。(。’:’),并且该 算法的计算时间绝大多数情况都少于前一种分治算法。结合多分法,算法具 有很好的并行性,在分布式环境下,获得了很高的并行加速比和效率。 iii)提出了一种基于分治策略的并行同伦连续方法。通过限制步长、抛弃特征值 束的方法,提高算法效率。在矩阵规模较大时、算法的计算时间明显比 LAPACK少。在算法的并行实现时,充分利用计算与通信的重叠,在分布式 环境下获得了很高的并行加速比和稳定的效率。 (3)提出了一种计算实对称定矩阵广义特征值问题的并行分治算法。该算法是在计 算矩阵B的谱分解和MDR约化的基础上,将问题转化为对称三对角一正定对角 矩阵广义特征值问题,然后利用分治算法结合割线法迭代进行计算。根据扰动 理论,给出了每个特征值的包含区间。数值实验表明,在分布式环境下,该算 法可以获得很高的并行加速比和效率。 (4)提出了一种计算非对称矩阵广义特征值问题的分治算法。该算法在分治策略的 基础上,利用Laguerre迭代计算广义特征值,并采用各种方法尽可能减少特征 值的遗漏。采用收缩方法计算遗漏的特征值。数值实验表明,该算法计算速度 快,并行效果好。 (5)在分布式环境下实现了本文提出的所有算法,并形成了一个符合规范的软件包。

【Abstract】 Generalized eigenvalue problem of matrix pair is an active research task. It plays a very important role in many application, according to the point of mathematics point, its mostly application originate from equations of mathematical physics, difference equations, Markov process, and so on, its purpose is to solve the problems of solid, fluid, electromagnetic, microscopic particles, system control, and etc. In practical science research and engineer applications, such as, architecture project, research of aeronautics and astronautics, bioscience, computing physics and oil reconnoiter, many large scale generalized eigenvalue problems need to be solved.Massively Parallel Processing system (MPP) and PC cluster provide distributed-memory environments for parallel solving the generalized eigenvalue problem. This thesis investigates parallel solving the generalized eigenvalue problem Ax- Bx deeply, and proposes some new algorithms. The main productions are summarized as follows:(1) The symmetric-definite tridiagonal generalized eigenvalue problemsi) A divide-and-conquer algorithm is proposed. The original problem is divided into two subproblems, and the sum of the subproblem’s scales is equal to the original problem’s scale. The generalized eigenvalues are computed by secant iteration. The advantage of this algorithm is that its computing quantity and computing time are less than the divide-and-conquer algorithm of K.Y. Li. With good parallelism in nature, it is quite suitable for parallel processing. Theoretical analysis and numerical experiments show that this algorithm is faster than the divide-and-conquer of K.Y. Li. About 10%-20% computing time is lessen when the problem’s scale is large enough. The speedup satisfy: Sp - p as n - .Good linear speedup and stable efficiency can be achieved under distributed-memory environments.ii) A divide-and-conquer algorithm /with new dividing strategy is proposed. In this new dividing strategy, the sum of the subproblem’s scales is equal to the original problem’s scale minus 1. An eigenvalue interlacing theorem is given and proved. The interval including every eigenvalue provided by this dividing strategy is better than that provided by the first dividing strategy, with same iterative method, this algorithm has the advantage of less computing quantity and time. About 30% or more computing time is lessen when the problem’s scale is large enough. Thisalgorithm has good parallelism in nature, the speedup satisfy: Sp - p as n - .Good linear speedup and stable efficiency can be achieved under distributed-memory environments. (2) The symmetric-definite band generalized eigenvalue problemsi) A parallel divide-and-conquer algorithm combined with multisection method is proposed, the sum of the subproblem’s scales is equal to the original problem’s scale. An eigenvalue interlacing theorem is given and proved. Each eigenpair is computed by bisection and generalized Rayleigh quotient iteration. For computing all eigenvalues and eigenvectors is O(n2r2), it is less than O(n3), which is the computational complexity of LAPACK, when r n. This divide-and-conquer algorithm is combined with multisection method when it’s implemented under distributed environment, high speedup and efficiency can be achieved, ii) A parallel divide-and-conquer algorithm combined with multisection method is proposed, the sum of the subproblem’s scales is equal to the original problem’s scale minus r, where r is semi-bandwidth. For computing all eigenvalues and eigenvectors is O(玍2). The computing time of this algorithm mostly is less than that of the former algorithm. Combined with multisection method, this divide-and-conquer algorithm has high parallelism, and high speedup and efficiency can be achieved under distributed environments.iii) A parallel homotopy continuation algorithm based on divide-and-conquer is proposed. Efficiency of algorithm is improved by restricting step size and giving up eigenvalue’s clusters, the computing ti

节点文献中: 

本文链接的文献网络图示:

本文的引文网络