面向CFD的交互式并行化技术研究

Research on Interactive Parallelization for CFD

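【Author】 Ding Xiaoning (丁晓宁)

【Advisor】 Zhu Yi'an (朱怡安)

【Author information】 Northwestern Polytechnical University (西北工业大学), Computer Applications, 2002, PhD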

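【Abstract (translated from the Chinese)】 Automatic program parallelization has long been both a research focus and a hard problem in parallel processing. Considerable progress has been made, but results in practical applications remain unsatisfactory. We have studied automatic parallelization over a long period with computational fluid dynamics (CFD) as the application domain, and NPUPAR, the automatic parallelization system we developed for the YH-III (Yinhe-III) supercomputer, achieved good results in practice. On this basis, the National Laboratory for Parallel and Distributed Processing at the National University of Defense Technology suggested that we add interactive functions to further improve parallel efficiency. This dissertation is centered on the National Defense Key Laboratory Fund project "Research on Interactive Parallelization Technology for Computational Fluid Dynamics" (No. 99JS94.6.1.HK0313).

The main work and contributions of the dissertation are as follows:

1. A domain-computing model for the parallelization of CFD programs is proposed and established. Compared with the traditional program model and with the frame-iteration and field-loop models of CFD programs, the domain-computing model reflects the structural characteristics of CFD programs better and makes in-depth analysis of CFD programs possible. It helps to guarantee the correctness of the parallel program, and it also helps to exploit parallelism, reduce communication and synchronization, and improve the efficiency of the parallel program.

2. On the basis of traditional dependence theory, the concept of domain dependence is proposed (including loop-independent domain dependence and loop-carried domain dependence), and test algorithms for domain dependence are studied. Domain dependence takes the domain operation as its basic unit. A domain operation acts on a large block of data as a whole, which makes domain dependence well suited to exploiting the data parallelism inherent in CFD programs. At the same time, the flexibility of domain operations lets the automatic parallelization system perform whole-program analysis at global scope and also carry out finer analysis inside frame iterations and field loops, so that it generates higher-quality parallel programs.

3. [Wolfe96] proposed a method that uses FUD chains to recognize and analyze induction variables. Building on the concepts of domain and domain dependence, this dissertation extends FUD chains according to the characteristics of CFD programs and proposes the EFUD method for recognizing and analyzing reductions and reduction variables. The EFUD method can recognize complex scalar reductions as well as array reductions.

4. A communication-detection method based on domain dependence is studied. Conditions for deciding whether communication is required are given, together with formulae for determining the range of data to be communicated (the communication domain), and on this basis an optimization strategy based on communication tokens is proposed. A communication token is a tuple containing the communication domain, the communication direction, and related information. Communication tokens make manual optimization through interaction convenient. The token-based strategy reduces the communication volume and the number of communications by merging tokens, and it reduces the number of synchronizations by placing the communication statement at the statement object that holds the most tokens.

5. To make interaction more efficient, a program object tree structure is proposed. Organizing the internal data and program code of the parallelization system as a program object tree makes it easy to locate and combine information during interaction, and it facilitates incremental analysis.

The interactive parallelization system Paractive was successfully implemented in this work. In tests on practical CFD cases provided by the National Laboratory for Parallel and Distributed Processing at the National University of Defense Technology, the parallel programs generated by Paractive reached a parallel efficiency of 80% on 4 nodes and 70% on 8 nodes of the YH-III parallel computer.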
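The abstract does not state the domain-dependence test itself, so the following is only an illustrative sketch of the general idea behind item 2: treating a block of array elements as one unit and asking whether the block one operation writes overlaps the block another operation reads. The `Region` class, the restriction to rectangular index ranges, and the function names are all assumptions introduced here, not the dissertation's algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular index region: inclusive (lo, hi) per dimension."""
    bounds: Tuple[Tuple[int, int], ...]


def intersect(a: Region, b: Region) -> Optional[Region]:
    """Return the overlap of two regions, or None if they are disjoint."""
    if len(a.bounds) != len(b.bounds):
        raise ValueError("regions must have the same number of dimensions")
    out = []
    for (alo, ahi), (blo, bhi) in zip(a.bounds, b.bounds):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:  # empty in this dimension, so the regions are disjoint
            return None
        out.append((lo, hi))
    return Region(tuple(out))


def region_dependent(written: Region, read: Region) -> bool:
    """Assumed test: a dependence may exist only if the two regions overlap."""
    return intersect(written, read) is not None


# One operation writes rows 1..50 of a field; a later one reads rows 50..100.
write_u = Region(((1, 50), (1, 100)))
read_u = Region(((50, 100), (1, 100)))
overlap = intersect(write_u, read_u)
print(overlap)
```

In this toy setting the overlap is the single shared row, which is the kind of information a parallelizer could use both to order the two operations and to decide which data must move between processors.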

【Abstract】 Automatic parallelization is a hot research topic in parallel computing and is attracting more and more researchers. Much progress has been made in recent years in exploiting coarse-grain parallelism, but application results are still disappointing: many programs achieve little or no speedup when executed in parallel. Our team has accumulated considerable experience over several years of intensive research on automatic parallelization for CFD (Computational Fluid Dynamics). NPUPAR, the automatic parallelizer we developed for the YH-III supercomputer, has shown promising results. In view of these achievements, the National Laboratory of Parallel and Distributed Processing (PDL) at the National University of Defense Technology recommended that we carry out research on interactive parallelization to obtain better performance. This work is supported by the Defense Science Foundation to carry out that research on interactive parallelization for CFD.

The achievements of the paper are as follows:

1. The paper proposes a Domain-Computing Model for the parallelization of CFD programs. This model fits the structural features of CFD programs better than existing ones, including the Frame-Iteration Model and the Field-Loop Model that we proposed earlier. The model enables in-depth analysis, which helps to ensure the correctness and to improve the efficiency of the resulting parallel programs.

2. The paper introduces domain dependency for domain operations. A domain operation is a uniform operation over a data set and therefore contains data parallelism. This characteristic makes domain dependency well suited to the parallelization of CFD programs. Meanwhile, taking advantage of the flexibility of domain operations, the parallelizer can carry out global analysis at the frame-iteration and field-loop scales as well as in-depth, detailed analysis at various smaller scales. This helps the parallelizer to generate more efficient parallel programs.

3. [Wolfe96] proposed a method that recognizes induction variables using FUD chains. This paper presents a new method, called the EFUD method, to recognize reductions and reduction variables. The method reconstructs FUD chains using the concepts of domain and domain dependency; the graph obtained after reconstruction is called an EFUD chain. The EFUD method is tailored to the characteristics of CFD programs, whose dependencies are generally uniform. It can recognize array reductions as well as complex scalar reductions.

4. The paper studies how to find the communications that must appear in parallel programs, based on domain dependency. Conditions for communication are given, together with formulae for computing the region of data that requires communication. The paper then presents a strategy to optimize communication using communication tokens. A communication token is a tuple that includes the communication domain, the communication direction, and so on, where the communication domain is the data region just mentioned. Tokens make it easier to optimize communications manually in interactive mode. The strategy reduces the volume and number of communications by merging tokens, and minimizes the number of synchronizations by placing communications beside the statement that possesses the most tokens.

5. To make interaction more productive, the paper constructs a tree framework, which we call the object tree, to organize the data used by the parallelizer and the corresponding analysis programs. The object tree simplifies data location, integration, and conversion. Moreover, the tree framework facilitates incremental analysis, which shortens the parallelizer's response time.

We developed a parallelizer, Paractive, to validate our ideas, and tested it using CFD programs from PDL. The resulting parallel programs achieve 80% parallel efficiency when executed on 4 computing nodes, and 70% on 8 nodes, of the YH-III supercomputer.
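The token-based strategy of item 4 can be pictured with a small sketch. The abstract says only that a token is a tuple holding the communication domain, the direction, and other information, that tokens are merged, and that communication is placed beside the statement holding the most tokens. Everything else below is an assumption made for illustration: the `Token` fields, the one-dimensional domain, the rule that only tokens for the same array, direction, and statement are merged, and the function names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical communication token with a 1-D inclusive domain [lo, hi]."""
    array: str      # array whose data must be communicated
    direction: str  # e.g. "left" or "right" neighbour
    lo: int
    hi: int
    stmt: int       # hypothetical id of the statement that needs the data


def merge_tokens(tokens: List[Token]) -> List[Token]:
    """Merge overlapping or adjacent domains that share array, direction, stmt."""
    merged: List[Token] = []
    ordered = sorted(tokens, key=lambda t: (t.array, t.direction, t.stmt, t.lo))
    for t in ordered:
        last = merged[-1] if merged else None
        same_key = last is not None and (
            (last.array, last.direction, last.stmt)
            == (t.array, t.direction, t.stmt)
        )
        if same_key and t.lo <= last.hi + 1:
            # Extend the previous domain instead of issuing another message.
            merged[-1] = Token(t.array, t.direction, last.lo,
                               max(last.hi, t.hi), t.stmt)
        else:
            merged.append(t)
    return merged


def busiest_statement(tokens: List[Token]) -> int:
    """Return the statement id that holds the most tokens."""
    return Counter(t.stmt for t in tokens).most_common(1)[0][0]


tokens = [
    Token("u", "left", 1, 2, 10),
    Token("u", "left", 3, 4, 10),
    Token("u", "left", 2, 3, 10),
    Token("v", "right", 99, 100, 12),
]
merged = merge_tokens(tokens)
print(merged)
print(busiest_statement(tokens))
```

Here three small messages for `u` collapse into one covering indices 1 to 4, and statement 10 is chosen as the placement point because it holds the most tokens. A real implementation would work on multi-dimensional domains and would have to respect the dependence constraints that the abstract mentions but does not spell out.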

  • 【Classification number】TP311.52
  • 【Downloads】275