基于GPU的车身结构接触碰撞过程并行计算方法

GPU-based Parallel Computing Method for Contact and Impact Problems of Automotive Body

【Author】 Cai Yong (蔡勇)

【Advisor】 Li Guangyao (李光耀)

【Author information】 Hunan University, Vehicle Engineering, 2013, PhD

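【Abstract (translated from the Chinese)】 Finite element computation of contact and impact processes in automotive body structures is an important part of automotive CAE. It mainly covers engineering analyses such as vehicle crash and body panel forming, and mechanically it involves three kinds of nonlinearity: material nonlinearity, geometric nonlinearity, and the boundary nonlinearity of the contact interface. Such analyses often suffer from enormous computational cost and low efficiency, so the practical demand for parallel computing is strong. Most current parallel finite element methods adopt coarse-grained strategies such as domain decomposition and run on networked clusters with CPUs as the computing core. Their efficiency depends directly on the number of compute nodes, the workflow is complex, and expensive hardware is required, so their cost-effectiveness is poor.

The modern graphics processing unit (GPU) is a many-core processor with a high degree of internal parallelism, and its floating-point capability far exceeds that of contemporary CPUs. Programmable shaders gave the GPU the characteristics of a general-purpose processor, and it began to be used for general-purpose computing, bringing new ideas and methods to large-scale data processing and numerical simulation. Early general-purpose computing on GPUs (GPGPU) was programmed in high-level shading languages such as Cg and was applied to various finite element computations. However, GPGPU of that period supported only single precision and had low data transfer efficiency, so GPU-based finite element computation had low accuracy and limited speedup, which severely restricted its engineering use. The Compute Unified Device Architecture (CUDA) brought efficient and intuitive tools for developing GPU parallel programs. GPU parallel computing based on CUDA has low hardware cost and is simple to program.

Guided by engineering needs, this thesis uses CUDA to study high-accuracy, high-efficiency fine-grained parallel methods for explicit finite element analysis, together with a parallel contact algorithm that executes at fine granularity throughout. The end result is fast parallel computation, on an ordinary personal computer, of two classes of large-scale nonlinear finite element problems: automotive body crash simulation and sheet metal stamping simulation. The main work and results are as follows:

(1) Considering the natural parallelism of nonlinear explicit finite element analysis and the lightweight thread execution model of the GPU, a GPU-based explicit finite element computing platform with independent intellectual property rights was developed (invention patent application number: 201210266435.1). Its main feature is a set of abstract mappings at three levels, thread to element, thread to node, and thread to degree of freedom, which fit explicit finite element computation closely to GPU threads. Compared with coarse-grained parallel strategies based on mesh partitioning, this fine-grained strategy needs no preprocessing and, on a single graphics card, has no boundary data to handle, so it greatly improves computational efficiency. As a result, most stages of the explicit finite element procedure, such as nodal velocity and displacement computation, can readily be run efficiently in parallel on the GPU.

(2) To address the technical bottleneck that the assembly of nodal stresses during element computation is difficult to parallelize on the GPU, a pre-indexed parallel stress assembly strategy is proposed, achieving fine-grained parallelism on the GPU for two shell elements: the BT quadrilateral element and the EST triangular element. A GPU method based on parallel reduction is proposed for single-valued quantities such as the time step. The entire explicit finite element algorithm runs on the GPU, which reduces data exchange between the GPU and the CPU and optimizes the computational efficiency of the program. Computations of nonlinear plate and shell problems show that the GPU parallel results are identical to those of the original serial algorithm on the CPU, and that efficiency improves markedly compared with a CPU of the same period and price. On a GTX580 graphics card, using EST elements to solve an elastic-plastic large deformation problem with 1.85 million degrees of freedom, a speedup of nearly 37 times is achieved.

(3) In contact-impact finite element analysis, the contact algorithm takes more than 70% of the computing time. This thesis therefore proposes a fine-grained parallel contact algorithm that executes entirely on the GPU, comprising a parallel hierarchy-territory contact search algorithm, a parallel defense node contact force method, and a parallel penalty function contact force method. The hierarchy-territory algorithm is an efficient search algorithm suited to complex self-contact problems, and the computational independence of contact segments within the same hierarchy level also meets the requirements of fine-grained GPU computing. This thesis proposes techniques including a one-to-one mapping between threads and contact segments, GPU parallel sorting, and increasing the computational granularity of GPU threads, which together achieve the parallel search of test pairs on the GPU. In the contact pair search stage, a mapping between threads and test pairs is proposed to search contact pairs within the same level in parallel, and a compute-then-sort strategy is used to exchange data between one level and the next. In the contact force stage, a mapping between threads and contact pairs gives a fine-grained parallel computation of penetration and contact force, and atomic operations are used to scatter the contact forces. Finally, based on the self-developed crash simulation software DYSI3D, the GPU-based parallel crash simulation software CPS-GPU was developed (software copyright number: 2011SR001966). Using this software on a GTX580 graphics card for a body-in-white crash computation with 1.77 million degrees of freedom gives a speedup of about 20 times.

(4) This thesis proposes a complete GPU parallel computing method for sheet metal stamping. Because stamping simulation places high demands on the modelling of material flow, a GPU parallel element computation that includes complex material constitutive models and a GPU parallel contact force computation that accounts for friction are proposed. A GPU computing strategy for the integrated contact search algorithm is also proposed: the broad-phase search method used for real-time collision detection in computer graphics is introduced to perform the test pair search, and, once the information on adjacent contact segments has been established, a fine-grained parallel update method for contact pairs in the post-contact search is given. On the basis of the self-developed sheet forming simulation software CADEMII, the GPU-based parallel sheet forming software CADEM-GPU was developed (software copyright number: 2010SR052426), with an asynchronous data output mode and OpenGL-based real-time display added to further improve the efficiency and practicality of the software. Numerical examples show that the software has high accuracy and efficiency: on a GTX460 graphics card, for simulation models with tens of thousands of elements, a speedup of more than 20 times is achieved, which effectively shortens simulation time.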
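The abstract names a pre-indexed parallel assembly strategy but does not describe its data layout. The following is a minimal sketch of one common way such a scheme can work, not the thesis implementation: a node-to-element lookup table is built once, each element "thread" writes its forces into private slots, and each node "thread" gathers from those slots, so no two threads write to the same location. The code is a serial Python emulation in which each loop iteration stands in for one GPU thread. All function names and the placeholder element force are assumptions made for illustration.

```python
def build_node_index(elements, num_nodes):
    """Pre-index step, done once: for each node, list the (element, local slot)
    pairs that contribute to it. Hypothetical layout, assumed for illustration."""
    index = [[] for _ in range(num_nodes)]
    for e, connectivity in enumerate(elements):
        for local, node in enumerate(connectivity):
            index[node].append((e, local))
    return index


def element_kernel(e, elements, elem_forces):
    """Thread-to-element mapping: element e writes only into its own slots.
    Placeholder physics (assumption): element e applies a force of e + 1
    to each of its nodes. A real code would evaluate the shell element here."""
    for local in range(len(elements[e])):
        elem_forces[e][local] = float(e + 1)


def node_gather_kernel(n, index, elem_forces, nodal_force):
    """Thread-to-node mapping: node n sums the slots listed in the pre-built
    index and is the only writer of nodal_force[n], so no atomics are needed."""
    total = 0.0
    for e, local in index[n]:
        total += elem_forces[e][local]
    nodal_force[n] = total


def assemble_nodal_forces(elements, num_nodes):
    index = build_node_index(elements, num_nodes)
    elem_forces = [[0.0] * len(conn) for conn in elements]
    # On a GPU, each iteration of this loop would be one thread of an element kernel.
    for e in range(len(elements)):
        element_kernel(e, elements, elem_forces)
    nodal_force = [0.0] * num_nodes
    # On a GPU, each iteration of this loop would be one thread of a node kernel.
    for n in range(num_nodes):
        node_gather_kernel(n, index, elem_forces, nodal_force)
    return nodal_force


# Two quadrilateral elements sharing nodes 1 and 2.
mesh = [(0, 1, 2, 3), (1, 4, 5, 2)]
print(assemble_nodal_forces(mesh, 6))
```

The gather form trades extra memory (one slot per element node) for write conflicts that never occur, which is why the index can be built once before time stepping and reused in every step as long as the mesh connectivity does not change.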

【Abstract】 Finite element (FE) simulation of contact and impact processes is an important part of automotive CAE technology. It is widely applied to engineering problems such as car crash simulation and sheet metal forming simulation. This kind of simulation usually involves material nonlinearity, geometric nonlinearity and nonlinear boundary conditions. Because of these three kinds of nonlinearity, the FE analysis of contact and impact problems faces enormous computational cost and low computing efficiency. Therefore, there is a very strong demand for parallel computing in practical applications. At present, the most common parallel computing methods are based on coarse-grained domain decomposition and use CPU-based computer clusters as the computing hardware. In these traditional methods, computational efficiency is directly related to the number of computing nodes. Furthermore, in practice, more computing nodes require more complex programming and more expensive hardware. Therefore, these methods are not cost-effective for either individuals or businesses.

The modern graphics processing unit (GPU) has developed into a many-core processor with a high degree of internal parallelism, and its floating-point performance is much higher than that of CPUs of the same period. Meanwhile, the appearance of programmable shaders gave the GPU several characteristics of a general-purpose processor. General-purpose computing on GPUs (GPGPU) has since become a novel and effective approach to large-scale data processing and numerical simulation. Early GPGPU had to be programmed in high-level shading languages such as Cg. Several researchers tried to use early GPGPU to improve computing efficiency, but these GPU-based FE codes could not meet the requirements for accuracy and efficiency. This was mainly due to limited double-precision floating-point support and low data transfer efficiency.
Later, NVIDIA introduced the Compute Unified Device Architecture (CUDA), an efficient and intuitive toolset for developing GPGPU programs. CUDA offers an efficient route to GPGPU with low computing cost and a general-purpose programming language.

In this thesis, a GPU-based parallel strategy for explicit FE computing, together with a fully fine-grained parallel contact algorithm, is presented to meet the demands of engineering applications. High-performance parallel computing of automotive body crash simulation and sheet forming simulation is thereby realized on an ordinary personal computer with a CUDA-capable device. The main research content and results are as follows:

(1) A GPU-based parallel explicit FE computing platform with independent intellectual property rights is presented, based on the characteristics of the explicit scheme and the lightweight-thread parallel computing model of the GPU (patent application number: 201210266435.1). The main advantage of this platform is that it constructs three kinds of one-to-one mapping between CUDA threads and computing objects: thread-to-element, thread-to-node and thread-to-degree-of-freedom. Compared with coarse-grained parallel FE algorithms based on mesh partitioning, the fine-grained parallel strategy improves computational efficiency without any preprocessing or boundary data handling. Therefore, most parts of the explicit FE calculation, including nodal velocity and displacement computation, can be mapped to the GPU and executed with high efficiency.

(2) Nodal force assembly on a fine-grained parallel platform has long been a difficult problem. This thesis proposes a pre-indexing strategy that realizes parallel assembly on the GPU with little additional work. Meanwhile, parallel strategies for two kinds of shell element, the Belytschko-Tsay (BT) shell element and the edge-based smoothed triangular (EST) shell element, are presented on the basis of the above parallel computing platform.
A parallel reduction method is introduced to calculate single-valued quantities such as the global time step. Finally, a fully parallelized explicit FE iteration process on the GPU is proposed, which achieves high computational efficiency by reducing data transfers between the CPU and the GPU. Numerical examples for nonlinear shell structures show that this method greatly improves computational efficiency while producing the same results as serial computing on the CPU. For example, a speedup of about 37 times is obtained on a GTX580 GPU compared with an i7 CPU for an elastic-plastic large deformation problem with 1.85 million degrees of freedom.

(3) In the FE analysis of a contact problem, the contact algorithm usually takes more than 70% of the total computation time. Therefore, a contact algorithm that runs entirely on the GPU is proposed in this thesis, including a parallel hierarchy-territory contact-searching algorithm (HITA) and two parallel contact force calculation algorithms: a parallel penalty function method and a parallel defense node algorithm. HITA is an efficient contact-searching algorithm that is especially suitable for complex problems involving self-contact. Furthermore, the computational independence of contact segment searches within the same hierarchy is well suited to GPU parallel computing. Firstly, this thesis proposes several techniques to realize the parallel search of test pairs on the GPU, including a thread-to-segment mapping scheme, a GPU-based sorting method and a technique for increasing thread granularity. Secondly, in the contact pair searching phase, a mapping between threads and test pairs is presented to achieve parallel searching within the same hierarchy, and a sort-based storage strategy is used to realize efficient data transfer between higher-level and lower-level hierarchies.
In the contact force calculation phase, a fine-grained parallel strategy based on thread-to-contact-pair mapping is presented to compute contact forces in parallel, and atomic operations are used to scatter the contact forces. Based on the above algorithms, a GPU-based contact process simulation software package named CPS-GPU (software registration number: 2011SR001966) is developed from the self-developed serial contact process simulation software DYSI3D. Numerical examples demonstrate that this software achieves high accuracy and efficiency. For example, a speedup of about 20 times is obtained by using a GTX580 graphics card to calculate a body-in-white (BIW) crash model with 1.77 million degrees of freedom.

(4) This thesis presents a complete GPU parallel computing method to accelerate the FE analysis of the sheet metal forming process. To meet the requirement of high accuracy for material flow in sheet forming simulation, a parallel computing method for shell elements with complex material constitutive models and a parallel contact force computation that accounts for friction are proposed. Meanwhile, a way to parallelize a simple contact algorithm integrated in the self-developed sheet metal forming simulation software CADEMII is studied. Firstly, the broad-phase search method used in real-time collision detection is introduced for test pair searching during the pre-contact search. Secondly, a parallel contact pair update method for the post-contact search is proposed, based on the information of adjacent contact segments. Finally, a GPU-based sheet metal forming parallel computing software package named CADEM-GPU (software registration number: 2010SR052426) is developed based on CADEMII. To improve the computing efficiency and practicality of this software, several useful technologies, such as asynchronous data output and OpenGL-based real-time display, are added.
Numerical examples show that a speedup of more than 20 times can be obtained by using a GTX460 graphics card to calculate sheet metal FE models with tens of thousands of elements.
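Item (2) of the abstract mentions parallel reduction for single-valued quantities such as the global time step. A minimal sketch of a tree-style minimum reduction follows, written as a serial Python emulation in which each pass of the inner loop stands in for one GPU thread and each halving of the stride corresponds to one synchronized step. The per-element time step formula (characteristic length divided by wave speed, scaled by a safety factor) is the usual stability estimate for explicit schemes and is an assumption here, as are all function and parameter names. The abstract does not state which criterion the thesis uses.

```python
import math


def reduce_min(values):
    """Tree reduction of a list to its minimum, emulating a GPU kernel."""
    if not values:
        raise ValueError("reduce_min needs at least one value")
    # Pad to a power of two with +inf so that every thread has a partner.
    size = 1
    while size < len(values):
        size *= 2
    buf = list(values) + [math.inf] * (size - len(values))
    stride = size // 2
    while stride > 0:
        # On a GPU, these `stride` iterations would run concurrently,
        # followed by a barrier before the next, halved stride.
        for tid in range(stride):
            buf[tid] = min(buf[tid], buf[tid + stride])
        stride //= 2
    return buf[0]


def stable_time_step(char_lengths, wave_speeds, safety=0.9):
    """Global time step as the safety-scaled minimum of per-element estimates.
    The estimate length / wave_speed is an assumed, generic criterion."""
    # Thread-to-element mapping: one estimate per element.
    elem_dt = [length / speed for length, speed in zip(char_lengths, wave_speeds)]
    return safety * reduce_min(elem_dt)


print(reduce_min([3.0, 1.5, 2.0, 4.0, 0.5]))
print(stable_time_step([2.0, 1.0, 4.0], [1000.0, 1000.0, 1000.0], safety=0.5))
```

A reduction of this shape needs a number of synchronized steps that grows with the logarithm of the element count, and the result can stay in device memory for the next time step, which is consistent with the abstract's aim of reducing transfers between the CPU and the GPU.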

  • 【Online publication submitter】 Hunan University
  • 【Online publication year/issue】 2014, Issue 09