

Research of Programming Analysis and Parallelism Based on Graphics Processing Unit

【Author】 Wang Tao (王涛)

【Advisor】 Yao Yuan (姚远)

【Author information】 PLA Information Engineering University, Computer Software and Theory, 2010, Master's degree

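【Summary】 High-performance computers are a comprehensive reflection of a country's economic and technological strength and an important tool for promoting economic and technological development, social progress, and national defense security; they have become a strategic high ground contested by countries around the world. While people pursue cost-effective parallel computer systems, special-purpose computing components in many specialized fields are also delivering powerful parallel computing capability. The graphics processing unit (GPU) is one such special-purpose accelerator that is used for general-purpose computing. With the development of microelectronics, GPUs have far surpassed general-purpose processors in both integration density and data processing capability. In particular, their programmability, parallel processing capability, and range of applications have been continually improved and extended, making the GPU a high-performance processing component in current computer systems. At present, research on GPU-based parallelization in China and abroad generally starts from an existing serial program, which is rewritten by computer professionals familiar with the GPU hardware structure. However, because of the various overheads that parallelization introduces, the parallelized program may execute less efficiently than the serial one. It is therefore particularly important to analyze the serial program properly and to evaluate how efficiently it would execute on the GPU after parallelization. This thesis studies how to evaluate that efficiency. The main research contents are as follows:
1. The thesis studies the multithreaded hardware architecture and the programming model of GPUs that support the CUDA architecture. Based on an analysis of the current state of high-performance computing and general-purpose GPU computing, it describes in detail the advantages of the GPU in general-purpose computing and examines the GPU hardware structure and programming model in depth, providing the theoretical basis for the cost model.
2. To compute loop-body workload precisely, and building on an in-depth study of traditional data dependence analysis methods, the thesis proposes an improved method for the case in which SUIF cannot accurately compute the iteration count because the upper and lower bounds of the loop are not fixed.
3. To predict the execution efficiency of a serial program on the GPU after parallelization, the thesis proposes a GPU parallel cost model based on the CUDA architecture. The model takes into account the overheads of parallelization: device startup, data transfer, and GPU execution. It predicts the time cost of accelerating a serial program on the GPU; comparing this with the cost of serial execution determines whether GPU acceleration should be used, which in turn guides the parallelization of serial programs.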
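The iteration-counting problem in the second research item can be illustrated with a small sketch. The record does not describe the thesis's improved method, so the code below is not that method and not SUIF's analysis; it is a brute-force illustration, with the hypothetical helper `count_iterations`, of what "iteration count with non-fixed bounds" means when an inner loop's bounds depend on the outer loop index.

```python
from typing import Callable, Sequence, Tuple

# A bound is a function of the enclosing loop indices (outermost first).
# This representation is an assumption made for illustration only.
Bound = Callable[[Tuple[int, ...]], int]


def count_iterations(loops: Sequence[Tuple[Bound, Bound]],
                     outer: Tuple[int, ...] = ()) -> int:
    """Count how many times the innermost body of a loop nest runs.

    Each loop is a (lower, upper) pair describing the half-open range
    [lower, upper) with unit stride. Bounds may depend on outer indices.
    """
    if not loops:
        return 1  # an empty nest executes its body exactly once
    (lower, upper), rest = loops[0], loops[1:]
    lo, hi = lower(outer), upper(outer)
    if not rest:
        # Innermost loop: the trip count is known without enumeration.
        return max(0, hi - lo)
    # Outer loop: enumerate its indices and sum the inner trip counts.
    return sum(count_iterations(rest, outer + (i,)) for i in range(lo, hi))


# Triangular nest: for i in [0, 4): for j in [i, 4): body
n = 4
triangular = [
    (lambda idx: 0, lambda idx: n),       # outer loop, fixed bounds
    (lambda idx: idx[0], lambda idx: n),  # inner lower bound depends on i
]
triangular_count = count_iterations(triangular)
```

For the triangular nest, the inner trip count changes with `i`, so multiplying two fixed trip counts would overestimate the workload; summing per outer index gives the exact figure that a workload estimate needs.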

【Abstract】 High-performance computers are not only a comprehensive expression of a country's economic and technological strength, but also an important tool for economic growth, technological development, social progress, and national security. They have become a strategic high ground. While people pursue cost-effective parallel supercomputer systems, dedicated computing components also deliver powerful parallel computing capability in many specialized areas. The Graphics Processing Unit (GPU) is one of them, used for both image processing and general-purpose computation. With the development of microelectronics, the GPU now far exceeds general-purpose processors in integration density and data processing capability, and it has become a high-performance component of computer systems. At present, research on GPU parallelization mainly starts from an existing serial program, which a professional familiar with the GPU architecture transforms into a parallel one. However, because of the various costs introduced by parallelization, the parallel program may be less efficient than the serial program, in which case the effort is a waste of manpower and financial resources. Therefore, analysing the serial program properly and predicting the efficiency of the parallel program on the GPU becomes particularly important. This thesis studies how to use the GPU more reasonably and effectively in general-purpose computation. The main research contents and innovations are as follows:
1. The thesis analyses the current status of high-performance computing, points out from several perspectives the difficulties and challenges that traditional high-performance computers face, and studies the hardware architecture and programming model of the GPU, which form the theoretical foundation of the cost model that follows.
2. The thesis studies data dependence analysis techniques and adopts an improved method to determine accurately the number of iterations when calculating loop-body workload, which SUIF cannot do when the upper and lower bounds of the loop are not fixed.
3. To predict the execution efficiency of a parallel program on the GPU, the thesis presents a GPU cost model based on the CUDA architecture. The model takes into account the cost of device startup, the cost of data transfer, and the cost of GPU execution. It estimates the total time cost of the parallel program on the GPU, which determines whether GPU acceleration is worthwhile.
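The decision rule that the cost model supports can be sketched as follows. The abstract names only the three cost components (device startup, data transfer, GPU execution) and the comparison against serial time; the formulas are not given here. The latency-plus-bandwidth transfer term, the class `GpuCostParams`, and all function and parameter names below are therefore assumptions chosen for illustration, not the thesis's actual model.

```python
from dataclasses import dataclass


@dataclass
class GpuCostParams:
    """Hypothetical machine parameters; real values would be measured."""
    startup_s: float              # one-off device startup cost, in seconds
    transfer_latency_s: float     # fixed cost per host/device copy, in seconds
    bandwidth_bytes_per_s: float  # sustained host/device copy bandwidth


def predicted_gpu_time(params: GpuCostParams,
                       bytes_to_device: int,
                       bytes_from_device: int,
                       kernel_time_s: float) -> float:
    """Sum the three components named in the abstract.

    Assumes exactly one copy in each direction and a linear
    latency + size / bandwidth transfer cost (an illustrative assumption).
    """
    transfer_s = (2 * params.transfer_latency_s
                  + (bytes_to_device + bytes_from_device)
                  / params.bandwidth_bytes_per_s)
    return params.startup_s + transfer_s + kernel_time_s


def worth_offloading(serial_time_s: float, gpu_time_s: float) -> bool:
    """GPU acceleration pays off only if the predicted total is smaller."""
    return gpu_time_s < serial_time_s


# Toy numbers chosen so the arithmetic is easy to follow; not measurements.
params = GpuCostParams(startup_s=0.5,
                       transfer_latency_s=0.25,
                       bandwidth_bytes_per_s=4.0)
gpu_time = predicted_gpu_time(params,
                              bytes_to_device=8,
                              bytes_from_device=4,
                              kernel_time_s=1.0)
```

With these toy numbers the kernel itself takes 1.0 s, yet startup and transfer raise the predicted total to 5.0 s, so a serial program that finishes in 4.0 s should stay on the CPU while one that takes 6.0 s is a candidate for offloading.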


