Research on Several Key Technologies for GPU Parallel Processing of Photogrammetry Data

Study on Parallel Processing Technologies of Photogrammetry Data Based on GPU

【Author】 Yang Jingyu

【Supervisor】 Zhang Yongsheng

【Author Information】 PLA Information Engineering University, Photogrammetry and Remote Sensing, 2011, Ph.D.

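【Abstract】 To meet the urgent need for rapid processing of massive photogrammetric imagery, this thesis builds a compact GPU cluster computing platform with the GPU as its core component and, using the CUDA development environment, studies GPU parallel processing techniques for photogrammetric data processing algorithms. The main work and contributions are as follows:
1. The history and trends of parallel computing platforms are briefly summarized, and the methods for parallelizing processing tasks and the basic patterns of parallel remote sensing image processing are outlined. The GPU hardware architecture, software programming model, performance analysis model, and the basic principles and common strategies of performance optimization are discussed in detail, and the two experimental platforms used in the research are described.
2. An imaging radiometric model of images degraded by cloud and haze is established, and the Dark Channel Prior is introduced as a constraint to remove cloud and haze from a single image. Because interpolating and refining the atmospheric transmission is complex and time-consuming, an edge-preserving transmission interpolation method based on integral-image box filtering is proposed and parallelized at fine granularity on the GPU, enabling fast cloud and haze removal as a preprocessing step for large-format remote sensing images.
3. A CUDA-based GPU-CPU cooperative method for the geometric rectification of remote sensing images is proposed, in which resampling is parallelized at fine granularity on the GPU. Task partitioning and execution configuration are chosen according to the algorithm's execution characteristics and the GPU's parallel architecture so as to raise warp occupancy. Shared memory and texture memory are used to optimize the access patterns of the coordinate transformation coefficients and of the original image data, respectively, hiding global memory latency and fully exploiting the GPU's parallel computing power.
4. Building on a review of the true orthophoto production workflow and existing occlusion detection methods, a fast occlusion detection method based on shadow testing is proposed, and the stencil buffer and depth buffer of the 3D graphics rendering pipeline are used to accelerate occlusion detection in GPU hardware.
5. A CUDA-based fine-grained GPU parallel method for SURF feature matching is proposed, in which a suitable GPU thread organization and atomic operations guarantee correct feature point extraction. For the dense matrix operations in feature point matching, mismatch filtering and solving for the relative orientation elements, a block-wise computing pattern is used to hide global memory latency and improve the GPU's parallel speedup.
6. A CUDA-based fine-grained GPU parallel method for dense gray-level matching is proposed, focusing on data access optimization for matching cost computation, the parallel granularity of correlation coefficient matching, and the GPU thread organization for cost aggregation in semi-global matching. The implementation pattern of multi-baseline matching and its parallelization are also explored.
7. A loosely coupled GPU cluster architecture suited to overall photogrammetric data processing tasks is designed. Coarse-grained data parallelism between cluster nodes is exploited through appropriate task partitioning and effective organization and scheduling strategies; within each node, caching improves data access efficiency and exploits pipeline parallelism between the GPU and the CPU. Geometric rectification experiments over large areas of frame and linear-array images demonstrate the strong data processing capability of the GPU cluster system, and the bottlenecks that limit further performance gains are analyzed.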
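Item 2 rests on two standard operations: the dark channel prior estimate of the initial transmission, t(x) = 1 - omega * min over a window around x of (min over colour channels c of I_c / A_c), and a box (mean) filter evaluated from an integral image, which is also the primitive from which the guided filter is built. The pure-Python sketch below assumes images stored as nested lists of floats in [0, 1], a known airlight A, omega = 0.95, and windows clamped at the image border. The function names are hypothetical, and this is not the thesis's CUDA implementation.

```python
# Sketch: dark channel prior transmission estimate plus an integral-image box filter.
# Assumptions (not from the thesis): nested-list images, known airlight per channel,
# omega = 0.95, square windows of radius r clamped to the image border.


def integral_image(img):
    # Summed-area table with one zero row/column of padding:
    # s[y][x] = sum of img[0..y-1][0..x-1].
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_mean(img, r):
    # Mean over a (2r+1)x(2r+1) window clamped to the image. Each output pixel
    # needs four table lookups whatever r is, and no output pixel depends on
    # another, so one GPU thread per pixel is a natural mapping.
    h, w = len(img), len(img[0])
    s = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def dark_channel(rgb, r):
    # Minimum over the colour channels, then a (naive) minimum over the window.
    h, w = len(rgb), len(rgb[0])
    m = [[min(rgb[y][x]) for x in range(w)] for y in range(h)]
    return [[min(m[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def initial_transmission(rgb, airlight, r, omega=0.95):
    # t(x) = 1 - omega * dark_channel(I / A)(x)
    norm = [[[px[c] / airlight[c] for c in range(3)] for px in row] for row in rgb]
    dark = dark_channel(norm, r)
    return [[1.0 - omega * d for d in row] for row in dark]
```

The constant per-pixel cost of `box_mean` is what makes integral-image filtering attractive for large-format images; the window minimum in `dark_channel` is written naively (O(r^2) per pixel) only for readability.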

【Abstract】 Based on a GPU cluster platform, this thesis makes a comprehensive study of parallel processing technologies and methods for photogrammetry algorithms, in order to meet the requirement of rapidly processing massive photogrammetry data. Its aim is to resolve the key problems of GPU cluster construction and application, and to explore technical methods and optimization strategies for photogrammetry data processing. The main work and contributions of this thesis are as follows:
1. The principles of parallel computing, the development history and trends of parallel computers, the parallelism among multiple processing tasks, and the basic parallel processing modes for photogrammetry images are briefly analyzed and summarized. The GPU's hardware architecture, software programming model, performance analysis model, and optimization principles and basic strategies are discussed in detail, providing the theoretical basis for fine-grained parallel processing on a single GPU card. The two experimental platforms used in this work are also described.
2. Based on the radiation model of images degraded in bad weather and the Dark Channel Prior, an effective haze removal method for a single image is introduced, tested and analyzed. Because interpolating and refining the initial atmospheric transmission is complex and time-consuming, a fast, edge-preserving interpolation method based on the Guided Image Filter is proposed; using integral images and box filtering, it is parallelized at fine granularity on a single GPU card.
3. A fast GPU-CPU cooperative geometric rectification algorithm based on CUDA is presented, which performs fine-grained parallel resampling on a single GPU card. On the basis of the GPU's hardware architecture and software programming model, three performance optimization strategies are proposed to exploit the GPU's parallel computing power: a reasonable task partition and execution configuration to increase warp occupancy; high-bandwidth shared memory to reduce accesses to the coordinate transformation coefficients in slow global memory; and texture memory in place of global memory to reduce the access time of the original image.
4. Through an analysis of the true orthophoto generation workflow and existing occluded-area detection methods, a fast occlusion detection method based on shadow testing is proposed. Based on the Z-Pass algorithm, GPU hardware acceleration is achieved by drawing the occluded areas with the stencil buffer and depth buffer of the 3D pipeline.
5. After analyzing the principles of SURF (Speeded-Up Robust Features) detection, description and matching, a corresponding fine-grained parallel processing method on a single GPU card is proposed. A suitable thread organization scheme and GPU atomic operations are used to ensure the correctness of the detected SURF points. A block-wise computing pattern is used for the intensive matrix computations in feature matching, false-match filtering and relative orientation, exploiting the GPU's parallelism by reducing accesses to low-bandwidth global memory.
6. After analyzing the parallelism of dense gray-level matching, a fine-grained GPU parallel processing method is proposed. Several optimization strategies are discussed in detail, including optimized access to the matching search-window data, the parallel granularity of correlation matching, and the GPU thread organization for the cost aggregation step of semi-global matching. The multi-baseline matching mode and its parallel processing method are also discussed.
7. The logical framework, hardware and software configuration, and processing flow of a loosely coupled GPU cluster are designed. Through reasonable coarse-grained task decomposition and efficient task organization and dispatching strategies, coarse-grained parallelism across multiple GPU cards in the cluster is exploited. Within computing nodes, data buffering improves data access efficiency and enables stream parallelism between the GPU and the CPU. Geometric rectification experiments on large areas of frame images and line-array images demonstrate the cluster's ability to process massive data rapidly, and its bottlenecks are analyzed.
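Item 3 describes resampling parallelized per output pixel. A minimal sketch of indirect (output-to-input) rectification with bilinear interpolation follows. The 6-parameter affine mapping is an assumed stand-in for the actual rectification model, which the abstract does not specify, and `rectify` and `bilinear` are hypothetical names.

```python
# Sketch: indirect geometric rectification with bilinear resampling.
# Assumptions (not from the thesis): an affine output->input mapping stands in for
# the real rectification model; pixels mapping outside the source get `fill`.


def bilinear(src, y, x, fill=0.0):
    # Bilinearly interpolate src at fractional (y, x); return `fill` outside the image.
    h, w = len(src), len(src[0])
    if not (0.0 <= y <= h - 1 and 0.0 <= x <= w - 1):
        return fill
    y0, x0 = int(y), int(x)                 # floor, since y and x are non-negative here
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
    bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(src, out_h, out_w, coeffs, fill=0.0):
    # coeffs = (a0, a1, a2, b0, b1, b2):
    #   src_y = a0 + a1*row + a2*col,  src_x = b0 + b1*row + b2*col
    # Every output pixel is independent, so each can be one GPU thread. Per the
    # abstract, the shared coefficients suit shared memory and the source image
    # suits texture memory.
    a0, a1, a2, b0, b1, b2 = coeffs
    return [[bilinear(src, a0 + a1 * r + a2 * c, b0 + b1 * r + b2 * c, fill)
             for c in range(out_w)] for r in range(out_h)]
```

Working from output to input ensures that every output pixel receives exactly one value, with no holes or write conflicts. That property is what allows each thread to write its own pixel without synchronization.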
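For item 6, the cost aggregation step of semi-global matching follows Hirschmüller's path recurrence. The sketch below aggregates along a single left-to-right path for one image row, assuming a precomputed cost table `cost[x][d]` and penalties P1 < P2. It shows the data dependency that any GPU thread organization has to respect: serial along the path, but independent across disparities and across rows. It does not reproduce the thesis's own thread scheme, and the function names are hypothetical.

```python
# Sketch: SGM cost aggregation along one path (left to right) for one image row.
# Assumptions (not from the thesis): cost[x][d] is precomputed; p1 < p2.


def aggregate_row(cost, p1, p2):
    # L[x][d] = C[x][d] + min(L[x-1][d], L[x-1][d-1] + P1, L[x-1][d+1] + P1,
    #                         min_k L[x-1][k] + P2) - min_k L[x-1][k]
    n_disp = len(cost[0])
    agg = [list(cost[0])]                  # the first pixel has no predecessor
    for x in range(1, len(cost)):          # serial: depends on the previous pixel
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        # Iterations over d are independent; on a GPU each disparity could be a thread.
        for d in range(n_disp):
            best = prev[d]
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg


def winner_take_all(agg_row):
    # Disparity index with the smallest aggregated cost.
    return min(range(len(agg_row)), key=agg_row.__getitem__)
```

A full SGM sums the aggregated costs of several path directions (commonly 8 or 16) before the winner-take-all step. Subtracting the previous minimum keeps the values bounded along long paths.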
