
Architecture of Accelerators for Remote Sensing Image Processing Algorithms

【Author】 Li Baofeng (李宝峰)

【Advisors】 Zhou Xingming (周兴铭); Dou Yong (窦勇)

【Author Information】 National University of Defense Technology (国防科学技术大学), Computer Science and Technology, 2009, Ph.D.

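【Chinese Abstract】 Modern remote sensing technology is developing rapidly toward high spectral resolution, high temporal resolution and high spatial resolution. The resulting explosive growth in the volume of remote sensing data poses great challenges for remote sensing image processing. To meet the need for real-time processing of massive remote sensing data, more and more image processing algorithms are migrating from traditional ground-based computing to on-board computing, starting with the various data-processing-layer algorithms. Because the wavelet transform can represent local signal details in the time and frequency domains simultaneously and supports multiresolution analysis, wavelet-based algorithms have become the mainstream algorithms for data-layer processing of remote sensing images. Moreover, on-board computing demands processing solutions that are small, low-power and computationally powerful. Research on dedicated hardware accelerator architectures for wavelet-based remote sensing image data-processing-layer algorithms is therefore forward-looking and of great practical value.

This dissertation first surveys the mainstream wavelet-based data-processing-layer algorithms for remote sensing images from the perspective of functional classification. According to their differences in computational characteristics and memory-access requirements, the algorithms are divided roughly into three categories: regular-window-access algorithms (including point-access algorithms), bit-access algorithms and irregular-window-access algorithms. The computational characteristics and memory-access requirements of each category are analyzed in depth.

Given the fundamental role of the wavelet transform in this research, an optimized 2×2 array architecture is proposed specifically for the two-dimensional lifting wavelet transform. The array consists of two row processing units and two column processing units: the row units process transforms on different rows in parallel, and the column units process transforms on different columns in parallel. As soon as the row units have accumulated enough data for the column transform, column processing can begin. The architecture thus fully exploits the parallelism among different rows, among different columns, and between row and column transforms, effectively increasing execution speed. In addition, the 2×2 structure matches the computational characteristics of the 2D lifting wavelet algorithm most closely, so the data produced by the row units are consumed promptly by the column units, which effectively reduces the intermediate buffering between them. Experimental results show that, compared with the fastest existing architecture, the proposed architecture improves processing speed by a factor of 1.7 while keeping a moderate on-chip memory requirement.

To address the problem that modern memories cannot efficiently support the window access pattern required by regular-window-access algorithms, a scalable multi-data-window parallel architecture is proposed. It uses a three-level memory hierarchy made up of external memory, on-chip multi-bank memory and pipeline register groups, which not only supports the window access pattern efficiently but also fully exploits data reuse among multiple data windows. The scalability of the proposed architecture is also discussed in depth, and, by balancing the bandwidth required by parallel computation against the available memory bandwidth, an efficient multi-data-window parallel architecture is implemented.

Taking the EBCOT algorithm in JPEG2000 as an example, the dissertation studies accelerator architectures for bit-access algorithms. EBCOT consists of two modules: bit-plane coding and arithmetic coding. First, to resolve the mismatch in on-chip memory access patterns caused by the traditional bit-plane coding algorithm, a subblock-based memory-optimized scheduling scheme for bit-plane coding is proposed; by subdividing each code block into subblocks, it avoids bit-wise access to the on-chip code-block memory. The subblock-based scheme not only resolves the access-pattern mismatch but also provides a new form of subblock parallelism. Based on this scheme, a bit-plane encoder architecture combining subblock parallelism with sample parallelism is implemented. Experimental results show that, compared with the fastest existing architecture, the proposed architecture increases encoding speed by more than 30% while reducing the on-chip memory requirement by about 40%. Several possible arithmetic encoder architectures are then analyzed and implemented, and a single-symbol-coding three-stage pipeline architecture is proposed. The data interface between bit-plane coding and arithmetic coding is also studied.

Finally, the wavelet-based automatic global registration algorithm for remote sensing images is used as a case study of accelerator architectures for irregular-window-access algorithms. In such algorithms, the irregular movement of the data window makes the data prefetch range too large and the on-chip memory requirement too high, which hinders efficient hardware implementation. The dissertation therefore proposes a block resampling algorithm that resamples the result image block by block, effectively narrowing the data prefetch range and reducing on-chip memory requirements. Based on this block algorithm, an efficient accelerator architecture for automatic global registration of remote sensing images, the BWAGIR architecture, is implemented. It employs pipelining, parallel resampling and correlation-coefficient computation, parallel memory access, and parallelism across multiple BWAGIR modules. Experimental results show that 5 BWAGIR modules achieve performance comparable to that of a parallel implementation of the algorithm running on 30 nodes of a parallel computer.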
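
A small software model makes the row/column parallelism of lifting-based 2D-DWT concrete. This sketch assumes the reversible LeGall 5/3 filter used in JPEG2000 (the abstract does not name the filter the thesis targets), even-length rows and columns, and symmetric boundary extension; `lift53`, `unlift53`, `dwt2d` and `idwt2d` are illustrative names, and the 2×2 hardware array itself is not modeled. Row transforms are mutually independent, as are column transforms, and the column lifting steps that produce output pair i read only row-transformed rows up to 2i+2, which is why column processing can start before every row has been transformed.

```python
def lift53(x):
    """Forward 5/3 integer lifting (assumed filter): returns [low..., high...]."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "this sketch assumes an even length"
    half = n // 2
    d = []
    for i in range(half):  # predict step -> high-pass coefficients
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    s = []
    for i in range(half):  # update step -> low-pass coefficients
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s + d


def unlift53(c):
    """Inverse of lift53: undo the update step, then the predict step."""
    half = len(c) // 2
    s, d = c[:half], c[half:]
    n = 2 * half
    x = [0] * n
    for i in range(half):  # recover even samples
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):  # recover odd samples from their even neighbours
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2d(img):
    rows = [lift53(r) for r in img]  # independent row transforms
    return transpose([lift53(c) for c in transpose(rows)])  # column transforms


def idwt2d(coeffs):
    rows = transpose([unlift53(c) for c in transpose(coeffs)])  # undo columns
    return [unlift53(r) for r in rows]  # undo rows
```

Because every step uses integer floor arithmetic, `idwt2d(dwt2d(img))` reproduces `img` exactly.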
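
The data reuse that a multi-level memory hierarchy provides for regular-window-access algorithms can be illustrated with a simplified model. This is an assumption-laden sketch, not the thesis architecture: a single k×k window slides in raster order, a hypothetical `windows_with_line_buffers` function keeps the most recent k rows in "on-chip" line buffers, and a k-column register window shifts by one column per step, so every pixel is read from "external memory" exactly once. Multi-bank memory and multiple parallel windows are not modeled.

```python
from collections import deque


def windows_with_line_buffers(image, k):
    """Return all k x k windows of `image` (a list of equal-length rows) in
    raster order, plus the number of simulated external-memory reads.

    Simplified, hypothetical model of a three-level hierarchy:
      external memory -> on-chip line buffers (last k rows)
                      -> register window (k columns, shifted each step)
    """
    width = len(image[0])
    reads = 0
    line_buffers = deque(maxlen=k)  # oldest row is evicted automatically
    windows = []
    for src_row in image:
        row = []
        for value in src_row:  # each pixel leaves external memory once
            row.append(value)
            reads += 1
        line_buffers.append(row)
        if len(line_buffers) < k:  # not enough rows buffered yet
            continue
        registers = deque(maxlen=k)  # k columns of k pixels each
        for c in range(width):
            # shift in one new column from the line buffers (no external read)
            registers.append([line_buffers[i][c] for i in range(k)])
            if len(registers) == k:
                # registers hold columns; transpose them into window rows
                windows.append([[registers[j][i] for j in range(k)]
                                for i in range(k)])
    return windows, reads
```

For a 4×5 image and k = 3, the model performs 20 external reads to produce 6 windows, whereas fetching each 3×3 window independently would take 6 × 9 = 54 reads.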

【Abstract】 With the rapid development of remote sensing technology towards high spatial resolution, high temporal resolution and high spectral resolution, the size of remote sensing images grows significantly. To meet the requirement of processing massive remote sensing data in real time, more and more image processing algorithms are being migrated from ground computing to on-board computing. Wavelet-based remote sensing image processing algorithms have become the most popular because of the excellent properties of the wavelet transform. Moreover, on-board computing requires low power consumption, small size and powerful computational capability. Therefore, it is of great significance to study dedicated accelerators for various remote sensing image processing algorithms.

First, a comprehensive survey of popular remote sensing image processing algorithms is conducted. Based on their different memory-access features, the algorithms are classified into three categories: regular-window-access algorithms, bit-access algorithms and irregular-window-access algorithms. For each category, the computation and memory requirements are analyzed.

Because the wavelet transform is the foundation of this research, an optimized 2×2 array architecture is proposed specifically for the lifting-based two-dimensional discrete wavelet transform (2D-DWT). The proposed array is composed of two row processors (RPs), which operate on different rows, and two column processors (CPs), which operate on different columns. It exploits the parallelism among different row transforms, among different column transforms, and between row and column transforms to improve execution speed. Because the array matches the computation pattern of 2D-DWT, the data produced by the RPs can be consumed by the CPs in time, and the buffer between the RPs and CPs is also reduced.

For regular-window-access algorithms, a scalable multi-data-window parallel architecture is proposed in which a three-level memory hierarchy is employed.
Compared with other related works, our architecture not only supports the window access pattern to memory but also exploits data reuse among multiple data windows. The scalability of the proposed architecture is also discussed, based on the balance between computing requirements and available memory bandwidth.

As a typical bit-access algorithm, the embedded block coding with optimal truncation (EBCOT) algorithm in the JPEG2000 standard is composed of a bit-plane encoder (BPE) and an arithmetic encoder (AE). A subblock-based BPE scheme is first proposed to overcome the memory-access mismatch caused by the traditional scheme; the new scheme also makes it possible to encode multiple subblocks in parallel. Based on this scheme, a subblock-based BPE architecture employing subblock-parallel and sample-parallel policies is proposed. The architecture not only improves encoding speed but also reduces the on-chip memory requirement. For the AE part, a single-symbol-coding three-stage pipeline architecture is proposed after an investigation of several popular pipelines. The interface between the BPE and AE is also discussed to balance their difference in speed.

Finally, wavelet-based automated global remote sensing image registration (WAGIR) is taken as an example to study the accelerator architecture for irregular-window-access algorithms. Resampling and the computation of the correlation coefficient are the kernels of WAGIR, so WAGIR is accelerated by accelerating these two operations. The traditional resampling process demands a large amount of on-chip memory and cannot be implemented efficiently in hardware because of the irregular sliding of the data window over the reference image. A block resampling algorithm is proposed to address this problem: the result image is resampled block by block to reduce the scope of the required data.
Based on this scheme, a BWAGIR (block WAGIR) architecture is proposed in which pipelining, parallel resampling and correlation-coefficient computation, parallel memory access, and multiple parallel BWAGIR modules are employed.
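
Why resampling block by block bounds the prefetch region can be shown with a short model. The sketch assumes an affine mapping from result-image coordinates to reference-image coordinates (x' = a·x + b·y + t_x, y' = c·x + d·y + t_y) and bilinear interpolation; the abstract specifies neither the mapping model nor the interpolation kernel, and `source_window`, `bilinear` and `resample_block` are hypothetical names. For each output block, only the bounding box of the block's four mapped corners, plus one pixel for interpolation, has to be fetched from the reference image.

```python
import math


def source_window(bx, by, bs, affine):
    """Inclusive bounding box (x0, y0, x1, y1) of reference pixels needed to
    bilinearly resample the bs x bs output block whose top-left is (bx, by)."""
    a, b, tx, c, d, ty = affine
    corners = [(bx, by), (bx + bs - 1, by), (bx, by + bs - 1), (bx + bs - 1, by + bs - 1)]
    xs = [a * x + b * y + tx for x, y in corners]
    ys = [c * x + d * y + ty for x, y in corners]
    # An affine map sends the block to a parallelogram, so its corners bound it;
    # the +1 covers the right/bottom neighbours used by bilinear interpolation.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.floor(max(xs)) + 1, math.floor(max(ys)) + 1)


def bilinear(tile, x0, y0, xf, yf):
    """Interpolate `tile` (whose top-left is reference pixel (x0, y0)) at (xf, yf)."""
    ix, iy = math.floor(xf), math.floor(yf)
    fx, fy = xf - ix, yf - iy
    r, c = iy - y0, ix - x0
    top = tile[r][c] + fx * (tile[r][c + 1] - tile[r][c])
    bottom = tile[r + 1][c] + fx * (tile[r + 1][c + 1] - tile[r + 1][c])
    return top + fy * (bottom - top)


def resample_block(ref, bx, by, bs, affine):
    """Resample one output block from a prefetched tile; also return the tile size."""
    a, b, tx, c, d, ty = affine
    x0, y0, x1, y1 = source_window(bx, by, bs, affine)
    # Assumes the tile lies inside `ref`; a real design would clip or pad.
    tile = [row[x0:x1 + 1] for row in ref[y0:y1 + 1]]
    out = [[bilinear(tile, x0, y0, a * x + b * y + tx, c * x + d * y + ty)
            for x in range(bx, bx + bs)]
           for y in range(by, by + bs)]
    return out, (x1 - x0 + 1) * (y1 - y0 + 1)
```

With a pure sub-pixel translation, a 4×4 output block needs only a 5×5 tile; a rotation enlarges the bounding box, but the tile still grows with the block size rather than with the image width.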
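
The correlation coefficient computed alongside resampling is, in its standard form, Pearson's r between a reference window and the resampled window: r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)). One hardware-friendly organization, shown below as a sketch rather than the thesis datapath, accumulates the five running sums as resampled pixels stream out of the interpolator, leaving a single square root and division per window; `correlation_coefficient` is a hypothetical name.

```python
import math


def correlation_coefficient(ref_window, resampled_window):
    """Pearson correlation of two equal-size windows given as lists of rows."""
    n = sx = sy = sxx = syy = sxy = 0
    # One streaming pass: these accumulators could be updated once per pixel
    # as the resampler produces its output.
    for ref_row, res_row in zip(ref_window, resampled_window):
        for x, y in zip(ref_row, res_row):
            n += 1
            sx += x
            sy += y
            sxx += x * x
            syy += y * y
            sxy += x * y
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den if den else 0.0  # flat window: r is undefined, report 0
```

A correlation-based registration search typically evaluates this score for each candidate transform and keeps the transform with the highest value.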
