Research on the Implementation of the CCSDS Image Compression Algorithm Based on Handel-C (基于Handel-C的CCSDS图像压缩算法实现研究)

Research on Hardware Implementation of CCSDS Image Compression Using Handel-C Language

【Author】 Teng Xuejian (滕学剑)

【Advisor】 Chen Xiaomin (陈晓敏)

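【Author information】 Graduate University of Chinese Academy of Sciences (Center for Space Science and Applied Research), Computer Application Technology, 2011, PhD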

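【Abstract, Chinese version】 With the rapid development of space remote sensing, the demand for acquiring space imagery keeps growing. To resolve the conflict between limited satellite channel capacity and the massive volume of data to be downlinked, a satellite data transmission system must compress image data before downlink. This calls for in-depth study of image coding theory and of classical coding algorithms, so research on image coding algorithms that perform well, are easy to implement in hardware, and suit the space environment, together with their VLSI implementation, is of great significance. The CCSDS image compression standard, released in November 2005, is a dedicated image compression algorithm based on the wavelet transform. With its good compression efficiency and its high-reliability strategy for space data, it has broad application prospects in space image data compression.

This thesis studies VLSI implementation methods for the key modules of the CCSDS image compression algorithm. The main topics are: (a) VLSI architecture design for the one-dimensional and two-dimensional 9/7 integer wavelet transform; (b) VLSI architecture design for bit plane encoding; (c) Handel-C design of the compression algorithm.

Chapter 2 examines the basic components of the CCSDS image compression algorithm and compares its complexity and performance with those of JPEG2000 and SPIHT. It concludes that the CCSDS algorithm strikes a good balance between performance and hardware implementation complexity, which suits efficient deep-space exploration and near-Earth observation applications.

The DWT part of Chapter 3 studies the architecture of the one-dimensional and two-dimensional wavelet transforms. Because the direct-mapped structure of the 9/7 wavelet transform has a long critical path, a new lifting scheme based on pipeline optimization is constructed following pipeline optimization principles, which greatly reduces the critical path delay. Building on the 1D-DWT design, an efficient two-dimensional wavelet transform structure is developed in which the row transform and the column transform of the image execute in parallel in a pipeline. Building on the 2D-DWT, a three-level transform structure with inter-level pipelining is then designed that supports line-scan input of space imagery.

The bit plane encoding part of Chapter 3 first presents the overall BPE architecture. In the preprocessing stage, the depth of each wavelet coefficient is computed and stored as the coefficient is read, which avoids re-reading the wavelet coefficients for every plane in the later bit plane scanning stage and improves coding efficiency. Circuit units are then given for Rice coding of the quantized DC coefficients and of the AC coefficient block depths. In the bit plane scanning module, a scanning structure is designed that scans 16 wavelet coefficient blocks in parallel, and the computation of the coefficient type words is improved by using the coefficient depth in place of repeated reads of the wavelet coefficients. Two optimizations are proposed for the storage structure of the scanning information, which speed up the entropy coder. Finally, for the bitstream splicing module, an effective splicing structure with no redundant bits is proposed.

Chapter 4 describes the Handel-C design of the algorithm. It first compares Handel-C with conventional C and then analyzes the structure of the circuits that Handel-C generates for basic statements. It designs the overall software structure of the Handel-C code and sets out several principles to follow when converting C modules into Handel-C modules. It also proposes a solution to the problem of interconnecting software modules across clock domains.

Finally, in the FPGA verification stage described in Chapter 5, a complete verification system is built on the ML555 development board and subjected to extensive hardware testing, which confirms the validity of the hardware implementation structure of the CCSDS algorithm presented in the thesis.

The main innovations of this work are:

(1) A new lifting scheme based on pipeline optimization is constructed. To address the long critical path of the direct-mapped 9/7M wavelet structure, four pipeline stages are added, following the principle of inserting pipeline registers where the circuit data paths intersect a feed-forward cutset. Compared with the unoptimized circuit, the critical path length and combinational logic depth are greatly reduced. Simulation results show that the optimization raises the maximum operating frequency of the circuit about fourfold, while hardware resource usage increases by only about 50%.

(2) In the bit plane scanning module, two optimizations are proposed for the storage structure of the scanning information. First, in Stage 4 the scanned bit-plane bits of the wavelet coefficients are converted from serial to parallel before being written to the corresponding storage, which lowers the storage requirement. Second, the stored content is changed from the scanning words themselves to the coded symbols they map to, so the entropy coder no longer has to access multiple fixed invalid flags, which speeds up entropy coding.

(3) In the bitstream splicing module, an effective splicing structure with no redundant bits is proposed, which can concatenate codewords of 1 to 8 bits without redundancy within a single clock cycle.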
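As a software illustration of the arithmetic behind the 9/7 integer lifting transform mentioned above, the following is a minimal sketch of one level of a 1D forward and inverse transform. It is not the thesis's pipelined circuit. The function names (`forward_97m`, `inverse_97m`, `_mirror`) are hypothetical, and the whole-sample symmetric boundary extension is an assumption made for this sketch; the record does not state how boundaries are handled.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i into range(n) (assumed)."""
    if i < 0:
        return -i
    if i > n - 1:
        return 2 * (n - 1) - i
    return i


def _predict(x, k, n):
    """Integer prediction of the odd sample x[k+1] from nearby even samples.

    Computes floor(9/16*(a+b) - 1/16*(c+d) + 1/2) using integers only.
    """
    a, b = x[_mirror(k, n)], x[_mirror(k + 2, n)]
    c, d = x[_mirror(k - 2, n)], x[_mirror(k + 4, n)]
    return (9 * (a + b) - (c + d) + 8) >> 4


def _update(d, j):
    """Integer update term floor(-(d[j-1] + d[j])/4 + 1/2), mirrored at j = 0."""
    d_prev = d[j - 1] if j > 0 else d[0]
    return (-(d_prev + d[j]) + 2) >> 2


def forward_97m(x):
    """One level of the integer lifting transform: returns (low-pass, high-pass)."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("sketch expects an even length of at least 4")
    half = n // 2
    # Predict step: high-pass coefficients from the odd samples.
    d = [x[2 * j + 1] - _predict(x, 2 * j, n) for j in range(half)]
    # Update step: low-pass coefficients from the even samples.
    c = [x[2 * j] - _update(d, j) for j in range(half)]
    return c, d


def inverse_97m(c, d):
    """Undo forward_97m exactly by reversing the two lifting steps."""
    half = len(c)
    n = 2 * half
    x = [0] * n
    for j in range(half):          # recover the even samples first
        x[2 * j] = c[j] + _update(d, j)
    for j in range(half):          # then the odd samples (uses even samples only)
        x[2 * j + 1] = d[j] + _predict(x, 2 * j, n)
    return x
```

Because each lifting step adds or subtracts an integer computed from samples that are left unchanged by that step, the inverse reproduces the input exactly, which is the property that makes this kind of transform usable for lossless coding. In hardware, the predict and update expressions form the long combinational path that a pipelined design would break up with registers.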

【Abstract】 With the development of space exploration technology, the demand for acquiring space image data keeps growing. Because of the conflict between limited communication bandwidth and the massive volume of image data to be downlinked, high-speed image compression must be applied before onboard image data is transmitted, which in turn calls for research on image compression theory and on leading compression algorithms. It is therefore worthwhile to study VLSI hardware architectures for compression algorithms that are easy to implement in hardware and suited to the space environment.

Published in November 2005, the CCSDS image compression algorithm is a leading algorithm designed for space imagery. It is based on the Discrete Wavelet Transform (DWT). Owing to its good compression performance and high reliability, it has broad application prospects in the field of space image compression. The thesis focuses on hardware architecture for onboard image compression and covers three parts: (a) hardware architecture for the 1D-DWT and 2D-DWT; (b) hardware architecture for Bit Plane Encoding (BPE); (c) the design of the CCSDS algorithm in the Handel-C language.

Chapter 2 introduces the main components of the CCSDS algorithm and analyzes its performance and hardware complexity by comparing it with JPEG2000 and SPIHT. It concludes that CCSDS is suitable for deep space exploration and near-Earth observation.

Hardware architectures for the 1D-DWT and 2D-DWT are discussed in chapter 3. The long critical path in the direct-mapped lifting architecture of the 9/7M DWT limits the achievable clock frequency. Following pipeline design rules, a new scheme based on a 4-stage pipeline is designed, which shortens the critical path and significantly improves timing performance. Based on the 1D-DWT, a three-level 2D-DWT is designed in which, at each level, the row DWT and column DWT are processed in parallel and pipelined.

In the BPE part of chapter 3, the preprocessing stage reduces wavelet coefficient memory access time by storing the coefficient depth, which is later used in place of the coefficient itself to compute the coefficient type. In the bit plane scanning stage, 16 coefficient blocks are scanned at a time, which accelerates the subsequent AC entropy coding. An optimized memory structure is proposed to speed up AC entropy coding. In the Byte Builder module, an effective architecture is presented that concatenates variable-length codewords without any redundancy.

Chapter 4 discusses the Handel-C design process. After comparing conventional C with Handel-C, it analyzes the “sequent mechanism” and then introduces some rules for rewriting C code as Handel-C code. Finally, a method is proposed to solve the problem of connecting modules in different clock domains.

The main achievements of the thesis can be summarized as follows:

1. An optimized lifting scheme for the 9/7M DWT based on a 4-stage pipeline is presented. The long critical path in the direct-mapped lifting architecture of the 9/7M DWT limits the achievable clock frequency. Following the rule of inserting pipeline registers along a feed-forward cutset, a new scheme based on a 4-stage pipeline is presented, which shortens the critical path and significantly improves timing performance.

2. Two improvements to the memory structure of the BPE scanning information are proposed. In stage 4, the scanned bits of the wavelet coefficients are converted from serial to parallel before being saved to memory. In addition, the mapped symbol of each scanning word is saved to memory instead of the scanning word itself; because the mapped symbols contain no “null” symbol, the AC entropy encoder does not access invalid memory locations. As a result, AC entropy coding is faster.

3. An effective architecture is presented that concatenates variable-length codewords without any redundancy. Codewords of 1 to 8 bits can be compactly joined within one clock cycle.
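The redundancy-free codeword concatenation described in the last point can be illustrated in software. The following is a minimal sketch, not the thesis's Byte Builder circuit: the class name `BytePacker`, its methods, the MSB-first bit order, and the zero padding on the final byte are all assumptions made for this example.

```python
class BytePacker:
    """Packs variable-length codewords (1 to 8 bits) into bytes with no gaps."""

    def __init__(self):
        self.acc = 0            # pending bits, right-aligned
        self.count = 0          # number of pending bits (0..7 between calls)
        self.out = bytearray()  # completed bytes

    def append(self, value, nbits):
        """Append the low nbits of value, most significant bit first (assumed)."""
        if not 1 <= nbits <= 8:
            raise ValueError("codeword length must be 1..8 bits")
        if not 0 <= value < (1 << nbits):
            raise ValueError("value does not fit in nbits")
        self.acc = (self.acc << nbits) | value
        self.count += nbits
        # At most 7 + 8 = 15 bits are pending, so at most one byte completes
        # per call; this is the software analogue of one output per clock.
        if self.count >= 8:
            self.count -= 8
            self.out.append((self.acc >> self.count) & 0xFF)
            self.acc &= (1 << self.count) - 1

    def flush(self):
        """Zero-pad any leftover bits into a final byte and return all bytes."""
        if self.count:
            self.out.append((self.acc << (8 - self.count)) & 0xFF)
            self.acc = 0
            self.count = 0
        return bytes(self.out)
```

For instance, appending the 3-bit codeword `0b101` followed by the 5-bit codeword `0b11111` fills exactly one byte, so no padding bits appear between codewords; padding is only added by `flush` at the very end of the stream.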
