锥束CT三维重建算法加速技术研究

Study on 3D Cone-Beam CT Reconstruction Algorithm Acceleration

【Author】 Cao Siyuan (曹思远)

【Advisor】 Wang Jue (王珏)

【Author Information】 Chongqing University, Pattern Recognition and Intelligent Systems, 2009, Master's thesis

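【Abstract (Chinese version)】 A cone-beam CT system with a flat-panel detector acquires projection data for many slices in a single rotation. Compared with two-dimensional parallel-beam and fan-beam CT, it offers short scan times, high spatial resolution, and efficient use of the X-ray beam, and it is now widely used. The practical approximate reconstruction algorithm for a circular scanning trajectory proposed by Feldkamp, Davis, and Kress (the FDK algorithm) is currently the most common algorithm on commercial cone-beam CT machines. However, as area detectors gain more detector elements and scan faster, and given the complexity of cone-beam reconstruction algorithms, the computation and data transfer required for 3D image reconstruction keep growing and reconstruction times keep lengthening. CPU-only reconstruction can no longer meet the demands of modern engineering applications, so studying how to speed up cone-beam CT reconstruction and finding a suitable scheme has significant academic and practical value. This thesis covers two areas: first, the theory of accelerating cone-beam CT image reconstruction from the algorithmic side; second, the use of Compute Unified Device Architecture (CUDA) technology from the graphics-processor field to accelerate the FDK algorithm. On the algorithmic side, the thesis studies the FDK algorithm in some depth and makes three contributions. First, it analyzes the parallelism of the FDK algorithm: the algorithm is computationally heavy but parallel, and can be computed in parallel by partitioning over rotation angles and over slices of the reconstructed object. Second, FPGA-based cone-beam CT image reconstruction has long been an active research topic in industrial CT; based on the back-projection step of the FDK algorithm, the thesis studies a pipelined back-projection architecture, finds that it allows fast back-projection at a low degree of parallelism, and shows through computer simulation that the architecture can be implemented on an FPGA. Third, it studies a fixed-point back-projection algorithm for FDK and tests it on a computer platform; the results show that the error rate of the fixed-point algorithm relative to the floating-point algorithm is below 1%. On the hardware-acceleration side, the thesis proposes, based on the parallel structure of the FDK algorithm, a scheme that uses CUDA to accelerate reconstruction. The scheme uses a graphics card built on this new hardware and software architecture and, through the architecture's own programming model, runs the weighting, filtering, and back-projection steps of FDK on the GPU's stream processors, achieving fast FDK computation. Experiments show that for a 512³ image in single-precision floating point, with 512 angular steps per rotation, reconstruction time can be reduced to under one minute, with transfer time between GPU memory and host memory under one second; compared with CPU-only reconstruction, the scheme achieves a speedup of about 250 times.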
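The fixed-point result above can be made concrete with a toy comparison. The following is only a minimal sketch, not the thesis's scheme: it assumes a Q-format with 12 fractional bits (`FRAC_BITS` is a hypothetical choice), treats the core of back-projection as linear interpolation between two neighbouring detector samples, and uses synthetic sample values. All function names are illustrative.

```python
FRAC_BITS = 12          # assumed number of fractional bits (not from the thesis)
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x):
    """Quantize a float to a Q-format integer."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q-format integer back to a float."""
    return q / ONE


def lerp_float(p0, p1, t):
    """Floating-point linear interpolation between two detector samples."""
    return p0 + (p1 - p0) * t


def lerp_fixed(p0_q, p1_q, t_q):
    """Integer-only interpolation; the product carries 2*FRAC_BITS
    fractional bits, so it is shifted back down by FRAC_BITS."""
    return p0_q + (((p1_q - p0_q) * t_q) >> FRAC_BITS)


def max_relative_error(samples):
    """Worst relative deviation of the fixed-point result from the float one."""
    worst = 0.0
    for p0, p1, t in samples:
        ref = lerp_float(p0, p1, t)
        approx = to_float(lerp_fixed(to_fixed(p0), to_fixed(p1), to_fixed(t)))
        worst = max(worst, abs(approx - ref) / abs(ref))
    return worst


# Synthetic (p0, p1, t) triples: sample values in [1, 2], weights in [0, 1].
SAMPLES = [(1.0 + 0.01 * i, 2.0 - 0.007 * i, (i % 17) / 16.0) for i in range(100)]
WORST = max_relative_error(SAMPLES)
```

For these synthetic inputs the deviation is bounded by a few quantization steps of size 2^-12, which is well under 1%. This says nothing about the thesis's measured figure, which depends on its actual word lengths and data.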

【Abstract】 A cone-beam CT system with a flat-panel detector can acquire multi-slice projection data in a single circular scan. Compared with two-dimensional parallel-beam and fan-beam CT, such a system, now widely used, offers short scanning time, high spatial resolution, and efficient use of radiation. The FDK algorithm, proposed by Feldkamp, Davis, and Kress for a circular scanning trajectory, is a practical approximate reconstruction algorithm and is currently the most common algorithm in commercial cone-beam CT systems. However, as the number of detector elements and the scanning speed of flat-panel systems increase, and given the complexity of three-dimensional reconstruction algorithms, the computation and data-transfer volume of three-dimensional image reconstruction keep growing and reconstruction becomes more and more time-consuming. CPU-only reconstruction can no longer meet the requirements of modern engineering applications, so studying how to improve computational speed and finding a suitable approach is valuable for both applications and academic research. This thesis addresses two aspects. The first is a theoretical study of accelerating cone-beam CT image reconstruction from the algorithmic point of view. The second is the application of Compute Unified Device Architecture (CUDA) on graphics processors to accelerate the FDK algorithm. In the study of the reconstruction algorithm, the thesis examines three topics based on the theory of the FDK algorithm. First, it discusses the principle of parallel computation in the FDK algorithm: although the computation is time-consuming, it can be carried out in parallel by partitioning over rotation angles or over slices of the reconstructed object. Second, because FPGA-based cone-beam CT image reconstruction has been an active topic in the field, it discusses a pipelined back-projection architecture, derived from the back-projection step, that allows fast computation at a low degree of parallelism. Computer simulation shows that the architecture can be implemented on an FPGA. Third, it discusses fixed-point back-projection in the FDK algorithm. Computer experiments show that the relative error of the fixed-point algorithm with respect to the floating-point algorithm is less than 1%. In the applied study of hardware-accelerated reconstruction, the thesis proposes a speed-up method that uses CUDA on a graphics processor (GPU). The method uses a graphics card based on this new hardware and software architecture. Through the architecture's programming model, the weighting, filtering, and back-projection steps are carried out by the stream processors of the GPU to accelerate the FDK algorithm. The results show that a 512³ volume can be reconstructed from 512 rotation angles in 32-bit floating point in less than one minute, and that the transfer time between GPU memory and host memory is less than one second. Compared with the CPU-only method, this method is faster while maintaining good image quality.
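The parallelism over rotation angles mentioned above follows from back-projection being a sum over angles: partial sums over disjoint angle subsets can be computed independently and then added. The sketch below illustrates this for a single voxel. It assumes the textbook FDK back-projection form with a virtual detector through the rotation axis; `D` (the source-to-axis distance), `filtered_projection` (a synthetic stand-in for the weighted, filtered projection data), and the other names are hypothetical and not taken from the thesis.

```python
import math

D = 500.0  # assumed source-to-rotation-axis distance (hypothetical value)


def filtered_projection(beta, a, b):
    """Synthetic stand-in for weighted, ramp-filtered projection data at
    angle beta and virtual-detector coordinates (a, b)."""
    return math.cos(beta) * a + 0.5 * b + 1.0


def backproject_voxel(x, y, z, betas, d_beta):
    """Accumulate the contribution of the given angles to one voxel."""
    total = 0.0
    for beta in betas:
        t = x * math.cos(beta) + y * math.sin(beta)
        s = -x * math.sin(beta) + y * math.cos(beta)
        mag = D / (D - s)                  # magnification for this voxel
        total += mag * mag * filtered_projection(beta, mag * t, mag * z)
    return 0.5 * d_beta * total


def split_angles(betas, parts):
    """Partition the angle list into `parts` disjoint interleaved subsets."""
    return [betas[i::parts] for i in range(parts)]


NUM_ANGLES = 512
D_BETA = 2.0 * math.pi / NUM_ANGLES
BETAS = [k * D_BETA for k in range(NUM_ANGLES)]
VOXEL = (10.0, -20.0, 5.0)

# Serial reference: one pass over all angles.
SERIAL = backproject_voxel(*VOXEL, BETAS, D_BETA)

# Decomposed version: each chunk could be handled by a separate worker.
CHUNKS = split_angles(BETAS, 4)
CHUNKED = sum(backproject_voxel(*VOXEL, chunk, D_BETA) for chunk in CHUNKS)
```

The two results agree up to floating-point rounding, since only the order of summation changes. The same reasoning applies across voxels and slices, each of which is updated independently, which is what makes the algorithm a natural fit for many GPU threads.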

  • 【Online Publication Contributor】 Chongqing University
  • 【Online Publication Issue】 2009, Issue 12