Performance optimization of matrix multiplication and FFT algorithms on GPU


【Authors】 LI Xiao-wen (1), CUI Xiang (2) (1. Department of Command and Control, Air Defense Forces Academy, Zhengzhou 450000, China; 2. College of Computer & Information Engineering, Henan University, Kaifeng 475003, China)

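【Institutions】 Department of Command and Control, Air Defense Forces Academy; College of Computer & Information Engineering, Henan University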

【Abstract】 Current GPU architectures offer good programmability for high-performance computing. To derive general methods for designing high-performance programs on many-core GPUs, this paper investigates GPU performance optimization techniques and summarizes the authors' experience with high-performance GPU programming. Benchmarks are used to obtain GPU performance metrics, which then guide program design. Two key algorithms are optimized with CUDA: single-precision matrix-matrix multiplication (SGEMM of BLAS) and single-precision FFT. The former is compute-intensive, while the latter is memory-bandwidth-intensive (communication-intensive). On an NVIDIA GeForce GTX280 GPU, the matrix multiplication reaches a peak speed of 393 Gflop/s, about 5% faster than the CUBLAS 2.0 library. Good FFT performance is also obtained for a range of dimensions. Some common principles for the design and implementation of many-core algorithms are discussed.
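The abstract's distinction between a compute-intensive and a bandwidth-intensive kernel can be illustrated with a rough arithmetic-intensity estimate (floating-point operations per byte of memory traffic). The sketch below is not the authors' method. It assumes the textbook operation counts of 2n^3 for an n x n matrix multiplication and 5n log2(n) for a length-n complex FFT, and it assumes each operand is moved to or from memory exactly once. The function names and the hardware figures in the usage lines (900 Gflop/s, 140 GB/s) are hypothetical round placeholders, not values from the paper.

```python
import math


def sgemm_intensity(n: int) -> float:
    """Flops per byte for an n x n single-precision multiply C = A * B.

    Assumes 2*n^3 flops and minimum traffic: read A and B, write C once
    (3 * n^2 values of 4 bytes each).
    """
    flops = 2.0 * n ** 3
    bytes_moved = 3.0 * n * n * 4
    return flops / bytes_moved


def fft_intensity(n: int) -> float:
    """Flops per byte for a length-n single-precision complex FFT.

    Assumes the conventional 5*n*log2(n) flop count and a single pass:
    read and write n complex values of 8 bytes each.
    """
    flops = 5.0 * n * math.log2(n)
    bytes_moved = 2.0 * n * 8
    return flops / bytes_moved


def bound(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Classify a kernel against the ridge point peak / bandwidth."""
    ridge = peak_gflops / bandwidth_gbs  # flops per byte needed to saturate compute
    return "compute-bound" if intensity >= ridge else "bandwidth-bound"


# Hypothetical hardware figures, chosen only for illustration.
PEAK_GFLOPS = 900.0
BANDWIDTH_GBS = 140.0

print(bound(sgemm_intensity(1024), PEAK_GFLOPS, BANDWIDTH_GBS))
print(bound(fft_intensity(1024), PEAK_GFLOPS, BANDWIDTH_GBS))
```

Under these assumptions the intensity of matrix multiplication grows linearly with n, while that of the FFT grows only with log2(n), which is one way to see why the two kernels call for different optimization priorities.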

【Key words】 GPU programming; matrix multiplication; fast Fourier transform (FFT); performance optimization technique

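【Funding】 National High Technology Research and Development Program of China ("863" Program) (2012AA010902); National Natural Science Foundation of China (61240045; 10571178)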

【Source】 Modern Electronics Technique (现代电子技术), 2013, Issue 04

【Classification code】 TP391.41
【Times cited】 5
【Downloads】 236
