
Research on High-Performance Cycle-Accurate Simulation Technology for the C67X Digital Signal Processor Instruction Set

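【Author】 Pan Fengfeng (潘烽锋)

【Advisor】 Cai Ming (蔡铭)

【Author information】 Zhejiang University, Computer Application and Technology, 2011, Master's thesis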

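【Abstract, translated from the Chinese】 DSPs are high-performance processors designed for compute-intensive problems and are widely used across embedded systems, for example in audio and video coding and in image analysis and processing. As the technology develops, DSP hardware structures, instruction sets, and pipelines keep growing in complexity, which makes it increasingly difficult to build a DSP simulation platform that is both highly accurate and fast. This thesis focuses on simulation strategies for the VLIW pipeline of the TMS320C67X family and on performance optimization techniques for their implementation. Starting from the characteristics of the VLIW architecture, it first analyzes the shortcomings of the sequential (in-order) pipeline model in simulating delay slots, pipeline stalls, and related behavior, then adopts a reverse-order pipeline model as the basis for accurate pipeline simulation. The reverse-order model simulates the pipeline stages serially in reverse, that is, the instruction execution stages are simulated first and the instruction fetch stages last; this resolves the shortcomings of the sequential model and improves the accuracy of pipeline simulation. The thesis then analyzes the performance bottlenecks in the pipeline model and proposes two optimizations: an instruction decode cache and a ring queue of instruction execution information. On this basis, with the optimized reverse-order pipeline model at its core, an experimental C67X instruction simulation platform, TIC67Xsim, was designed and implemented; it provides instruction simulation, memory simulation, register simulation, and object file loading. The experiments use the Whetstone benchmark, the Dhrystone benchmark, and a Chebyshev low-pass digital filter algorithm as test cases. The results show that the platform simulates the C67X instruction set correctly and achieves relatively high performance, and that it can serve as an experimental platform for verifying applications and extending custom functionality.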
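
The two optimizations named in the abstract, an instruction decode cache and a ring queue of instruction execution information, might look roughly like the sketch below. It is not taken from the thesis or from TIC67Xsim. The class names, the `(mnemonic, latency)` decoded form, and the use of the ring queue for delayed register write-back are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Decoded = Tuple[str, int]  # hypothetical decoded form: (mnemonic, latency)


class DecodeCache:
    """Memoize decoded instructions by address so loops are decoded once."""

    def __init__(self, decoder: Callable[[int], Decoded]) -> None:
        self._decoder = decoder
        self._cache: Dict[int, Decoded] = {}
        self.misses = 0

    def lookup(self, addr: int, word: int) -> Decoded:
        entry = self._cache.get(addr)
        if entry is None:
            # First visit to this address: decode and remember the result.
            self.misses += 1
            entry = self._decoder(word)
            self._cache[addr] = entry
        return entry


class WritebackRing:
    """Fixed-size ring of pending register writes, indexed by cycle modulo size."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._slots: List[List[Tuple[int, int]]] = [[] for _ in range(size)]
        self._now = 0

    def schedule(self, delay: int, reg: int, value: int) -> None:
        # A result becomes visible `delay` cycles from now (1 <= delay <= size).
        if not 1 <= delay <= self._size:
            raise ValueError("delay must fit inside the ring")
        self._slots[(self._now + delay) % self._size].append((reg, value))

    def tick(self, regs: List[int]) -> None:
        # Advance one cycle and commit every write that falls due in it.
        self._now = (self._now + 1) % self._size
        due = self._slots[self._now]
        for reg, value in due:
            regs[reg] = value
        due.clear()


def toy_decoder(word: int) -> Decoded:
    # Stand-in decoder (assumption): odd words are ADD, even words are NOP.
    return ("ADD", 1) if word & 1 else ("NOP", 1)
```

In this sketch the cache is keyed by address, so it would need to be invalidated if the simulated program overwrote its own code. The ring avoids scanning all in-flight instructions each cycle, because only the slot for the current cycle is inspected.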

【Abstract】 DSPs, as high-performance processors for compute-intensive workloads, are increasingly applied in many fields of embedded systems. As the technology develops, DSP hardware architectures, instruction sets, and pipelines become more and more complex, which makes simulation considerably more difficult. This thesis focuses on simulating the pipeline of the TMS320C67X and on improving simulation performance. It first introduces the characteristics of the VLIW architecture and then analyzes the defects of the sequential pipeline simulation model in handling delay slots, pipeline stalls, and similar behavior. We therefore use a reverse-order pipeline simulation model for instruction set simulation. The model first simulates the instruction execution stages and simulates the instruction fetch stages last. We also present the rationale for this model and its benefits. Furthermore, the thesis analyzes the performance bottlenecks that may exist in an implementation of the model. Based on this analysis, we design two mechanisms to optimize the performance of the pipeline model. Following the hardware features, we design and implement an instruction simulation module, a memory module, a register module, and a file loading module, which together with the optimization module make up the entire platform. Finally, we use the Whetstone benchmark, the Dhrystone benchmark, and a Chebyshev low-pass digital filter algorithm as the test suite. The results show that the platform not only simulates the instruction set correctly but also achieves high performance. It is an experimental platform that is particularly suitable for verifying applications and for adding custom features.
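
To illustrate why stepping the stages in reverse order helps with stalls, here is a minimal sketch of a toy three-stage pipeline (fetch, decode, execute). It is not the thesis's implementation. The real C67X pipeline has many more phases and delay slots, which are not modelled here, and the names `Instr`, `exec_cycles`, and `ReversePipeline` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str
    exec_cycles: int = 1  # hypothetical: cycles spent in the execute stage (>= 1)


class ReversePipeline:
    """Toy fetch/decode/execute pipeline whose stages are stepped last-to-first."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.pc = 0
        self.decode: Optional[Instr] = None   # instruction waiting in decode
        self.execute: Optional[Instr] = None  # instruction occupying execute
        self.remaining = 0                    # execute cycles still owed
        self.cycle = 0
        self.retired: List[Tuple[str, int]] = []

    def step(self) -> None:
        self.cycle += 1
        # 1) Execute stage first: it may free its slot during this cycle.
        if self.execute is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.retired.append((self.execute.name, self.cycle))
                self.execute = None
        # 2) Decode stage next: it already sees whether execute is free,
        #    so a stall propagates backwards with no extra buffering.
        if self.decode is not None and self.execute is None:
            self.execute, self.decode = self.decode, None
            self.remaining = self.execute.exec_cycles
        # 3) Fetch stage last: fetch only if decode can accept an instruction.
        if self.decode is None and self.pc < len(self.program):
            self.decode = self.program[self.pc]
            self.pc += 1

    def run(self) -> List[Tuple[str, int]]:
        while (self.pc < len(self.program)
               or self.decode is not None
               or self.execute is not None):
            self.step()
        return self.retired


# B occupies execute for three cycles, so C stalls in decode behind it.
trace = ReversePipeline([Instr("A"), Instr("B", 3), Instr("C")]).run()
print(trace)  # [('A', 3), ('B', 6), ('C', 7)]
```

Because each stage runs after its successor, it reads the successor's up-to-date state for the current cycle. A sequential (fetch-first) loop would need a second copy of the pipeline latches, or an extra pass, to get the same timing.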

【Key words】 TMS320C67X; VLIW; High performance; Clock accuracy; ISS
  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year/issue】 2011, Issue 07
  • 【Classification number】 TP368.1