处理器条件分支指令处理关键技术研究

Research on Key Techniques of Conditional Branch Processing

【Author】 Chen Chen (陈晨)

【Advisor】 Yan Xiaolang (严晓浪)

【Author information】 Zhejiang University, Circuits and Systems, 2013, PhD

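【Abstract, translated from the Chinese】 As applications demand ever higher processor performance, techniques such as superscalar issue, very deep pipelines, and speculative execution have been adopted in processor design to raise instruction-level parallelism. Conditional branch instructions both execute conditionally and control program flow, so they reduce that parallelism; efficient conditional branch processing is therefore a prerequisite for these techniques to reach their potential. This thesis studies several key performance-oriented techniques for conditional branch processing. The main research content and contributions are: 1. Branch prediction based on dynamic transformation of prediction polarity. From a study of the temporal locality of branch mispredictions, a prediction method is proposed that dynamically monitors the original misprediction rate (before any polarity transformation), identifies misprediction peaks in which that rate exceeds a threshold, and transforms the prediction polarity during those peaks, so that the final misprediction rate stays low and overall accuracy improves. The method addresses problems such as branch jitter that traditional branch-aliasing-based prediction methods cannot solve. 2. Branch prediction based on multi-layered filtering. From a study of the spatial locality of branch mispredictions, a prediction method is proposed that divides the prediction space into several layers, dynamically monitors the misprediction rate of each layer, and filters the few misprediction-prone branches concentrated in each layer down to the next layer for dedicated handling. This lowers the misprediction rate of every layer and raises overall accuracy. The method addresses the low resource utilization of traditional multi-path prediction, in which every path must handle all conditional branches. 3. Parallel branch prediction based on multiple-level buffering and on adaptive prediction granularity. A multiple-level buffered method is proposed first: it accesses the predictor during cycles with no branch, prefetches and buffers prediction information for upcoming branches, and, when several conditional branches appear at once, assigns the matching buffered information to each of them. This method achieves high-accuracy parallel branch prediction at instruction fetch bandwidths of 8 or less. A method based on adaptive prediction granularity is then proposed: according to the fetch bandwidth and branch behavior, it adaptively packs several conditional branches into an instruction package, uses the package as the unit of prediction, and maintains history information per package. This method achieves high-accuracy parallel branch prediction at any fetch bandwidth. 4. Loop acceleration based on decode buffer reuse and cross-stage PC transmission. Targeting the characteristics of loop bodies, a loop acceleration method is proposed in which cross-stage PC transmission makes a multi-entry decode buffer feasible; the buffer is then reused to send loop-body instructions to the execution units during loop execution, speeding up the loop. A self-circulation wide-issue technique further addresses loop-body instruction alignment, loop joining, and cache width limits, all of which affect loop performance. The key techniques proposed in this thesis have both theoretical significance and practical value for improving conditional branch processing performance.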

【Abstract】 As the demands of complex embedded applications grow, techniques such as superscalar issue, deep pipelines, and speculative execution are employed in modern microprocessors to exploit higher degrees of instruction-level parallelism. Conditional branch instructions, which combine conditional execution with control of program flow, have a detrimental effect on that parallelism. These techniques therefore rely on accurate conditional branch processing to reach their potential. This thesis focuses on key techniques for high-performance conditional branch processing. Its original contributions are as follows:

1. Branch prediction based on dynamic polarity transformation. Based on an analysis of the temporal locality of branch mispredictions, a new branch prediction strategy built on dynamic polarity transformation is proposed. The approach monitors the original branch misprediction rate, that is, the rate before any polarity transformation, and detects the periods in which this rate exceeds a threshold. These periods are called misprediction peaks. During a misprediction peak, the polarity of the original prediction is transformed to produce the final prediction, which keeps the final misprediction rate low. The scheme addresses problems, such as branch jitter, that traditional branch-aliasing-based predictors cannot solve.

2. Multi-layered filter (MLF) branch prediction. Based on an analysis of the spatial locality of branch mispredictions, a new strategy called multi-layered filter (MLF) branch prediction is proposed. MLF prediction divides the branch prediction space into multiple layers and monitors the misprediction rate of each layer. Only the few hard-to-predict branches in each layer are filtered to the next layer, whose sub-predictor can be dedicated to them, which improves both prediction accuracy and hardware efficiency. The filtering mechanism addresses the low hardware utilization of traditional multiple-bank branch predictors.

3. Multiple-level buffered parallel branch prediction and adaptive-granularity parallel branch prediction. A multiple-level buffered parallel (MLBP) branch prediction scheme is proposed first. MLBP prediction accesses the predictor during cycles in which no conditional branch is present and prefetches the prediction results of future branches. The prefetched results are buffered at different levels. When multiple conditional branches are fetched at the same time, the buffered results are allocated to those branches simultaneously. This scheme performs well in processors with an instruction fetch bandwidth of eight or less. A second scheme based on adaptive prediction granularity is then proposed. It adapts the prediction granularity to the instruction fetch bandwidth and to branch behavior: varying numbers of branches are grouped into an instruction package, and branch histories are maintained per package. The scheme can therefore process any number of branches in a single package and achieves high prediction accuracy at any instruction fetch bandwidth.

4. Loop acceleration based on decode buffer reuse and PC transmission across pipeline stages. Based on an analysis of the characteristics of loop bodies, a new loop acceleration scheme is proposed. It reduces the information that must be held in the decode buffer by transferring PC-related information across pipeline stages, which makes a decode buffer with many entries feasible. The scheme then reuses the decode buffer to process loops: when a loop conditional branch is encountered, a loop body area is created in the decode buffer, and during loop execution the loop body instructions are supplied from the decode buffer, which improves the efficiency of loop execution. The scheme further adopts a self-circulation wide-issue mechanism to recover the performance lost to the loop body alignment problem, the loop joining problem, and the cache output bandwidth problem.

The techniques proposed in this thesis facilitate high-performance processing of conditional branches and benefit both theoretical research and practical applications.
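
The polarity transformation idea in contribution 1 can be illustrated with a minimal software sketch. It is not the thesis's hardware design: the 1-bit last-outcome base predictor, the single global sliding window, the window length of 16, the threshold of 0.5, and all class and function names below are assumptions chosen only for illustration.

```python
from collections import deque


class LastOutcomePredictor:
    """Hypothetical 1-bit base predictor: predict what the branch did last time."""

    def __init__(self):
        self.last = {}  # branch PC -> last outcome (True = taken)

    def predict(self, pc):
        return self.last.get(pc, False)

    def update(self, pc, taken):
        self.last[pc] = taken


class PolarityTransformPredictor:
    """Inverts the base prediction while the raw misprediction rate is high."""

    def __init__(self, base, window=16, threshold=0.5):
        self.base = base
        self.threshold = threshold  # assumed value, not taken from the thesis
        self.raw_misses = deque(maxlen=window)  # 1 = raw prediction was wrong

    def in_peak(self):
        # A misprediction peak is only declared once the window is full.
        if len(self.raw_misses) < self.raw_misses.maxlen:
            return False
        return sum(self.raw_misses) / len(self.raw_misses) > self.threshold

    def predict(self, pc):
        raw = self.base.predict(pc)
        final = (not raw) if self.in_peak() else raw
        return raw, final

    def update(self, pc, raw, taken):
        # The monitor tracks the raw (untransformed) prediction, not the final one.
        self.raw_misses.append(int(raw != taken))
        self.base.update(pc, taken)


def run(trace, predictor):
    """Return (raw mispredictions, final mispredictions) over a branch trace."""
    raw_wrong = final_wrong = 0
    for pc, taken in trace:
        raw, final = predictor.predict(pc)
        raw_wrong += int(raw != taken)
        final_wrong += int(final != taken)
        predictor.update(pc, raw, taken)
    return raw_wrong, final_wrong


if __name__ == "__main__":
    # An alternating branch defeats the last-outcome predictor on every access.
    alternating = [(0x40, i % 2 == 0) for i in range(100)]
    print(run(alternating, PolarityTransformPredictor(LastOutcomePredictor())))
```

On the alternating trace the raw predictor is wrong on every access, so once the window fills the sketch enters a peak and the inverted predictions become correct. Because the monitor keeps tracking the raw prediction rather than the final one, the peak can also end cleanly when the base predictor recovers.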

  • 【Online publication submitter】 Zhejiang University
  • 【Online publication issue】 2014, No. 06