Study on Topology and Fault Tolerant Scheme of 3-D Network on Chip

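【Author】 Zhou Lei (周磊)

【Advisor】 Wu Ning (吴宁)

【Author Information】 Nanjing University of Aeronautics and Astronautics, Communication and Information Systems, 2013, Ph.D.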

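【Abstract, translated from Chinese】 Network on Chip (NoC) is one of the effective solutions to the complex interconnect problem of large-scale Systems on Chip, and it has attracted academic attention for its high bandwidth, low latency, and scalability. Placement and routing in a traditional 2D NoC are confined to a plane, which hinders improvements in system performance and scale, whereas a 3D NoC can break through these limits thanks to its structural advantages. Topology design is a key step in 3D NoC design and is very important for improving 3D NoC communication performance. At the same time, as the design scale and operating frequency of 3D NoCs grow, the probability of system faults rises accordingly, so fault-tolerance mechanisms for 3D NoC are an important means of ensuring communication reliability. This thesis studies 3D NoC topology and fault-tolerance mechanisms in depth, focusing on three key problems: evaluation methods for 3D NoC topologies, performance optimization of regular and irregular 3D topologies, and fault-tolerance mechanisms for permanent and transient faults.

The thesis first studies evaluation methods for latency and power in 3D NoC and establishes evaluation models for both. Using an improved 3D NoC topology simulation platform, and starting from an analysis of topological features, it simulates and analyzes the performance parameters and compares the throughput, latency, and power of the topologies commonly used in 3D NoC. The evaluation method and results serve as a reference for the later topology optimization work and as the basis for its performance evaluation.

For regular 3D NoC topologies, a regular 3D topology prototype called 3D-Spidergon and a latency-optimized topology generation algorithm are proposed. By establishing the relationship between topology structure and average latency, the topology that minimizes latency is determined. An adaptive routing algorithm for this topology is also proposed. It gives priority to the vertical direction and adaptively searches for equivalent shortest paths between source and destination nodes, which avoids network congestion and raises network throughput. Simulation results show that, when the network is near saturation, 3D-Spidergon reduces latency by 17% and raises throughput by 16.7% compared with a 3D-Mesh of the same size.

To exploit the short vertical interconnect distance in 3D NoC, a hybrid 3D NoC topology design method is proposed. In the vertical direction, a bus structure based on a pseudo-token protocol improves bus utilization through synchronous token updates. In the horizontal direction, a fully connected network serves as the design prototype, and a heterogeneous horizontal sub-layer algorithm reduces the network diameter. A congestion-avoiding adaptive routing algorithm is also proposed: when bus load is high, it preferentially searches for an idle bus through the horizontal sub-layers, which reduces the likelihood of vertical bus congestion. Experimental results show that, under random traffic, the proposed hybrid 3D topology has an average latency 34.4% lower than 3D-Mesh and 13.1% lower than the long-link 3D topology, while its power is 43% and 7% lower, respectively. Under hotspot traffic, its latency is 36.9% lower than 3D-Mesh and 13.3% lower than the long-link 3D topology, and its power is 48% and 5% lower, respectively.

To keep permanent faults in a 3D NoC from affecting system function, a deadlock-free fault-tolerant routing algorithm for 3D-Mesh networks, DPRA, is proposed. First, because 3D-Mesh topology lacks a fault region model, a new fault block definition rule is proposed that reduces the number of healthy nodes affected by faulty nodes. On this basis, a faulty-node detection and detour-path generation algorithm is designed, which uses recursive message passing to establish the fault block regions and generate the detour-path lists. DPRA combines partial routing tables with routing rules to guide packets around fault regions, which improves the execution efficiency of the routing algorithm. Experimental results show that, while preserving the data delivery rate, DPRA reduces average latency by 9.7%, 10.3%, 13.3%, 13.1%, and 13.4% relative to a turn-prohibition-model routing algorithm at node fault rates of 2%, 4%, 6%, 8%, and 10%, respectively, and reduces power by 17.8%, 19.6%, 15.6%, 9.6%, and 10.2%.

For crosstalk, the main source of transient faults in 3D NoC, a fault-tolerant joint coding scheme, CAJC, is proposed. It is centered on crosstalk avoidance coding and combines it with low-power coding and error control coding. Crosstalk avoidance coding based on the Fibonacci sequence suppresses the effect of crosstalk on the correctness of data transmission, low-power coding lowers the data transition rate on the bus and thus the transmission power, and check codes provide error detection. On this basis, three design options for a fault-tolerant router that incorporates the joint code are examined together with their resource overhead, and a design method for a fault-tolerant routing unit using this codec is proposed. Experimental results show that, at a low area overhead, the proposed joint coding scheme avoids transient errors while also reducing system transmission power and latency, making it an effective way to avoid transient errors in NoC.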
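As a rough illustration of why topology affects latency, the sketch below computes the average minimal hop count between distinct nodes of a 3D-Mesh, assuming uniform random traffic and shortest-path routing. It is not the latency model established in the thesis, which is not reproduced on this page; the function name and the mesh sizes are hypothetical and chosen only for the example.

```python
from itertools import product


def average_hops_3d_mesh(x_dim: int, y_dim: int, z_dim: int) -> float:
    """Mean minimal hop count over all ordered pairs of distinct mesh nodes.

    Assumption: every hop (horizontal or vertical) costs the same, and
    every source/destination pair is equally likely (uniform traffic).
    """
    nodes = list(product(range(x_dim), range(y_dim), range(z_dim)))
    if len(nodes) < 2:
        raise ValueError("the mesh needs at least two nodes")
    total_hops = 0
    for src in nodes:
        for dst in nodes:
            # In a mesh, the minimal path length is the Manhattan distance.
            total_hops += sum(abs(a - b) for a, b in zip(src, dst))
    ordered_pairs = len(nodes) * (len(nodes) - 1)  # excludes src == dst
    return total_hops / ordered_pairs


if __name__ == "__main__":
    # Same node count (64), stacked in 3D versus laid out in a plane.
    print("4x4x4 mesh:", average_hops_3d_mesh(4, 4, 4))
    print("8x8x1 mesh:", average_hops_3d_mesh(8, 8, 1))
```

Under these assumptions the stacked 4x4x4 arrangement yields a lower average hop count than the planar 8x8 arrangement with the same number of nodes, which is the structural advantage the abstract attributes to 3D NoC.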

【Abstract】 As an effective solution to complex SoC interconnects, Network on Chip (NoC) is receiving attention in academia for its low delay, high bandwidth, and flexibility. Conventional 2D NoC has limited floorplanning choices, which limits the performance gains available from NoC architectures. 3D NoC can achieve better performance, functionality, and packaging density than 2D NoC. Among the many aspects of 3D-NoC design, selecting an appropriate topology is very important for improving network performance. In addition, as the scale of 3D-NoC designs increases, the probability of system failure increases correspondingly. Hence a fault-tolerance scheme for 3D-NoC is an important means of ensuring the reliability of 3D-NoC communication. This thesis focuses on 3D-NoC topology and fault-tolerant schemes, and addresses 3D-NoC evaluation methods, regular and irregular 3D topology design, and fault-tolerant schemes for permanent and transient faults.

The thesis first studies evaluation methods for latency and power consumption in 3D-NoC and establishes a set of theoretical calculation models for 3D-NoC latency and power consumption, then selects the most popular 3D topologies for analysis. After analyzing the influence of topology features on performance through theoretical analysis and software simulation, the throughput, latency, and power consumption of four topologies are tested and compared. The results provide a reference for optimizing 3D topology design and establish the evaluation basis for the subsequent research.

A regular 3D topology generation method called 3D-Spidergon is proposed. To establish the relationship between topology architecture and latency, a 3D topology latency model based on the prototype is proposed, and the optimized topology with minimum latency is then determined from it. For this structure, an adaptive routing algorithm is designed that gives priority to the longitudinal direction and adaptively searches for equivalent minimal paths between source and destination nodes in order to increase network throughput. Simulation shows that, when the network is near saturation, 3D-Spidergon achieves 17% lower latency and 16.7% higher network throughput than a 3D mesh of the same scale.

To take advantage of the short vertical inter-layer interconnects in 3D-NoC, a novel hybrid 3D NoC-Bus architecture is proposed. For the vertical links, a Fake Token Bus architecture is developed, which uses bandwidth efficiently by updating tokens synchronously. Based on this bus architecture, a methodology for hybrid 3D NoC-Bus design is introduced. The network uses the bus for vertical links and distributes the long links of a fully connected network across different layers, which yields a network with a diameter of only 3 hops and limited radix. In addition, a congestion-aware routing algorithm for the hybrid network is proposed. Experimental results show that, under uniform random traffic, the network achieves 34.4% and 13.1% reductions in latency and 43% and 7% reductions in power consumption compared with the 3D-Mesh and long-link topologies, respectively. Under hotspot traffic, it achieves 36.9% and 13.3% reductions in latency and 48% and 5% reductions in power consumption.

To tolerate permanent faults in 3D-NoC, a fault-tolerant, deadlock-free routing algorithm for 3D-Mesh, DPRA, is proposed. To address the absence of a fault region model for 3D topology, a new definition of the fault block is proposed that reduces the fault region and the number of affected healthy nodes. A detour-path construction algorithm is then designed that builds the fault blocks and generates the detour-path list through recursive message delivery. The detour-path routing algorithm combines the detour-path list with routing rules and steers packets around the fault block by adding the detour-path list to the header flit. Experimental results show that, at node failure rates of 2%, 4%, 6%, 8%, and 10%, the algorithm achieves 9.7%, 10.3%, 13.3%, 13.1%, and 13.4% reductions in latency and 17.8%, 19.6%, 15.6%, 9.6%, and 10.2% reductions in power consumption, respectively, compared with forbidden-turn-model routing.

To address crosstalk, which leads to transient faults, a joint coding scheme, CAJC, is proposed that combines a crosstalk avoidance code, a low-power code, and an error control code. To guarantee crosstalk avoidance, a crosstalk avoidance code based on the Fibonacci numeral system is applied, which reduces the influence of crosstalk on the correctness of data transfer. The low-power code reduces the transition rate on the interconnects, which reduces power consumption, and the error control code achieves error detection by adding parity bits. Based on the joint code, schemes for applying the codec to a fault-tolerant router are analyzed, and the "once encode, multiple decode" scheme is chosen as the design method for the fault-tolerant router. Experimental results show that the proposed joint coding scheme achieves crosstalk avoidance and reduces delay and power at a low area overhead.
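For readers unfamiliar with the Fibonacci numeral system named above, the following minimal sketch encodes an integer as a sum of Fibonacci weights (1, 2, 3, 5, 8, ...) using the greedy Zeckendorf form, and decodes it back. It illustrates the numeral system only and does not by itself implement the crosstalk-avoidance mapping used in CAJC, whose construction is not given in this abstract. The function names and the 5-bit width are assumptions made for the example.

```python
def fibonacci_weights(n_bits: int) -> list[int]:
    """Return the first n_bits Fibonacci weights: 1, 2, 3, 5, 8, ..."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    weights = [1, 2]
    while len(weights) < n_bits:
        weights.append(weights[-1] + weights[-2])
    return weights[:n_bits]


def fns_encode(value: int, n_bits: int) -> list[int]:
    """Encode value as n_bits Fibonacci-weighted bits, most significant first.

    The greedy choice yields the Zeckendorf form, which never contains
    two adjacent 1 bits.
    """
    # The next weight after the n_bits used is the first unrepresentable value.
    limit = fibonacci_weights(n_bits + 1)[-1]
    if not 0 <= value < limit:
        raise ValueError(f"value must be in [0, {limit})")
    bits = []
    remaining = value
    for weight in reversed(fibonacci_weights(n_bits)):
        if weight <= remaining:
            bits.append(1)
            remaining -= weight
        else:
            bits.append(0)
    return bits


def fns_decode(bits: list[int]) -> int:
    """Sum the Fibonacci weights of the bits that are set (MSB first)."""
    weights = fibonacci_weights(len(bits))
    return sum(w for w, b in zip(reversed(weights), bits) if b)


if __name__ == "__main__":
    codeword = fns_encode(11, 5)  # 11 = 8 + 3
    print(codeword, fns_decode(codeword))
```

With five Fibonacci-weighted wires this form represents only the values 0 through 12, fewer than the 32 values of plain binary. That redundancy is the price such codes pay for restricting which bit patterns may appear on the bus.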
