节点文献

片上网络演算模型及性能分析

Calculus Models and Performance Analysis for Networks-on-chip

【作者】 钱悦

【导师】 窦文华;

【作者基本信息】 国防科学技术大学 , 计算机科学与技术, 2010, 博士

【摘要】 随着半导体技术的迅猛发展,集成芯片中处理器核的数量日益增长,全局互连将导致严重的片上同步错误、不可预知的通信延迟和巨大的功耗开销。为了缓和这些矛盾,片上网络(Network-on-Chip,NoC)应运而生,替代传统的总线或点到点互连成为片上新的通信架构。NoC除具有更好的可预测性和更低的功耗开销,而且能提供更好的可扩展性以应对成百个处理器核的互连。性能分析是NoC一个重要的研究方向,对于构建性能可预测的系统、提供端到端的QoS保证和加速设计空间搜索意义重大。本文基于网络演算对片上网络进行系统建模和性能分析,重点推导业务流的端到端延迟上界,研究提供尽力服务的分组交换NoC中,网络冲突、拓扑结构、流量控制、交换策略及缓冲区大小对通信性能的影响。主要研究内容包括以下四个方面:(1)NoC中多业务流竞争网络资源的冲突模型在分组交换NoC中,业务流的冲突情形复杂多样,冲突情形可被划分为三种基本模式:嵌套、平行和交叉。分析了这三种基本冲突模式并分别得到了它们的分析模型。对于多条冲突流流经节点串的复杂情形,可以分解成基本模型并使用冲突树进行描述。冲突树模型不仅能描述目标流在其传输路径上与其他冲突流的冲突,而且能刻画多条冲突流所经历的间接冲突。通过遍历冲突树计算所有冲突流流经冲突树树枝后的输出曲线,进而推导出树干提供给目标流的等价服务曲线,并计算目标流的延迟上界。整个分析过程可归纳为两个主要算法:冲突树生成算法和等价服务曲线推导算法。研究了并行处理中重要的集合操作–多对一聚合通信,并推导了聚合通信的延迟上界计算公式。模拟实验验证了模拟和分析结果的一致性。比较了Bakhouya模型、Fidler模型和冲突树模型的延迟结果。(2)NoC二维和三维拓扑通信性能的对比分析芯片集成技术的迅猛发展,使得片上网络从二维向三维扩展成为一个重要的发展方向。三维NoC因拓扑维度的增加而缩短了通信距离,极大的提升了网络的平均通信性能。对比分析了规则的k-ary-2-mesh网络及其对应的三维网络在最差情形下的通信性能,得出一个重要结论:将二维NoC转换成三维NoC,虽然能提高网络的平均性能,但最差情形下的性能不一定能得到提升。最差情形下的通信延迟对垂直链路带宽、网络规模和流量突发这些网络参数的取值更为敏感。在垂直链路带宽较窄、网络规模较小和流量突发较大的情况下,三维网络最差情形下的通信性能明显劣于二维网络。这表明在优化NoC设计,将网络拓扑从二维扩展到三维时,需要全面考虑和衡量通信性能,不仅平均延迟要得到保证,最差情形下的延迟也不容忽视。(3)NoC基于信约的链路级流量控制机制的性能分析基于信约的路由器到路由器流量控制机制是NoC主要采用的链路级流量控制机制。基于网络演算分析了基于信约的流量控制机制的性能和最优缓冲区大小。为了对信约的反馈控制行为进行建模,提出了一个抽象的网络服务元素–流量控制器,以定理形式推导了流量控制器和整个系统的服务曲线。另外,给出并证明了保证系统最大服务曲线的最优缓冲区分配定理。假设路由器提供延迟-速率服务,给出了微包延迟上界和最优缓冲区大小的计算公式。片上多媒体流的模拟实验验证了模拟和分析结果的一致性及最优缓冲区大小的准确性。(4)NoC虚通道虫孔交换策略的性能分析基于网络演算分析了无死锁的虚通道虫孔交换NoC中业务流最差情形下的微包延迟上界。在虫孔交换NoC中,报文的传输可能因为信约短缺、交换机或虚通道分配失败而阻塞。构建了业务流在这些阻塞条件下的资源共享分析模型。分别利用流量控制和链路共享分析模型消去初始模型中的反馈控制和链路共享,保留初始模型的缓冲区共享和网络结构,得到一个简化的分析模型–缓冲区共享分析网络。从而将问题转化到冲突树模型中,推导路由器节点串提供给业务流的端到端等价服务曲线。给出了缓冲区共享分析网络的构建算法,并总结了虫孔交换NoC的延迟上界分析方法。假设仿射到达曲线和延迟-速率服务曲线,推导了延迟上界的计算公式。综上所述,本文紧紧围绕“分析业务流端到端的延迟上界”这一目标,基于网络演算提出了NoC的冲突树演算模型、拓扑性能对比分析模型、流量控制演算模型和虫孔交换演算模型,为NoC建立了一套完备的确定性性能分析方法,为网络演算这一新兴的数学理论开辟了一个新的应用领域。

【Abstract】 With the rapid development of semiconductor technology, the number of cores ona single chip continues to increase, the global interconnects would cause severe on-chipsynchronization errors, unpredictable delays, and high power consumption. To mitigatethese effects, the Network-on-Chip (NoC) approach emerged recently as a promising al-ternative to classical bus-based and point-to-point (P2P) communication architectures.Aside from better predictability and lower power consumption, the NoC approach offersgreater scalability compared to previous solutions for on-chip communication. Perfor-mance analysis has been a major concern for NoC and plays an increasingly importantroleinbuildingpredictablecommunicationsystem, providingend-to-endQoSguaranteesand speeding up the design space exploring. This thesis employs network calculus to per-form systematic modeling and performance analysis on NoCs with focusing on per-flowend-to-end delay bound. The effects of network contentions, topologies, flow control,switching strategies and buffer size are analyzed on the communication performance inpacket-switching NoCs with best-effort services. The main contributions are:(1) Contention model for multiple flows sharing resources in NoCsIn packet-switching NoCs, network flow contention scenarios are diverse and com-plicated. The contention scenarios the tagged flow may experience can be classified intothree patterns, Nested, Parallel and Crossed. Based on these patterns, any complex con-tentionscenariocanbedecomposedintothebasicpatternsandexpressedwithacontentiontree. This contention tree model captures not only the tagged flow’s contention with otherinterfering flows along its routing path but also the indirect contention experienced bythe interfering flows. We scan the tree to compute the output arrival curves of the flowstraversing each branch in the contention tree. Then we derive the equivalent service curvefor the tagged flow traversing the trunk of tree and calculate its delay bound. We proposeananalysisprocedurewhichcontainstwo mainalgorithms: 1)constructacontentiontree;2) scan the tree and compute the equivalent service curve for the tagged flow. We derivea closed-form formula to calculate the delay bound for all-to-one gather communication,which is an important collective operation for parallel processing. Experimental resultsvalidate the consistency of simulated maximum delays and our analytical bounds. Wealso perform comparative analysis on Bakhouya’s model, Fidler’s model and ours. (2) Comparative analysis of communication performance for 2D and 3D NoCsAdvanced integration technologies enable the construction of NoC from two dimen-sions to three dimensions. 3D NoCs can improve average communication performancebecause of the possibility of using the additional dimension to shorten communicationdistance. We study the worst-case communication performance in regular k-ary-2-meshnetworks and their 3D counterparts. Through both analysis and simulation, we show that,while the average performance in 3D NoCs is better than that of their 2D counterparts,the worst-case communication performance in 3D NoCs may be worse than that of their2D counterparts. The worst-case delay is more sensitive to the vertical link bandwidth,the network size and traffic burstiness. The 3D worst-case delay bound becomes worsethan the 2D worst-case delay bound for lower vertical link bandwidth, smaller topologysize and larger traffic burstiness. This suggests that a thorough investigation on not onlyaverage but also worst-case performance is necessary when dimensioning the networktopology from 2D to 3D.(3) Analyzing credit-based link-level flow control for on-chip networksCredit-based router-to-router flow control is one main link-level flow control mech-anism proposed for NoCs. Based on network calculus, we analyze its performance andoptimal buffer size. To model the feedback control behavior due to credits, we introducean abstract network service element called flow controller. Then we derive its servicecurve, and further the system service curve with theorematic forms. In addition, we giveand prove a theorem that determines the optimal buffer size guaranteeing the maximumsystem service curve. Moreover, assuming the latency-rate server model for routers, wegive closed-form formulas to calculate the flit delay bound and optimal buffer size. Ourexperiments with real on-chip traffic traces validate that our analysis is consistent and theoptimal buffer size is exact.(4) Analyzing wormhole switching with virtual channels in NoCsBased on network calculus, we derive per-flow worst-case flit delay bounds fordeadlock-free virtual-channel (VC) wormhole networks. In such networks, packet trans-mission may be stalled due to lack of credits, failure of switch allocation and VC alloca-tion. Webuildanalyticalmodelsforflowsunderthesestallingconditions. Bysequentiallyapplyingtheanalyticalmodelsforflowcontrolandlinksharing,wecanobtainasimplifiedanalysis model, which eliminates the feedback control and link contention. As keeping only the buffer sharing and network structure of initial model, this model is called buffer-sharing analysis network. The problem is transformed into the contention tree model, toderive per-flow end-to-end equivalent service curve which the tandem of routers provide.We propose an algorithm to construct the buffer-sharing analysis network and summarizea general analysis procedure to derive the delay bound. The derivation of delay boundand closed-form formulas are presented by supposing that the arrival curve conforms tothe affine function and the service curve follows the latency-rate function.In summary, we aim to drive per-flow end-to-end delay bound. Based on networkcalculus, we have proposed four calculus models of contention tree, comparative analysisof 2D and 3D topologies, flow control and wormhole switching, which establish a set ofcomplete deterministic approaches for NoCs’performance analysis, and develop a newapplication field for this emerging mathematical theory of network calculus.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络