

Research on Buffer Management and High Speed Interconnection Technology of Network-on-Chip for Multi-processor System-on-Chip

【Author】 Yin Yaming (尹亚明)

【Supervisor】 Chen Shuming (陈书明)

【Author Information】 National University of Defense Technology, Electronic Science and Technology, 2013, Ph.D.

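【Chinese Abstract】 The rapid development of civilian life and military technology has placed higher demands on high-performance embedded computing. Rapid advances in VLSI technology keep raising the integration density of systems-on-chip, so that more and more hardware units, such as microprocessors, memories, and I/O devices, can be integrated on a single chip. Pulled by application requirements and pushed by VLSI technology, the multiprocessor system-on-chip (MPSoC) has become a major research topic in high-performance embedded computing. As MPSoC technology develops, the number of units integrated on a single chip keeps increasing, and so does the performance of each unit, which makes the communication architecture the main factor limiting system area, performance, and power consumption. Network-on-chip (NoC) technology offers a better interconnection solution for MPSoCs: compared with traditional on-chip communication schemes, a NoC provides better predictability, lower power consumption, and better scalability. Research on the core theory and design techniques of on-chip multiprocessor interconnection networks can provide a sound theoretical and technical foundation for the design and implementation of future high-performance embedded multi-core processor chips, and is therefore of considerable theoretical significance and practical value. Building on a description and classification of on-chip multiprocessor interconnection network technology, this dissertation studies in depth the allocation, management, and use of buffers in these networks, including application-oriented buffer allocation strategies and dynamic buffer usage and management in router nodes. Based on the design and implementation of the main functional units, an RTL-level interconnection network simulation platform is built, a prototype system is implemented on FPGA, and performance analysis and design space exploration are carried out for the relevant design parameters. Finally, for YHFT-QDSP, a heterogeneous multi-core system developed in-house, chip-to-chip high-speed interconnect expansion is studied and implemented. The main innovations and results of this dissertation are as follows:

1) To address the severe resource constraints in on-chip multiprocessor interconnection networks, a NoC buffer allocation method based on a queuing model is proposed. The buffer allocation problem for NoC routers is characterized and formally described, an analytical router model based on the M/M/1 queuing system is established, the relevant parameters are extracted, and the procedure for solving for the target parameters is given. Using this model, a buffer allocation algorithm driven by the traffic load of the application mapping is implemented; it customizes the allocation of buffer resources to the traffic characteristics of each application mapping. Buffer resources are thus used efficiently: compared with the traditional uniform buffer allocation strategy, the method saves about 50% of buffer usage while keeping performance largely unchanged.

2) The behavior and shortcomings of static multi-channel structures are analyzed, and on this basis OOMCR-DBU, an output-oriented multi-channel router architecture with dynamic buffers, is proposed. The architecture manages dynamic buffer resources with linked lists and uses a threshold-controlled resource reservation technique to mitigate congestion interference, which arises when network congestion causes dynamic buffer resources to be occupied unproductively. Router nodes with two different parameter sets were designed and implemented in VLSI. Experimental results show that the method dynamically adjusts the virtual channel organization under different traffic loads, improving network performance and alleviating low buffer utilization and frequent congestion in on-chip routers. In addition, the threshold-controlled reservation strategy effectively prevents congestion interference between virtual channels.

3) A general NoC performance analysis model for system performance analysis is proposed. An RTL-level software simulation environment and an FPGA-based hardware emulation platform are built. An on-chip interconnection network is constructed from the proposed router with dynamically allocated virtual channels, and, with network latency and throughput as the evaluation metrics, network performance is analyzed for different design parameters: network size, packet length, buffer capacity, number of virtual channels, and routing algorithm. Experiments show that the simulation and emulation environment and the performance analysis method make it possible to choose parameter configurations suited to different design goals and constraints and so obtain good designs.

4) For YHFT-QDSP, a heterogeneous multi-core embedded system, a high-speed interconnect method for on-chip multi-core systems based on PCI Express is proposed. The technical characteristics of PCI Express and its applications in China and abroad are analyzed. Targeting the hierarchical interconnect structure of YHFT-QDSP, QPB, a module that performs protocol conversion and routing between the on-chip and off-chip domains, is designed and implemented. Using a rapid design method based on IP reuse and tailoring, PCI Express high-speed interconnect technology is applied to YHFT-QDSP, achieving peer-to-peer connection in PCI Express master and slave modes, shortening the design cycle, and realizing off-chip high-speed interconnect expansion of YHFT-QDSP.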
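The abstract names an M/M/1 router model that drives application-specific buffer sizing but does not give its equations. As a rough illustration only, the sketch below assumes each router input channel behaves as an independent M/M/1 queue with arrival rate λ (flits/cycle, taken from the application mapping) and service rate μ, and uses the steady-state tail probability P(N ≥ d) = ρ^d, with ρ = λ/μ, as a proxy for the chance that a buffer of depth d is full. The function names, the target probability `eps`, and the example rates are hypothetical; this is not the thesis's allocation algorithm.

```python
from typing import Dict


def full_prob(lam: float, mu: float, depth: int) -> float:
    """M/M/1 steady-state tail P(N >= depth) = rho**depth with rho = lam/mu.

    Used here as a proxy for the probability that an input buffer holding
    `depth` flits is full (an assumption, not the thesis's exact model).
    """
    rho = lam / mu
    if not 0.0 <= rho < 1.0:
        raise ValueError("M/M/1 needs 0 <= lam < mu for a steady state")
    return rho ** depth


def size_buffers(rates: Dict[str, float], mu: float, eps: float,
                 min_depth: int = 1, max_depth: int = 64) -> Dict[str, int]:
    """Give each channel the smallest depth whose full probability is <= eps."""
    depths: Dict[str, int] = {}
    for channel, lam in rates.items():
        depth = min_depth
        while full_prob(lam, mu, depth) > eps:
            depth += 1
            if depth > max_depth:
                raise ValueError(f"{channel}: eps not reachable within max_depth")
        depths[channel] = depth
    return depths


if __name__ == "__main__":
    # Hypothetical per-port injection rates (flits/cycle) for one application mapping.
    rates = {"N": 0.05, "E": 0.40, "S": 0.10, "W": 0.40, "L": 0.05}
    custom = size_buffers(rates, mu=1.0, eps=0.005)
    uniform_total = max(custom.values()) * len(rates)  # every port sized for the worst port
    print(custom, sum(custom.values()), uniform_total)
    # → {'N': 2, 'E': 6, 'S': 3, 'W': 6, 'L': 2} 19 30
```

With these made-up rates, the heavily loaded east and west ports get 6 slots each and the lightly loaded ports 2 or 3, 19 slots in total, whereas a uniform scheme that gives every port the worst-case depth needs 30. The figures only illustrate the mechanism; they are unrelated to the roughly 50% saving reported in the thesis.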

【Abstract】 With the rapid development of social life and military technology, higher requirements have been placed on high-performance embedded computing. Driven by advances in VLSI technology, the integration density of systems-on-chip keeps increasing: microprocessors, memories, I/O devices, and a growing number of other hardware units can be integrated on a single chip. Driven by application requirements and VLSI technology, the multiprocessor system-on-chip (MPSoC) has become a major research area in high-performance embedded computing. As MPSoCs develop, the number of components on a single chip and their performance continue to increase, so the design of the communication architecture plays a major role in determining the area, performance, and energy consumption of the overall system. The network-on-chip (NoC) approach was proposed as a better solution for MPSoC interconnection: it offers better predictability, lower power consumption, and greater scalability than classical on-chip communication solutions. Studying the theory and design problems of on-chip interconnection networks in MPSoCs is of great theoretical and practical significance, as it provides a theoretical and technological foundation for the design and implementation of future high-performance embedded multi-core systems.

This dissertation presents an in-depth study of the technical issues of buffer allocation, management, and use, including application-specific buffer allocation and the dynamic use and management of router buffers. This work builds on a description, classification, and discussion of the relevant issues in networks-on-chip. Based on the design and implementation of the major functional units, an RTL-level NoC simulation and emulation platform is provided, and this platform is then used for performance analysis and design exploration of several technical parameters.
Finally, high-speed inter-chip interconnect technology is studied for the expansion of YHFT-QDSP, an independently developed and implemented heterogeneous multi-core system. The main contributions are as follows.

1) To address the severe resource constraints in NoCs, a buffer allocation approach based on a queuing model is proposed. The buffer allocation problem in NoC router design is characterized and formally described. We establish an analytical router model based on the M/M/1 queuing system, extract the relevant parameters, and give the procedure for computing the objective function. An application-specific buffer allocation algorithm is implemented using the analytical model; it customizes buffer allocation to the traffic pattern of each application mapping. Compared with the traditional uniform buffer allocation strategy, about 50% of buffer resources can be saved without significant performance degradation, so buffer resources are used efficiently.

2) Based on an analysis of the behavior of static virtual channel structures, a dynamic buffer allocation scheme, OOMCR-DBU, is proposed to improve low buffer utilization and alleviate congestion. A dynamic virtual channel architecture is built on this scheme, and the VLSI implementation of a router with dynamic virtual channels is completed. The router adapts its channel organization to different traffic patterns, increasing throughput and reducing latency with notable savings in silicon area and power consumption.

3) An RTL-level software simulation environment and an FPGA-based hardware emulation platform are presented. We build a network-on-chip system from the proposed dynamic virtual channel router. Performance is analyzed for various design parameters, such as network size, packet length, buffer size, number of virtual channels, and routing strategy, with latency and throughput as the evaluation metrics. Using the simulation and emulation platforms together with the performance analysis approach, parameter configurations can be selected to meet different design objectives and constraints.

4) A high-speed interconnect scheme based on PCI Express is proposed for YHFT-QDSP, an embedded heterogeneous multi-core system. The technical features and applications of PCI Express are analyzed. QLink-PCIE-Bridge, a protocol conversion and routing module, is implemented for the hierarchical interconnection architecture of YHFT-QDSP. PCI Express is applied to YHFT-QDSP through IP reuse and tailoring, achieving high-speed inter-chip expansion of YHFT-QDSP and shortening the design cycle.
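The thesis describes OOMCR-DBU as managing its dynamic buffer pool with linked lists and using a threshold-controlled resource reservation so that a congested virtual channel cannot tie up all the shared slots. The abstract does not give the exact admission rule, so the sketch below assumes a DAMQ-style shared buffer in which every VC keeps `reserved` private slots and may borrow shared slots only up to `threshold` flits, and only while the free pool still covers the other VCs' unused reservations. The class name, parameters, and rule are illustrative assumptions; the arrays stand in for the head, tail, and next-pointer registers a hardware version might keep.

```python
from typing import List, Optional


class SharedVCBuffer:
    """Shared input buffer whose slots are linked into one FIFO per VC.

    Hypothetical admission rule: each VC owns `reserved` private slots and may
    borrow from the shared pool up to `threshold` flits, but only while the
    remaining free slots still cover every other VC's unused reservation.
    """

    def __init__(self, num_slots: int, num_vcs: int, reserved: int, threshold: int):
        if num_slots < max(1, reserved * num_vcs) or threshold < reserved:
            raise ValueError("inconsistent buffer parameters")
        # next_ptr[i] links slot i to the following slot of the same list (-1 = end).
        self.next_ptr: List[int] = list(range(1, num_slots)) + [-1]
        self.data: List[Optional[object]] = [None] * num_slots
        self.free_head, self.free_count = 0, num_slots  # free list: 0 -> 1 -> ...
        self.head = [-1] * num_vcs
        self.tail = [-1] * num_vcs
        self.count = [0] * num_vcs
        self.reserved, self.threshold = reserved, threshold

    def _owed(self) -> int:
        """Free slots that must stay available for VCs below their reservation."""
        return sum(max(0, self.reserved - c) for c in self.count)

    def can_accept(self, vc: int) -> bool:
        if self.count[vc] < self.reserved:
            return True  # consumes one of this VC's own reserved slots
        return self.count[vc] < self.threshold and self.free_count > self._owed()

    def enqueue(self, vc: int, flit: object) -> bool:
        """Append a flit to `vc`; return False (no credit) if admission fails."""
        if not self.can_accept(vc):
            return False
        slot = self.free_head  # pop a slot from the free list
        self.free_head = self.next_ptr[slot]
        self.free_count -= 1
        self.data[slot], self.next_ptr[slot] = flit, -1
        if self.tail[vc] == -1:
            self.head[vc] = slot  # list was empty
        else:
            self.next_ptr[self.tail[vc]] = slot
        self.tail[vc] = slot
        self.count[vc] += 1
        return True

    def dequeue(self, vc: int) -> object:
        """Remove the oldest flit of `vc` and push its slot back on the free list."""
        slot = self.head[vc]
        if slot == -1:
            raise IndexError(f"VC {vc} is empty")
        flit = self.data[slot]
        self.head[vc] = self.next_ptr[slot]
        if self.head[vc] == -1:
            self.tail[vc] = -1
        self.count[vc] -= 1
        self.data[slot], self.next_ptr[slot] = None, self.free_head
        self.free_head = slot
        self.free_count += 1
        return flit


if __name__ == "__main__":
    buf = SharedVCBuffer(num_slots=8, num_vcs=2, reserved=2, threshold=7)
    accepted = sum(buf.enqueue(0, i) for i in range(10))  # VC 0 is congested
    print(accepted, buf.enqueue(1, "a"))  # → 6 True
```

With 8 slots, 2 VCs, `reserved=2`, and `threshold=7`, a stream of flits to VC 0 is accepted six times and then refused, because the last two free slots stay reserved for VC 1; a flit arriving later for VC 1 is therefore still accepted even though VC 0 is blocked. Lowering `threshold` caps how far a single VC can borrow from the shared pool before reservations come into play.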
