
高性能计算机无缓存光互连网络技术研究

Research on Bufferless Optical Interconnection Networks for High Performance Computer

【Author】 Qi Xingyun

【Advisor】 Dou Wenhua

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2009, Ph.D.

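【Abstract (translated from the Chinese)】 As an important means of solving large-scale computational problems, high performance computers are being applied ever more widely across science and engineering. As high performance computing technology develops, parallel computers keep growing in scale and the demands on system performance keep rising. At present, the performance of large-scale parallel computers is improved mainly in two ways: raising the performance of the individual computing nodes, and optimizing the high-speed interconnection network that links them. As parallel computers grow, more and more computing nodes must be interconnected efficiently, which places higher demands on the internal high-speed interconnection network. How to design a high-bandwidth, high-throughput, low-latency interconnection network inside a large-scale parallel computer, and so improve the efficiency and performance of node interconnection, has become a key and difficult problem in high performance computer architecture research.
Under high-speed data transmission, electrical interconnection networks that use copper wires as the transmission medium suffer from low bandwidth, high power consumption, poor immunity to interference and low interconnect density, and have become the bottleneck that limits further gains in parallel interconnection network performance. Optical interconnection, as a new interconnection method, offers high bandwidth, low power consumption, low latency and immunity to interference, advantages that electrical interconnection cannot match, and it has become one of the research hotspots for high-speed interconnection networks in parallel computers. Under current technology, however, optical signals cannot be buffered or logically processed effectively. In a typical optical interconnection system, an arriving optical signal must be converted to an electrical signal at each intermediate node before routing decisions and buffering can take place, which inevitably adds transmission delay.
With the goal of reducing the extra overhead of optical interconnection and improving the actual performance of the interconnection network, this dissertation addresses the obstacles facing current optical interconnection technology and studies bufferless high-speed optical interconnection networks inside high performance computers. It proposes BOIN (Bufferless Optical Interconnection Network), a bufferless optical interconnection structure that requires no optical-electrical conversion at intermediate nodes, studies its routing algorithms and fault-tolerance techniques, and models, analyzes and optimizes its performance. The main results are as follows.
1. To address the inability of current optical interconnection networks to buffer optical signals or apply logic to them directly, the dissertation proposes the BOIN structure, in which optical signals are not converted to electrical signals for queuing and route selection at intermediate nodes. In a BOIN network, optical data packets remain on optical links throughout; routing decisions at intermediate nodes are made by electrical control packets transmitted in synchrony with the data packets, so no optical-electrical conversion of the data is needed. The dissertation studies the link protocol and port conflict resolution of BOIN, proposes a deadlock-free and livelock-free routing algorithm, and proves its reachability: under this algorithm, every packet in a BOIN network is delivered from its source node to its destination node within a finite time. An upper bound on this delay, determined by the network size, is also given.
2. To characterize and evaluate BOIN performance accurately, the dissertation analyzes the traffic characteristics of the links in each direction and builds a mathematical model of them. From this model it obtains analytical expressions for performance metrics such as average packet delay and average throughput for a given network size and load. Based on the theoretical analysis, it also gives the conditions that the topology should satisfy for the network to perform optimally at a given total network size. Simulation results show that the model correctly reflects the performance characteristics of BOIN and provides an analytical basis for optimizing the network design.
3. BOIN is an opto-electronic interconnection structure designed for high-speed interconnection inside high performance computers, so effective ways of improving its interconnection performance are one of the main concerns of this dissertation. The dissertation studies performance optimization techniques for BOIN, including a routing algorithm that avoids node starvation and the BOIN2 network structure, which offers high throughput and high link utilization. BOIN2 achieves a clear performance gain at the cost of only a small amount of additional hardware. The dissertation studies the routing algorithm of BOIN2 and proves that, like the standard BOIN network, it is deadlock-free and livelock-free and has a finite upper bound on transmission delay. Simulation results show that these techniques effectively improve the performance of BOIN networks, laying a good foundation for the design of large-scale parallel computers.
4. In large-scale parallel interconnection networks, fault tolerance is an important measure of overall network quality. To address node failures that may occur in a large-scale BOIN network, the dissertation proposes the fault-tolerant FT-BOIN structure, analyzes the reachability relation between nodes in FT-BOIN and its properties, gives a necessary and sufficient condition for a reachable path to exist between two nodes, and on that basis studies several fault-tolerant routing algorithms with different fault-tolerance capability and complexity. Experimental results show that FT-BOIN has good fault tolerance: when nodes in the network fail, it can still provide nonblocking routing between reachable nodes.
Targeting high-speed interconnection between computing nodes inside high performance computers, this dissertation studies the bufferless BOIN optical interconnection network comprehensively, covering its topology, link protocol, routing algorithms and performance model. It optimizes the BOIN design on the basis of the performance model and also studies fault-tolerant routing in BOIN networks. These results offer effective solutions to practical problems in the internal interconnection networks of high performance computers and are of theoretical and practical value to the design of parallel computer architectures and interconnection networks.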

【Abstract】 High performance computers (HPCs) are an important means of solving large-scale computational problems and have been adopted in nearly every field of science and engineering. As HPC technology develops, parallel HPCs keep growing in scale, and the demand for higher performance has never slowed. At present, HPC performance is improved mainly in two ways: by improving the computing nodes and by optimizing the interconnection network. As an HPC scales up, its computing nodes need more efficient communication, which places heavier demands on the interconnection network. How to design a high-bandwidth, low-latency, high-throughput interconnection network, and how to improve its efficiency, have become key problems for HPC researchers.
As data transfer rates rise, electrical interconnection networks based on copper wires exhibit low bandwidth, high power dissipation, poor immunity to electromagnetic interference (EMI) and low interconnect density, which make them the key bottleneck to further performance gains. Optical interconnection is a new method for connecting thousands of computing nodes within an HPC. It offers higher bandwidth, lower power dissipation, higher immunity to EMI and lower packet delay than its electrical competitor, and has therefore become a hotspot in the study of high-speed interconnection networks for HPCs. However, current technology cannot buffer or logically process optical signals. In conventional practice, optical-to-electrical (O/E) and electrical-to-optical (E/O) conversions are used at intermediate nodes to buffer signals and perform routing, which introduces considerable overhead.
Aiming to lower the overhead of optical interconnection networks and improve their efficiency, this dissertation studies high-speed bufferless optical interconnection networks for HPCs and proposes a new network model named Bufferless Optical Interconnection Network (BOIN). A BOIN network needs neither O/E and E/O conversion nor optical buffering at intermediate nodes, which overcomes the efficiency problems of current optical interconnection networks. Besides studying its routing algorithms and fault tolerance, the dissertation also presents a performance model for evaluating BOIN networks. The main contributions are listed below.
1. To overcome the inability to buffer optical signals or apply logic to them directly, a new optical interconnection network architecture, BOIN, is proposed, in which O/E and E/O conversion and optical buffers at intermediate nodes are avoided. In a BOIN network these conversions are unnecessary because optical data packets travel along optical links at all times, while routing decisions at intermediate nodes are made from synchronized electrical control packets. The dissertation studies the optical link protocol and methods for resolving port collisions, proposes a livelock-free and deadlock-free routing algorithm, and proves its correctness. Under this routing algorithm, every packet in BOIN arrives at its destination node within a bounded time, and the upper bound on the delay is determined by the network size.
2. To evaluate the performance of a BOIN network, the dissertation analyzes the traffic characteristics of the optical links in each direction and establishes a mathematical performance model. Performance metrics such as average transmission delay and throughput are derived by solving the model. The dissertation also gives the conditions that a BOIN topology should meet to achieve optimal performance at a given network size. Simulation results show that the performance model describes the characteristics of BOIN correctly, which provides a foundation for network optimization.
3. The BOIN network is a new kind of high-speed opto-electronic interconnection network for high performance computers, and how to boost its efficiency is a key concern of this dissertation. Several optimization techniques are proposed, including a starvation-avoiding routing algorithm and the BOIN2 network structure, which improves the throughput and link utilization of the network. BOIN2 performs markedly better at the cost of only a small amount of additional hardware. The dissertation studies the routing algorithm of BOIN2 and proves that, like BOIN, it is livelock-free and deadlock-free and has an upper bound on transmission delay. Simulation results demonstrate the effectiveness of these optimization techniques, which lay a good foundation for large-scale parallel computer design.
4. In large-scale parallel interconnection networks, fault tolerance is critical to network performance. The dissertation proposes a fault-tolerant BOIN (FT-BOIN) network to handle node failures in the BOIN network. It studies the reachability relation in the FT-BOIN network and its properties, and gives a necessary and sufficient condition for two nodes to be reachable from each other. Based on this analysis, it presents several routing algorithms with different fault-tolerance capability and complexity. The results show that the FT-BOIN network has good fault tolerance and can perform nonblocking routing between a pair of reachable nodes even if many intermediate nodes fail.
In conclusion, this dissertation focuses on the design of high-speed interconnection networks within high performance computers. It presents a thorough study of the bufferless optical interconnection network BOIN, covering its topology, link protocol, routing algorithms and performance model. Performance optimization techniques based on the model and fault-tolerant routing methods are also proposed. These results provide effective solutions to practical problems in high-speed interconnection networks and are of theoretical and practical value to parallel computer system architecture and interconnection network design.
