节点文献

并行路由器体系结构及其关键技术研究

Research on Architecture and Key Techniques of Parallel Router Design

【作者】 戴艺

【导师】 苏金树;

【作者基本信息】 国防科学技术大学 , 计算机科学与技术, 2008, 博士

【摘要】 Internet网络流量、规模和应用的快速发展对互联网核心路由器设计提出了重大挑战。随着光纤传输带宽和入网主机数目的日益增长,路由器交换容量及端口密度难以适应网络流量的增长需求;随着网络规模的急剧扩张尤其是多宿主技术的广泛应用,路由器转发能力难以适应FIB(Forwarding Information Base)表容量的指数级增长;随着IPv6、QoS、组播、安全等应用的发展,路由器报文处理能力难以解决网络流量增长和报文处理复杂度增长之间的矛盾。在路由器中引入并行处理技术的并行路由器体系结构为提高路由器转发交换能力提供了有效途径。采用并行设计给路由器带来了负载均衡和报文乱序两大问题。负载均衡是实现延迟和吞吐率保证的关键,但负载均衡可能导致报文乱序,进而可能触发TCP连接不必要的重传和超时,降低TCP吞吐率,增加报文延迟。如何解决这对矛盾是并行路由器设计的难点。已有的维序调度算法,要么需要复杂的通信或集中式的调度,要么依赖简单的分布式调度而缺乏吞吐率保证。针对以上问题,本文通过深入分析并行路由器体系结构队列模型和信元分布特性,提出了兼顾负载均衡和报文保序,降低调度复杂性和通信开销,并能提供延迟和吞吐率保证的分布式负载分配方法和协同调度机制。现有的路由器包含转发和交换两个连续的处理阶段,它们在硬件实现上是分离的,报文转发与交换的串行执行不利于路由器并行性的开发。本文原创性地提出一种同时开发转发与交换平行度的报文处理机制,将FIB查找功能分解并映射到交换网络中分布执行,为解决目前骨干路由器设计所面临的FIB处理极限问题提供理论、技术和实践支持。论文针对目前并行路由器设计所面临的负载均衡、报文乱序、延迟和吞吐率保证、通信复杂性及FIB极限等问题进行了深入研究,主要研究成果及创新包括以下几个方面:1.针对并行路由器设计负载均衡与报文保序之间的矛盾,提出一种基于流映射的细粒度负载分配算法UFFS-k(Uniform Fne-grain Frame Spreading,k为聚合粒度,简称UFFS-k),UFFS-k算法分布于各输入端口独立执行,根据本地VOQ队列信息分派信元,不需要任何通信开销,以O(1)时间复杂度实现了100%的吞吐率并能保证报文的顺序。模拟结果显示,UFFS-k算法能够有效降低信元延迟并具有较好的负载均衡特性。2.针对异构型并行路由器设计受限于通信及调度的复杂性,难以保证报文的顺序且硬件实现复杂的问题。基于CIOQ交换平面,提出一种具有按序排队特性的IOQ(In-order Queueing)PPS体系结构,在输入端轮询(round robin)分派算法和中间级CIOQ交换平面同步调度算法之间设计了一种简单的协同调度机制,保证同一条流的信元按序从交换平面读出,避免了输出端报文重定序开销。IOQPPS在多个独立的交换平面间执行报文流级的负载均衡,消除了中间级交换平面的加速需求。模拟结果表明,IOQ PPS在同类PPS设计中具有最优延迟性能。3.针对并行路由器设计所面临的FIB处理极限问题,提出一种在交换网络中执行转发操作的新型报文:处理机制FIS(Forwarding in Switching)。FIS将IP查找功能进行分解并映射到多个硬件同构的具有独立转发交换功能的FSN(Forwardingand Switching Node)结点分布执行,通过分解FIB表,构建FIB表到FSN结点的映射,降低了IP查找复杂度;FIB表的分布式存储,解决了并行查找机制访存瓶颈问题,提高了路由器FIB扩容能力。论文对FIS实现关键技术——转发表的分解、子树到FSN结点的映射及面向FIS的IP查找机制进行了深入的研究,给出了相应的解决方案,为FIS原型验证系统的设计与实现奠定了基础。4.面向FIS处理特性,提出了基于前缀范围的IPv6二分查找算法PSB-BS(Prefix Scope Based Binary Search):基于前缀范围的子树表示法消除了对路径信息的保存,降低了存储开销;基于前缀范围的二分查找策略能有效压缩查找路径,减少访存次数。通过构造PSB-BS算法二分查找树实现IPv6转发表到FSN结点的映射,仿真实验结果表明,随前缀数目的增加,每一级FSN结点的平均查找次数几乎没什么变化,反映了PSB-BS算法良好的可扩展性。综上所述,本文针对并行路由器设计的几个关键问题提出了有效的解决方案,对提高路由器的性能、规模、可扩展性,以及推进路由器并行技术的实用化具有一定的理论意义和应用价值。

【Abstract】 Continuing growth in traffic, the size of Internet as well as the Internet applications puts forward great challenge to backbone router design. First, the capacity and port density of the switch can hardly keep up with the growth of traffic resulting from increasing link speeds and the number of hosts on the Internet. Second, IP-lookup performance in core routers can hardly keep up with the growth of FIB size resulting from widely used multi-homed technology and increased Internet size. Third, with the development of Internet applications such as IPv6, QoS, multicast, security, etc. packet process power can hardly fix the contradiction between increasing packet rate and increasing complexity of packet processing. Recently, parallel router architectures introducing parallelism inside routers appears to be an efficient way to scale Internet routers to very high capacities and forwarding rates.Parallel design brings load balancing and packet reordering problems inside routers. Load balancing is prerequisite to achieving delay and throughput guarantees in parallel router architectures. Unfortunately, load balancing may incur out-of-order packets which trigger unnecessary retransmissions and TCP timeouts, thus decreasing TCP throughout and increasing packet delay. How to deal with this contradiction is a crucial problem in parallel router design. The existing scheduling mechanisms maintaining packet ordering either require complex, centralized schedulers, or rely on simple distributed scheduling algorithms that lack throughput guarantees. Aiming at the problems above, by thorough analyzing queuing model and distribution properties of cells in parallel router architectures we propose a load-balancing technique and a cooperative scheduling mechanism which are all distributed and can enforce packet ordering and load-balancing, reduce complexity of scheduling and communication overhead as well as provide delay and throughput guarantees.A router logically consists of two consecutive stages namely forwarding stage and switching stage. These two processing stages are implemented in different hardware components and performed in order, which hinders parallelism development inside routers. We originally propose a distinct packet processing mechanism concurrently exploiting parallelism of switching and forwarding operations. This packet processing mechanism partitions FIB lookup function and distributes FIB lookups to switch fabrics so as to distributedly perform packet forwarding in the process of packet switching, which provide theoretical, technical and practical value for solving FIB limits confronting modern routers.We make comprehensive research on crucial problems of load balancing, packet reordering, delay and throughput guarantees, communication complexity and FIB limits for parallel router design. The major contributions of this dissertation are as follows. 1. To make a better tradeoff between load balancing and packet ordering we propose a fine-grained frame dispatching algorithm called UFFS-k (Uniform Fine-grain Frame Spreading (UFFS-k, where k is the aggregate factor) based on flow mapping which is distributed and can operate independently in each input. UFFS-k dispatches cells based on local VOQs’ state information, moreover, without any communication overhead it guarantees packet ordering and achieves 100% throughput with 0(1) time complexity. As the simulation results demonstrate, UFFS-k reduces packet delay considerably and has better capacity of load balancing.2. Due to communication and scheduling complexity, heterogeneous parallel router architectures have difficulties in maintaining packet ordering and implementing with low-cost hardware components. We propose an IOQ (In-order Queuing) PPS architecture based on CIOQ switch planes. By using a simple collaborative scheduling mechanism between round-robin demultiplexing at inputs and synchronous switching at central switch planes, our scheme guarantees a way for cells of a flow to be read in order from different switch planes, thus avoiding packet reordering at output ports. IOQ PPS achieves packet-level load balancing over multiple independent switch planes and eliminates the speedup requirement for the internal switch planes. As the experiment results demonstrate, IOQ PPS offers improved delay performance compared to existing PPS designs.3. To address FIB limits confronting parallel router design, we propose a parallel packet process mechanism performing packet forwarding in switch fabrics—FIS (forwarding in switching) which partitions IP lookup function and distributes IP lookups to multiple lower speed and heterogeneous nodes called FSN (Forwarding and Switching Node) which performs forwarding and switching independently. By partitioning FIB table and by constructing mapping relationship between FIB table and FSNs, IP-lookup complexity is reduced considerably. Further, the distributed storing of FIB table among FSNs eliminates memory bottleneck of parallel lookup mechanism, thus improving the FIB scalability of modern routers. We make comprehensive research on the key technologies of FIS mechanism including the partition of routing table, the logical mapping from subtries to FSNs, as well as the FIS-oriented IP-lookup mechanism. Our work underlies the design and implementation of FIS hardware prototype system.4. We propose an IPv6 binary lookup algorithm based on prefix scope called PSB-BS (prefix scope based binary search) for putting FIS in practice. The efficiency of PSB-BS algorithm can be attributed to two key aspects. First, the subtrie format based on prefix scope eliminates the presentation of path information which contributes to a reduction in the memory space occupied by the search structure. Second, the binary search scheme based on prefix scope compresses the search path, thus reducing the number of memory access. We map IPv6 forwarding table to FSNs in FIS by constructing binary search trie of PSB-BS algorithm. The experiment results demonstrate the average search time of FSN varies stably with the increased number of prefixes, which reflect better scalability of PSB-BS algorithm.In summary, our work presents solutions to several key problems of parallel router design, and has academic and practical value for improving the performance, size and scalability of modern routers and advancing the practicability of parallel technologies for routers.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络