节点文献
容错系统中实时任务调度和负载均衡算法研究
Real-Time Scheduling and Load Balancing Algorithms in Fault Tolerant Systems
【作者】 王健;
【作者基本信息】 浙江大学 , 计算机应用技术, 2009, 博士
【摘要】 容错系统担负关键控制系统角色,已经被广泛应用于国防、航空航天、核反应堆控制、通信行业、过程控制、医药行业等领域,容错计算技术也已经成为计算机科学技术一个重要的学科领域。近年来,随着实时应用和分布式应用兴起,容错系统一个新的发展趋势是不仅要求系统能够屏蔽故障,还要求系统中关键任务必须能够及时正确被调度完成,保证系统在故障发生前和故障发生后达到负载均衡的状态,从而扩展容错系统在实时计算和分布式计算领域中的应用,提高资源利用率及性能。本文深入研究容错系统中实时任务调度和负载均衡算法。目前容错系统中实时任务调度算法大多针对硬件容错,很少考虑软件的运行故障;并且在针对硬件容错时,具有过高的硬件冗余度。针对上述问题,提出软件容错模型中部分抢占实时任务调度算法和主/副版本容错模型中一个高效的实时任务调度算法。此外,由于目前缺少通用的可适用于分布式容错系统的负载均衡算法,因此提出主/副版本容错模型中一个通用的负载均衡算法,并将算法应用于分布式容错环境中一个全球股票集中撮合系统。总结上述,本文的主要贡献如下:1)提出软件容错模型中针对硬实时系统软件运行故障的部分抢占调度算法——RMPPA和EDFPPA算法。部分抢占调度算法不仅可以获得与以前算法近似调度性能,还可以在一定条件下大大减少抢占次数,降低系统运行开销。2)提出主/副版本容错模型中针对硬实时系统硬件故障的一个高效的任务调度算法——TPFTRM算法。TPFTRM不仅最大限度利用副版本重叠和分离技术减少硬件冗余度,还将任务集合和处理器集合划分调度,使TPFTRM调度算法便于理解、实现以及减少调度所需要的运行时间。3)提出主/副版本容错模型中静态负载均衡算法——RSA算法。RSA算法根据任务主/副版本的负载情况将进程集合分配到各个处理机,使处理机在发生故障前后都处于负载均衡的状态。4)将RSA算法应用于一个基于分布式数据划分模型的全球股票集中撮合系统,提高负载均衡能力。
【Abstract】 Fault-tolerant systems currently take over the safety-critical role in many areas such as defense, airplane, nucleus control, communication, industry process control, medical. Fault-tolerant computing also becomes an important subject of the computer science. These years according to the real-time computing and distributed computing rapid developing, a new trend of fault-tolerant systems is to ensure crtical tasks be executed timely and acquire load balancing in both absence and presence of faults, hence to expand the fault-tolerant applications in the real-time and distributed computing areas and to improve fault-tolerant systems resource utilization and performance.Therefore, this dissertation considers the real-time scheduling and load balancing algorithms in fault tolerant systems. Almost all fault-tolerant scheduling algorithms in real-time systems so far are designed to deal with hardware faults, less of them take possible software faults into account hence this dissertation proposes two partial-preemptive algorithms in the deadline mechanism which provides software fault-tolerance in hard real-time systems. Moreover, none of the previous proposed fault-tolerant scheduling algorithms have good scheduling performance for all the cases when the task set has different upper bound for the task load hence this dissertation proposes an efficient scheduling algorithm which extends the uniprocessor RM algorithm to primary-backup model to provide fault tolerance. In addition, this dissertation proposes a universal load-balancing algorithm for primary-backup based fault tolerant systems. At last, this dissertation reports a global equity crossing fault-tolerant system as a case study to demonstrate the load-balancing algorithm benefits in reality.The contributions of this dissertation are summarized as below:1) Two partial-preemptive scheduling algorithms called EDFPPA and RMPPA are proposed in the deadline mechanism which provides software fault-tolerance in hard real-time periodic task systems. Extensive simulations results show that both EDFPPA and RMPPA can obtain the similar scheduling performance as well as the well-known algorithms so far. Moreover, EDFPPA and RMPPA reduce the preemption dramatically than previous algorithms, thus reduce the negative impact introduced by preemption such as overhead runtime computation time.2) An efficient scheduling algorithm called TPFTRM is proposed in primary-backup based fault-tolerant systems. Compared with previous scheduling algorithms in this area, TPFTRM maximizes the backup over-booking and deallocation, thus reduces the hardware redundancy. Moreover, TPFTRM proposes the task partitioning and processors grouping technique, which reduce the scheduling computation time and also make an easy way to understand and implement it.3) A load-balancing algorithm called RSA is proposed in primary-backup based fault-tolerant systems. Compared with previous work of this area, RSA algorithm has the load better balanced no matter how many backup processes each primary process owns.4) A global equity crossing fault-tolerant system is described as a case study which integrates RSA algorithm and the other mechanisms to demonstrate the load-balancing algorithm benefits in reality.
【Key words】 fault-tolerant system; real-time system; distributed computing; deadline mechanism; primary-backup; scheduling algorithm; load-balancing algorithm;