Design and Research on Key Techniques of Power-aware Resource Management for High-performance Computing Systems

【Author】 Dong Jing

【Advisor】 Lu Yutong

【Author information】 National University of Defense Technology, Computer Science and Technology, 2009, Master's degree

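【Abstract (Chinese version, translated)】 High-performance computing is now widely used in advanced scientific research and many industrial fields, and computational science built on it has developed remarkably. At the same time, growing demand and stringent performance requirements pose severe challenges to the research and design of the next generation of high-performance computing systems. In particular, as processor performance and system scale increase rapidly, sharply rising power consumption severely constrains the design and use of these systems. To manage system power effectively, improve reliability and availability, and ultimately reduce the cost of ownership of high-performance computing systems, low-power techniques have become a key technology in the field. Since the 1990s, massively parallel processing (MPP) systems and clusters have been the dominant architectures for high-performance computers. In a parallel computing system, the parallel resource management software selects suitable jobs from the job queue according to a scheduling algorithm, and allocates and releases compute nodes for them. Traditional job scheduling and resource management focus on two goals: reducing the average job waiting time and raising overall system utilization. However, because the job load is often insufficient, and because job scheduling and resource allocation policies cannot make full use of system resources, idle resources frequently waste energy. To address these problems, this thesis analyzes the load and energy consumption characteristics of high-performance parallel computing systems and, building on parallel resource management techniques, designs two classes of adaptive power management algorithms for such systems. The two classes use, respectively, restricted resource allocation and shutdown of idle nodes. The algorithms based on restricted resource allocation adaptively adjust the number of available nodes according to changes in system utilization or average job slowdown, and power off the unavailable nodes to save energy. The idle-node shutdown algorithms adaptively adjust each node's idle-time threshold according to the interval between node shutdowns or the request and service rates, and put nodes whose idle time exceeds the threshold into a "sleep" state to save energy. Both classes of algorithms were tested with workloads from the Parallel Workloads Archive; the results show that both can effectively reduce system energy consumption without violating the performance constraints.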
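The first class of algorithms, which restricts resource allocation, can be illustrated with a minimal sketch. The abstract does not state the actual adjustment rule, so everything concrete here is an assumption made for illustration: the function name `adjust_available_nodes`, the utilization band `low`/`high`, and the fixed `step` are hypothetical. Only the `slowdown` formula, (wait time + run time) / run time, is the standard definition of job slowdown.

```python
def slowdown(wait_time: float, run_time: float) -> float:
    """Standard job slowdown: response time divided by run time."""
    return (wait_time + run_time) / run_time


def adjust_available_nodes(
    available: int,
    total: int,
    utilization: float,
    low: float = 0.5,    # hypothetical lower utilization bound
    high: float = 0.8,   # hypothetical upper utilization bound
    step: int = 4,       # hypothetical number of nodes changed per adjustment
    min_nodes: int = 1,
) -> int:
    """Return how many nodes stay open for allocation.

    Nodes outside the returned count are the ones a power manager
    could switch off. This is an illustrative rule, not the thesis's.
    """
    if utilization > high:
        # The open nodes are busy: open more, up to the machine size.
        return min(total, available + step)
    if utilization < low:
        # Many open nodes sit idle: close some, but keep at least min_nodes.
        return max(min_nodes, available - step)
    return available
```

A slowdown-driven variant would have the same shape, with the average of `slowdown(...)` over recent jobs compared against a performance limit in place of `utilization`.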

【Abstract】 High-performance computing is now widely used in high-tech research and numerous industrial fields, and the field has experienced extensive change and remarkable development. At the same time, the sharp increase in power consumption poses a serious challenge to the reliability, availability, and usability of high-performance computing systems. Since the 1990s, massively parallel processing (MPP) systems and clusters have been the two major architectures in high-performance computing. In parallel computers, resource management software is responsible for job scheduling and resource allocation. Scheduling policies in parallel systems normally focus on reducing the average job waiting time and improving overall system utilization. However, because there is often not enough load, and because job scheduling policies cannot make full use of the system's resources, there are significant periods during which many computing nodes in the system go unused. To solve these problems, this paper analyzes the characteristics of workload and power usage in high-performance parallel computing systems. Based on this analysis, we propose two types of self-adaptive power management algorithms to reduce energy consumption. The first type adjusts the number of available nodes according to system utilization or average job slowdown and powers off the unavailable nodes. The second type improves the system's energy efficiency by controlling each node's idle-time threshold: a node is switched into "sleep" mode if its idle period exceeds the threshold. Detailed experiments using traces from the Parallel Workloads Archive indicate that these algorithms can achieve considerable overall system energy savings without violating the performance constraints.
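The second type of algorithm, an adaptive idle-time threshold, can be sketched as follows. The abstract does not give the adaptation rule, so this is a generic adaptive-timeout scheme chosen for illustration rather than the thesis's method. The class name `IdleTimeoutPolicy`, the `break_even` time, the doubling and halving `factor`, and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IdleTimeoutPolicy:
    """Illustrative adaptive idle-time threshold for one node (seconds)."""

    threshold: float = 300.0       # idle time after which the node sleeps
    break_even: float = 600.0      # hypothetical sleep length that repays the wake-up cost
    factor: float = 2.0            # hypothetical multiplicative adjustment
    min_threshold: float = 60.0
    max_threshold: float = 3600.0

    def should_sleep(self, idle_seconds: float) -> bool:
        """A node sleeps once its idle time exceeds the threshold."""
        return idle_seconds > self.threshold

    def record_sleep(self, slept_seconds: float) -> float:
        """Adapt the threshold after a node is woken up again."""
        if slept_seconds < self.break_even:
            # Woken too soon: the sleep cost more than it saved, so wait longer.
            self.threshold = min(self.max_threshold, self.threshold * self.factor)
        else:
            # The sleep paid off: be more aggressive next time.
            self.threshold = max(self.min_threshold, self.threshold / self.factor)
        return self.threshold
```

In this sketch a resource manager would call `should_sleep` periodically for each idle node and `record_sleep` whenever a sleeping node is woken to serve a job.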
