系统级动态热管理关键技术研究

System-Level Dynamic Thermal Management Key Techniques Research

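【Author】 Shu Longhao (舒龙昊)

【Advisor】 Li Xi (李曦)

【Author Information】 University of Science and Technology of China, Computer System Architecture, 2011, Master's thesis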

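【Abstract (translated from the Chinese)】 Rapid advances in processor manufacturing technology have concentrated more and more computing resources on a very small chip. The growth in computing resources per unit area causes the processor's power density and local temperature to rise sharply, forming temperature hotspots. Uneven on-chip temperature poses many challenges for the design of the processor and its cooling system, and it may cause logic errors at run time or even permanent physical damage. Moreover, a cooling system is designed for the thermal state of the processor under its highest workload, yet most of the time the processor is not heavily loaded, so the design cost of the cooling system is effectively inflated. Low-power, low-energy and thermal management techniques can control system temperature online, which reduces the design cost of the cooling system to some extent. Designing effective dynamic temperature control techniques for processors is therefore increasingly important.

Building on prior work, this thesis surveys the strengths and weaknesses of the temperature control mechanisms supported by the architecture, such as dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM), and of operating-system-level resource management mechanisms, such as task scheduling and memory allocation. It ultimately adopts system-level dynamic thermal management (mainly task scheduling), which is more flexible and has a wider scope of control, to control processor temperature.

The main research work of this thesis includes:
1. Analyzing the serious challenges that have emerged in the development of computers, and of processors in particular, to show the importance and urgency of controlling processor temperature, and reviewing the state of research on processor temperature control in China and abroad.
2. Analyzing the effect of workload on processor temperature, characterizing the hot and cool features of workloads, and identifying opportunities for online temperature control. A workload characterization method that combines dynamic and static parameters is proposed and used to characterize the hot and cool features of tasks.
3. Analyzing the architecture-level runtime behavior of applications and proposing a workload characterization method that combines cache miss distribution with average cycles per instruction (CPI), which is then used to characterize the hot and cool features of tasks.
4. Proposing a temperature-aware task scheduling method based on heuristic matching rules, and formulating the temperature-aware task scheduling problem as an online learning model. Online learning is a branch of machine learning theory; it uses all past system states as a basis for the current decision and makes the final choice by evaluating the loss of each candidate decision.
5. Designing Time-Slice Scaling and Alternative Scheduling mechanisms to further lower the processor's peak temperature at run time and to shorten the duration of peak temperature.
6. Designing and implementing the above online control techniques in a real Linux operating system.

The new contributions of this thesis are:
1. Applying machine learning theory to the online control of processor temperature and designing a complete online learning method, which gives processor temperature control a degree of theoretical guarantee.
2. Combining cache miss distribution and average cycles per instruction to characterize workloads, and characterizing how hot or cool each task is on that basis.
3. Proposing Time-Slice Scaling and Alternative Scheduling to further lower the processor's peak temperature at run time and to shorten the duration of peak temperature.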

【Abstract】 With the rapid development of processor manufacturing technology, more computing resources are packed into a small chip area, which makes on-chip power density and local temperature rise sharply and leads to temperature hotspots. Uneven on-chip temperature brings many challenges to the design of the processor and its cooling system. Uneven chip temperature and hotspots may also cause logic errors while the processor is running, or even permanent physical damage. On the other hand, a cooling system is designed for the thermal state under the highest workload, while in most cases the processor is not heavily loaded, which means that the cost of the cooling system is inflated in disguise. Low-power, low-energy and thermal management technologies can control system temperature online, which reduces the cost of cooling system design to some extent. It is therefore increasingly important to design effective dynamic techniques for processor temperature control.

In this thesis, I investigate the advantages and disadvantages of several architecture-supported thermal control methods, such as DVFS and DPM, and of OS-level resource management approaches, such as task scheduling and memory allocation. I then adopt the system-level approach (mainly task scheduling) to control processor temperature, because it is more flexible and has a wider scope of control.

The main research work in this thesis includes:
1. Summarizing the current state of computer system development and analyzing the emerging challenges faced by processor designers, to show the importance and urgency of processor temperature control, and introducing the research status and achievements in temperature control at home and abroad.
2. Analyzing the effect of workloads on processor temperature, characterizing the hot-cool features of workloads, and finding opportunities for online temperature control. A workload characterization approach that combines dynamic and static parameters is proposed and used to characterize the hot-cool features of tasks.
3. Analyzing the runtime features of applications at the architectural level and proposing a workload characterization approach that combines cache miss distribution and CPI, which is then used to characterize the hot-cool features of tasks.
4. Proposing a temperature-aware task scheduling approach based on heuristic matching rules, and formulating the temperature-aware task scheduling problem as an online learning model. Online learning is a branch of machine learning. It takes all past system states into account when making the current decision, and each decision is made by evaluating the loss of the candidate choices.
5. Designing Time-Slice Scaling and Alternative Scheduling schemes to further reduce the peak chip temperature at run time and shorten the duration of peak temperature.
6. Designing and implementing the above online control techniques on a real Linux platform.

The contributions and innovations of this work include:
1. Applying machine learning theory to online thermal control and designing an online learning framework, which gives the temperature control technique a degree of theoretical guarantee.
2. Combining CPI and cache miss distribution to characterize workloads and, on that basis, characterizing the hot-cool features of tasks.
3. Proposing novel Time-Slice Scaling and Alternative Scheduling schemes to further reduce the peak chip temperature and shorten the duration of peak temperature.
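
The abstract does not say which online learning algorithm the thesis uses. As an illustration of the general idea in item 4 (weighing past outcomes and choosing by loss), the following is a minimal sketch that assumes the standard Hedge (exponential weights) algorithm over a fixed set of candidate scheduling policies. The class name, the policy names, the learning rate `eta`, and the temperature-to-loss mapping in `thermal_loss` are all hypothetical and are not taken from the thesis. The sketch also assumes a loss can be estimated for every policy in each round, which a real system would have to approximate.

```python
import math
import random


def thermal_loss(temp_c, t_min=40.0, t_max=90.0):
    """Map a temperature in Celsius to a loss in [0, 1] (hypothetical scaling)."""
    return min(1.0, max(0.0, (temp_c - t_min) / (t_max - t_min)))


class HedgeScheduler:
    """Hedge (exponential weights) learner over a fixed set of policies.

    Illustrative only: an assumed stand-in for the online learning model
    mentioned in the abstract, not the author's actual method.
    """

    def __init__(self, policies, eta=0.5, seed=0):
        self.policies = list(policies)
        self.eta = eta  # learning rate (assumed value)
        self.weights = [1.0] * len(self.policies)
        self.rng = random.Random(seed)

    def probabilities(self):
        """Current probability of picking each policy."""
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def choose(self):
        """Sample one policy in proportion to its weight."""
        return self.rng.choices(self.policies, weights=self.weights, k=1)[0]

    def update(self, losses):
        """Shrink each weight by exp(-eta * loss); losses lie in [0, 1]."""
        self.weights = [
            w * math.exp(-self.eta * loss)
            for w, loss in zip(self.weights, losses)
        ]
        # Renormalize so the weights do not underflow over many rounds.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


if __name__ == "__main__":
    # Hypothetical policies and made-up temperatures, for demonstration only.
    scheduler = HedgeScheduler(["hot_first", "cool_first", "alternate"])
    for _ in range(20):
        scheduler.update([thermal_loss(t) for t in (80.0, 50.0, 65.0)])
    print(scheduler.probabilities())
    print(scheduler.choose())
```

Because every past loss is folded into the weights, a policy that has repeatedly led to lower temperatures is chosen with higher probability, while the other policies keep a small chance of being tried again if conditions change.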
