Research on High Productivity Tuning Technology in OS Oriented to Environmental Sensibility (original title: 基于环境感知的操作系统高产出率优化技术研究)

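【Author】 杨沙洲 (Yang Shazhou)

【Advisor】 杨学军 (Yang Xuejun)

【Author information】 National University of Defense Technology (国防科学技术大学), Computer Science and Technology, 2007, PhD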

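【Summary】 To address the shortcomings that traditional high performance computing shows when faced with today's complex and diverse computing tasks, researchers have proposed the concept of high productivity computing, which covers the design, development, production, use, and evaluation of computing systems. It emphasizes that a computing system should meet users' requirements in four respects at once: performance (time-to-solution), programmability (time-for-idea-to-first-solution), portability (transparency), and robustness (reliability), while striving to reduce the cost of developing, running, and maintaining the system. In response to this trend, the US Defense Advanced Research Projects Agency (DARPA) launched the HPCS (High Productivity Computing System) research program in 2002, invested heavily in IBM, Sun, Cray, and other large companies as well as related university institutes and national laboratories, and treats it as an important transitional computing architecture before the era of quantum computing arrives.

Research on high productivity computing covers non-technical areas such as the industrial chain, R&D processes, and production and deployment standards, as well as technical factors such as hardware architecture, operating systems, compilers, and application frameworks. Because the operating system plays a key role in a computing system, high productivity research in the operating system field is especially important. The operating system reflects the high productivity computing capability of the underlying hardware, and it strongly influences the runtime environment and the effectiveness of high productivity optimization in the software above it. The operating system itself also plays a very important role in the performance, programmability, portability, and robustness of the whole system, and in its development and operating costs.

The threading environment, job management, and robustness maintenance are three areas of the operating system that strongly affect productivity. With productivity optimization as the goal, this thesis studies these three areas in depth, proposes the idea of fusing multiple productivity metrics and developing them in balance, and uses it to guide the design and development of each part of the operating system. The application runtime environment that affects system productivity includes computing capacity, energy consumption, failure rate, and user-specific requirements; this thesis treats the ability to sense these environmental factors and the techniques for handling them as the key to achieving high productivity computing in the operating system. In addition, because existing computing systems that are not optimized for productivity are widely deployed, and because evaluating existing applications under high productivity optimization is both necessary and complex, the thesis takes conventionally structured computing systems already deployed in production departments as the main target of productivity optimization. It discusses how to extend environment awareness on top of a conventionally structured operating system and studies in depth the optimization techniques that take productivity goals as their guide. On this basis, it further proposes a hierarchical framework for high productivity optimization of the operating system, based on a "guide-respond" application model.

The main contributions of this thesis are:

1. Analysis and definition of evaluation metrics that affect high productivity. Until now, the elements of high productivity computing (performance, programmability, portability, robustness, and cost) have been put forward only as guiding concepts. To support targeted productivity optimization research, this thesis gives a computable method for quantifying productivity in terms of performance, robustness, energy consumption, and ease of use. It converts the productivity elements into environmental parameters of a real system, including program execution time, peak speed, failure rate, longest stable running time, peak energy consumption, average energy efficiency, and problem-solving time. On this basis, it examines the relationships among the productivity elements, concludes that optimizing one element often conflicts with optimizing another, and proposes a heuristic approach that unifies the dimensions (units) of the elements and considers several elements together at run time.

2. Environment-aware threading environment. The threading environment is one of the key factors affecting the productivity of the whole system, and the key to optimizing its productivity is to give it the ability to sense and handle the environment, so that the operating system adapts dynamically to diverse application requirements. Chapter 3 of the thesis targets high productivity optimization of the threading environment in a traditional operating system. It proposes a model that balances the thread-related environmental factors affecting system productivity, and on this basis designs an environment-aware, high productivity thread scheduling algorithm. To validate and evaluate the model and the algorithm, a customizable threading environment was designed and implemented; it adjusts the thread scheduling policy dynamically as application requirements change, and can therefore cooperate with the productivity-guided scheduling policy. Experiments and tests show that combining the productivity-guided scheduling policy with the customizable threading environment yields a considerable productivity improvement at very low overhead.

3. Energy-sensitive job management. In typical application environments such as on-board (satellite) parallel computing systems, limited energy is the primary environmental factor, and similar energy-constrained distributed heterogeneous job systems will become increasingly common in future computing. Guided by an energy-performance balance model and using an existing distributed job management system as the prototype, this thesis designs a job allocation algorithm based on the high productivity optimization model for an energy-constrained working environment. The algorithm uses the performance and energy parameters of both nodes and jobs. It guarantees that the total energy consumption of the jobs allocated to a node does not exceed the node's energy limit and, under that guarantee, maximizes the performance of the whole system.

4. Productivity-related dynamic fault elimination. An analysis of how system failures affect productivity shows that reducing the time during which faults block service, and shortening the time needed to eliminate faults, is an important way to improve productivity. Building on dynamic software update techniques for traditional operating systems, the thesis proposes a dynamic fault elimination mechanism at the level of operating system kernel modules. The mechanism organizes kernel functions into protected modules according to their semantic relationships, detects faults, and updates modules dynamically, thereby eliminating faults while the system runs. Theoretical analysis and experiments show that dynamic fault elimination integrates well with system productivity optimization: it reduces the productivity loss caused by system faults at as small a performance cost as possible, and thus indirectly improves system productivity at a given failure rate.

5. A productivity optimization extension framework based on a hierarchical traditional operating system. By summarizing and abstracting the high productivity optimization schemes for the three areas above, the thesis proposes a productivity optimization extension framework suited to the hierarchical structure of the YH-Kylin (银河麒麟) operating system. Its distinguishing feature is that the relationship between the productivity optimization components and the traditional OS components is built on the "guide-respond" model, with productivity optimization guidance and information collection carried out in the application layer, the kernel service layer, and the basic kernel layer.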

【Abstract】 Legacy High Performance Computing falls short when faced with today's complex and diverse computing tasks. To address this, a new notion, High Productivity Computing, has been introduced, which brings new concepts to the design, development, production, use, and evaluation of computing systems. High productivity computing emphasizes that a computing system should satisfy users' needs in Performance (time-to-solution), Programmability (time-for-idea-to-first-solution), Portability (transparency), and Robustness (reliability) at the same time, while keeping the costs of system development, operation, and maintenance low. DARPA (Defense Advanced Research Projects Agency) launched a project named HPCS (High Productivity Computing System), which involves large commercial companies such as IBM, Sun, and Cray, along with related university institutes and national laboratories. DARPA has funded the project heavily and regards HPCS as an important transitional computing architecture before the era of quantum computing arrives.

High productivity computing covers many research fields. These include non-technical ones, such as the industrial chain, R&D processes, and standards for production and deployment, and technical ones, such as hardware architectures, operating systems, compilers, and application frameworks. Because the operating system (OS) plays a key role in the whole computing system, research on high productivity in the OS is especially important. The productivity of the OS reflects the high productivity capability of the underlying hardware, and the runtime environment and effectiveness of high productivity optimization in the software above depend heavily on it. The OS itself also plays an important role in the whole system's performance, programmability, portability, and robustness, and in its development and operating costs.

This thesis focuses on productivity optimization in three areas of the OS that are critical to productivity: the threading environment, job management, and robustness maintenance. It proposes combining and balancing multiple productivity metrics and applies this idea to guide the design and development of each part of the OS. The runtime environment of applications includes computing capacity, power consumption, failure rate, and user-specific requirements; the ability to sense and handle these factors is treated as the key to implementing high productivity computing in the OS. Real-world installations consist largely of legacy systems that are not optimized for productivity and are already deployed in many departments, and evaluating legacy applications under high productivity optimization is essential but difficult. The thesis therefore takes these legacy systems as the main target of productivity optimization. It studies how to extend the environment-sensing ability of a legacy OS and how to optimize the OS with productivity as the goal. Based on this research, it gives a hierarchical framework for high productivity optimization of the OS, which works according to a "guide-respond" model.

The primary contributions of this thesis can be summarized as follows:

1. Definition and analysis of evaluation metrics for high productivity. Performance, programmability, portability, robustness, and cost have so far been presented only as guiding concepts. To support targeted research on productivity optimization, the thesis gives a computable quantification method covering performance, robustness, power consumption, and ease of use. It converts the productivity factors into environmental parameters of real systems, including program execution time, peak speed, failure rate, longest stable running time, peak energy consumption, average energy efficiency, and time to solve a problem. The relationships among the productivity factors are discussed, and it is concluded that the factors often conflict with one another under optimization. The thesis therefore proposes expressing all factors in a common dimension (unit), together with a heuristic that considers multiple, possibly conflicting, factors simultaneously at run time.

2. Environment-aware threading environment. The threading environment is one of the key factors in the productivity of the whole system. The key to optimizing its productivity is the ability to sense and handle changes in the environment, so that it can adapt dynamically to varied application requirements. To achieve high productivity optimization of the threading environment in a legacy OS, Chapter 3 gives a model that balances the environmental factors related to threading productivity, and builds on it an environment-aware, high productivity thread scheduling algorithm. To validate and evaluate the model and algorithm, a customizable threading environment is designed and implemented. It adjusts the thread scheduling policy dynamically as application requirements change, and can therefore cooperate with the productivity-guided scheduling policy. Experiments and evaluation show that combining the productivity-guided scheduling policy with the customizable threading environment improves system productivity considerably with very little overhead.

3. Power-sensitive job management. An on-board (satellite) parallel computing system is a typical application environment in which limited power is the primary environmental factor, and similar power-constrained distributed heterogeneous systems will become increasingly common in future computing. Guided by a model that balances power consumption and performance, and starting from an existing distributed job management system, the thesis gives a high productivity job dispatching algorithm for working environments with restricted power consumption. The algorithm uses the performance and power parameters of both the computing nodes and the jobs. It ensures that the total power consumption of the jobs assigned to a node does not exceed the node's limit and, subject to that constraint, maximizes the performance of the whole system.

4. Productivity-related dynamic fault elimination. An analysis of how system failures affect productivity shows that reducing the time during which faults interrupt service, and the time needed to eliminate faults, is an important way to improve productivity. Based on this and on dynamic software update technology for legacy OSes, a dynamic fault elimination mechanism is given at the OS kernel module layer. Kernel functions are organized into protected modules according to their semantic relationships. The mechanism detects faults and dynamically updates the affected module, so faults are removed while the system keeps running and service is not stopped. Theoretical analysis and experiments show that dynamic fault elimination works well with system productivity optimization: it reduces the productivity lost to system failures at as small a performance cost as possible, and thus indirectly increases system productivity at a given failure rate.

5. An extension framework for high productivity optimization based on a hierarchical legacy OS. By summarizing and abstracting the three optimization solutions above, the thesis gives an extension framework for high productivity optimization on the YH-Kylin hierarchical OS. The main feature of the framework is the "guide-respond" model, which defines the relationship between the productivity optimization components and the components of the legacy OS. Guidance and information collection for productivity optimization are carried out in the application layer, the kernel service layer, and the basic kernel layer.
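
Item 3 of the abstract describes a job dispatching algorithm that keeps the total power of the jobs on each node under that node's limit while pursuing high overall performance. The abstract does not give the algorithm itself, so the following is only a minimal sketch of that kind of constraint, not the thesis's method. It assumes a simple greedy heuristic (best performance per watt first, best-fit placement), and the names `Job`, `Node`, `power`, `performance`, `power_cap`, and `dispatch` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    power: float        # assumed: power drawn while running, must be > 0
    performance: float  # assumed: useful work delivered per unit time


@dataclass
class Node:
    name: str
    power_cap: float    # assumed: upper limit on total power of jobs placed here
    jobs: list = field(default_factory=list)

    def used_power(self) -> float:
        return sum(j.power for j in self.jobs)


def dispatch(jobs, nodes):
    """Greedy sketch: take jobs in order of performance per unit power and
    place each on the node with the least remaining headroom that still
    fits it (best fit). Returns the jobs that fit on no node."""
    unassigned = []
    ordered = sorted(jobs, key=lambda j: j.performance / j.power, reverse=True)
    for job in ordered:
        # Only nodes whose cap would still be respected are candidates.
        candidates = [n for n in nodes if n.used_power() + job.power <= n.power_cap]
        if not candidates:
            unassigned.append(job)
            continue
        best = min(candidates, key=lambda n: n.power_cap - n.used_power())
        best.jobs.append(job)
    return unassigned
```

A greedy heuristic like this never violates a node's cap, but it does not guarantee the maximum total performance; an exact solution would treat the problem as a multiple-knapsack instance.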
