Study on Scientific Workflow Management and Scheduling (科学工作流管理及调度研究)
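【Author】 Liu Cancan (刘灿灿);
【Advisor】 Luo Zhigang (骆志刚);
【Author Information】 National University of Defense Technology, Computer Science and Technology, 2011, Ph.D.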
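【Abstract (Chinese version)】 e-Science research increasingly depends on the ability of large-scale scientific applications and software tools to analyze and process massive data sets, and on the computing power of high-performance resources in grid environments. As a management platform that helps scientists compose complex processes and run them automatically, the Scientific Workflow Management System (SWfMS) plays an increasingly important role in scientific research, and Scientific Workflow (SWF) technology has become an active topic in academia. Several large e-Science centers have each developed SWfMSs for specific domains, but these systems lack a unified standard and interoperate poorly, so managing complex processes in a new domain requires either extensive modification of an existing system or new development. To make full use of research resources and achieve interoperability among systems, a general-purpose SWfMS built on a unified standard is one of the pressing problems in SWF management. Meanwhile, as more and more grid resources join, paying for the use of computing resources is becoming inevitable. Scheduling workflows in such utility grids must consider several objectives, including execution time, execution cost, and reliability, which are interrelated and mutually constraining. How to trade these objectives off and reach the best overall multi-objective performance has also been a research focus in recent years. Building on an in-depth analysis of the state of the art and the shortcomings of SWF management and scheduling, this thesis studies the general-purpose SWfMS and the effective scheduling of workflows in utility grids. The main contributions are as follows:

(1) Analysis and summary of SWF technologies and the state of research
Because SWF research is still at an early stage, existing work in China and abroad is comprehensively summarized and compared. The key technologies are summarized, compared, and evaluated in terms of SWF models, representations, languages, and scheduling, and both the research of the last two years and the state of research in China are analyzed, laying the foundation for the rest of the thesis.

(2) Design and implementation of a general-purpose SWfMS based on BPEL
To address the difficulty of interoperation among current SWfMSs, the thesis studies the BPEL (Business Process Execution Language) model, which has good generality. With ensemble prediction as the application background, it designs and implements a BPEL-based general-purpose scientific workflow management system, the Ensemble Prediction Scientific Workflow system (EPSWFlow). The system uses BPEL's rich control semantics and comprehensive support for Web services to compose and schedule the services in the prediction process on demand. It uses JSDL (Job Submission Description Language) to describe the many legacy applications in the experimental environment that cannot be wrapped as Web services, and it schedules and monitors these jobs through GridSAM, a Web-based standard job scheduling package, which solves the problem of integrating legacy applications.

(3) Study of the dynamic adaptability of the general-purpose SWfMS
To meet the dynamic adaptability requirements of SWF, the thesis studies the dynamic adaptability of the SWfMS from three angles: the SWF model, the system implementation, and run-time fault tolerance. It proposes a four-layer abstract SWF model that describes Web services and underlying legacy applications abstractly at different abstraction layers. During execution, the SWF engine selects services dynamically and binds resources in real time to support dynamic adaptability. In addition, the thesis studies run-time fault tolerance and implements three fault-tolerance strategies in EPSWFlow, which effectively improve exception handling during execution and further improve dynamic adaptability.

(4) Deadline-constrained workflow cost optimization scheduling in utility grids
Task scheduling under various environments and conditions is an important topic in workflow research, and scheduling performance directly affects system efficiency. For utility grids, where resources are paid for, the thesis proposes three scheduling algorithms for deadline-constrained workflow cost optimization: TCDBL (Temporal Consistency based Deadline Bottom Level), a deadline-constrained reverse layering algorithm based on temporal consistency; PBCO (Path Balance based Cost Optimization), a workflow cost optimization algorithm based on path balance; and an iterative algorithm based on the priority rule BFTCSTM (Best Fit based on Time-dependent Coupling Strength and Temporal Mobility). The three algorithms approach the cost optimization problem from different angles, and all achieve good scheduling results.

(5) Priority-factor-based workflow time-cost optimization scheduling under dynamic resources
In grid environments where resource states change dynamically, it is hard to predict a workflow's execution time or execution cost accurately before the workflow completes. The thesis therefore studies a priority-factor-based cost optimization strategy that optimizes execution time and execution cost simultaneously. Building on a layering strategy, it proposes three real-time scheduling algorithms: the reverse-layering-based Sufferage algorithm (BLSuff), the reverse-layering-based Min-Min algorithm (BLMin), and the reverse-layering-based Min-Max algorithm (BLMax). All three layer the tasks by reverse depth and schedule them layer by layer using a metric based on the priority factor, which optimizes workflow execution time and execution cost simultaneously and achieves good scheduling results.

In summary, this thesis studies several key problems that urgently need solving in current SWF technology and proposes effective solutions. The research has considerable theoretical and practical value for advancing the composition and management of complex scientific computing processes and, ultimately, for advancing scientific research.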
【Abstract】 e-Science research increasingly depends on the ability of large-scale scientific applications and software tools to analyze massive data sets, and on the computing power of high-performance resources in grid environments. As a management platform for composing complex processes and running them automatically, the Scientific Workflow Management System (SWfMS) plays an increasingly important role in scientific research, and Scientific Workflow (SWF) technology has attracted growing attention. Many large e-Science centers have developed SWfMSs for their own research domains. However, these systems share no common standard, and interoperation among them is difficult, so managing complex processes in a new domain requires either heavily modifying an existing system or developing a new one. A general-purpose SWfMS built on a unified standard is therefore needed. Moreover, the "pay-per-use" model is becoming common as more and more resources are added to grids. Scheduling a workflow on such utility grids must take several objectives into account, such as workflow execution time, workflow execution cost, and system reliability. These objectives are interrelated and mutually constraining, so how to trade them off and optimize overall performance has become an active research topic. Based on an analysis of current studies of SWF management and scheduling and of their drawbacks, this thesis focuses on the general-purpose SWfMS and on workflow scheduling on utility grids. The main contributions of this thesis are summarized as follows:

(1) A review of the related technologies and current studies of SWF
We review the current studies of SWF, including lifecycles, models, representations, languages, and scheduling.
We compare these technologies and analyze the recent studies, which provides the basis for the work in this thesis.

(2) Design of a general-purpose SWfMS based on BPEL
To address the difficulty of interoperation among SWfMSs, we exploit the generality of the Business Process Execution Language (BPEL) and design a new BPEL-based SWfMS, the Ensemble Prediction Scientific Workflow system (EPSWFlow), with ensemble prediction as the application. Drawing on the strengths of BPEL, such as its rich control structures and full support for Web services, EPSWFlow composes and schedules the services in the ensemble prediction process dynamically. Moreover, EPSWFlow uses JSDL (Job Submission Description Language) to describe the large number of legacy applications that cannot be wrapped as Web services, and it schedules and monitors these applications with the standard job submission system GridSAM, which solves the problem of integrating legacy applications.

(3) Research on the adaptability of the general-purpose SWfMS
To address the dynamic adaptability of SWF, we study the SWF model and the architecture and implementation of the EPSWFlow system, and we propose a four-level abstract model that provides an abstract description of the Web services and legacy applications at each level of abstraction. The SWF engine can select a service dynamically during execution and bind the resource in real time. We also study system reliability and provide three fault-tolerance strategies, which improve the handling of abnormal situations and further improve system reliability.

(4) Research on cost optimization in deadline-constrained workflow scheduling on utility grids
Workflow scheduling under different environments and conditions is one of the most important topics in SWF management, because the scheduling result strongly affects system performance.
To solve the time-cost trade-off problem in deadline-constrained workflow scheduling, we present three novel algorithms in this thesis: the Temporal Consistency based Deadline Bottom Level algorithm (TCDBL), the Path Balance based Cost Optimization algorithm (PBCO), and an iterative algorithm based on the BFTCSTM (Best Fit based on Time-dependent Coupling Strength and Temporal Mobility) rule. All three reduce workflow cost compared with previous algorithms.

(5) Research on the time-cost trade-off problem with priority factors in workflow scheduling under dynamic environments
Because it is difficult to predict workflow execution time and execution cost exactly in advance in dynamic grid environments, we study the time-cost trade-off problem in workflow scheduling based on a priority factor. We propose three real-time heuristics based on the bottom-level strategy: Bottom Level based Sufferage (BLSuff), Bottom Level based Min-Min (BLMin), and Bottom Level based Min-Max (BLMax). These heuristics divide the tasks into several groups according to the synchronization properties of the workflow and use a metric built on the trade-off factor to optimize workflow execution time and cost simultaneously, which yields better scheduling results.

To sum up, we have studied several key problems in scientific workflow management and scheduling and proposed effective solutions. The work in this thesis supports further study of the composition and management of complex scientific computations and thereby helps accelerate scientific progress in both theory and practice.
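The layer-by-layer idea behind contribution (5) can be illustrated with a generic sketch. The abstract does not define the priority factor or the metric used by BLSuff, BLMin, or BLMax, so everything below is an assumption made for illustration: the function names, the sample workflow, and the weighted sum `alpha * finish + (1 - alpha) * cost` are hypothetical and are not the thesis's algorithms. The sketch groups tasks by bottom level (exit tasks are level 0) and then applies a Min-Min style selection inside each level.

```python
from collections import defaultdict


def bottom_levels(succ):
    """Map each task to its bottom level: exit tasks are 0, and any other
    task sits one level above its deepest successor."""
    levels = {}

    def depth(task):
        if task not in levels:
            levels[task] = 1 + max((depth(s) for s in succ.get(task, ())), default=-1)
        return levels[task]

    for task in succ:
        depth(task)
    return levels


def schedule(succ, runtime, price, alpha=0.5):
    """Schedule level by level, from entry tasks down to exit tasks.

    Within a level, repeatedly pick the (task, resource) pair with the
    smallest score. The score is an ASSUMED metric, not the thesis's:
    alpha weights finish time, (1 - alpha) weights monetary cost.
    """
    preds = defaultdict(list)
    for task, children in succ.items():
        for child in children:
            preds[child].append(task)

    by_level = defaultdict(list)
    for task, level in bottom_levels(succ).items():
        by_level[level].append(task)

    resource_free = {r: 0.0 for r in price}  # time at which each resource is idle
    finish, plan, total_cost = {}, {}, 0.0

    for level in sorted(by_level, reverse=True):
        pending = set(by_level[level])  # tasks in one level are independent
        while pending:
            best = None
            for task in sorted(pending):
                ready = max((finish[p] for p in preds[task]), default=0.0)
                for res in sorted(price):
                    start = max(ready, resource_free[res])
                    end = start + runtime[task][res]
                    cost = runtime[task][res] * price[res]
                    score = alpha * end + (1 - alpha) * cost
                    if best is None or score < best[0]:
                        best = (score, task, res, start, end, cost)
            _, task, res, start, end, cost = best
            plan[task] = (res, start, end)
            finish[task] = end
            resource_free[res] = end
            total_cost += cost
            pending.remove(task)

    return plan, max(finish.values()), total_cost


# Hypothetical diamond-shaped workflow: A -> {B, C} -> D.
SUCC = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Hypothetical run times (time units) and prices (cost per time unit).
RUNTIME = {t: {"fast": 2.0, "slow": 3.0} for t in SUCC}
PRICE = {"fast": 3.0, "slow": 1.0}
```

With this sample data, `schedule(SUCC, RUNTIME, PRICE, alpha=1.0)` optimizes time only and gives a makespan of 7 at a cost of 21, while `alpha=0.0` optimizes cost only and gives a makespan of 12 at a cost of 12. Intermediate values of `alpha` move between the two extremes. A real scheduler would normalize time and cost before mixing them, since the two quantities have different units.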
【Key words】 e-Science; Scientific Workflow (SWF); Ensemble Prediction; Dynamic Adaptability; Workflow Scheduling; Time-cost trade-off; Trade-off Factor;