
Research on Scheduling Strategy of Scientific Workflow in Grid Environments

【Author】 Yan Chaokun (阎朝坤)

【Supervisor】 Hu Zhigang (胡志刚)

【Author Information】 Central South University, Computer Science and Technology, 2013, Ph.D.

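【Abstract (Chinese version)】 As more and more scientific computing projects are proposed and carried out, users pay increasing attention to the quality of service (QoS) of scientific workflow management systems in Grid environments. As the core component of such a system, the scheduling strategy directly and significantly affects execution efficiency, resource utilization and the degree to which users' QoS requirements are guaranteed. However, the diversity of scientific applications leads to diverse workflow scheduling objectives. On the one hand, users' QoS requirements are often interrelated and mutually constraining; on the other hand, the conflict between users' QoS requirements and Grid system performance is hard to reconcile. How to trade off these QoS metrics to improve service quality is a research focus in workflow scheduling. Moreover, the dynamicity and autonomy of Grid systems make the availability, reliability and workload of resources difficult to judge and predict accurately, so existing workflow scheduling strategies often fail to adapt to real Grid environments, for example when they must guarantee the various constraints in users' QoS requirements. Research on workflow scheduling strategies in Grid systems therefore has both theoretical value and practical significance. This thesis studies two aspects: QoS-constrained workflow scheduling optimization, and improving user QoS satisfaction in dynamic environments. The main contents and contributions are as follows:

(1) A CRO-based workflow cost optimization algorithm under a deadline constraint. Traditional level-based cost optimization algorithms assign each workflow task a fixed time window, which limits the search space to some extent. This thesis applies Chemical Reaction Optimization (CRO) to cost-optimal scheduling of scientific workflows under a deadline constraint and combines it with the heuristic GreedyCost-TD, yielding the algorithm CROTD. For this problem, implementation rules for four chemical molecular reaction operators are constructed, and optimized parameter settings are derived from orthogonal experiments. To avoid invalid solutions that violate the workflow's precedence constraints, a random initial molecule construction method based on task dependency degree is proposed. Experiments on Montage and LIGO workflows of different sizes show that CROTD performs well in cost optimization.

(2) A performance-evaluation-based dynamic workflow scheduling algorithm under a cost constraint. The dynamic load of Grid tasks and local tasks on a resource makes task execution times hard to predict, which undermines effective scheduling decisions. To address this, the execution performance of a resource is modeled as an M/M/C stochastic service system, and a method for estimating a task's execution time on a resource node is given. Based on list scheduling and this resource performance evaluation model, the algorithm SSWC_PE is proposed. Experiments on Montage and LIGO workflows of different sizes show that SSWC_PE performs better in execution time than GreedyTime-CD and LOSS.

(3) A reliable workflow scheduling model and algorithm under a deadline constraint. Resource failures are common in Grid environments, and awareness of resource reliability and of the task load on each resource greatly increases the reliability of executing applications. Taking into account the impact of local tasks on a resource's service capacity, this thesis models the dynamic service capacity and load of resources with a stochastic service model, and defines the "execution reliability" of a task on a resource together with a method to compute it. Combining "resource reliability" and "execution reliability", a new reliability evaluation model for resource nodes is established. On this basis, the reliable scheduling algorithm RSA_TC is proposed. It distributes the user's deadline over the individual tasks, turning the global optimization problem of the whole workflow into local optimization problems for single tasks and thereby reducing problem complexity. Experimental results show that the reliability model accurately reflects the task execution characteristics of Grid resources, and that RSA_TC outperforms HEFT and PRMS in execution reliability.

(4) A deadline-guarantee-enhanced scientific workflow management system architecture and corresponding scheduling strategy. Resource management strategies such as resource reservation, task migration and task replication still rely on dynamic, unreliable Grid resources and cannot effectively cope with unpredictable task execution times. This thesis therefore proposes EDGESA, a scientific workflow system architecture with an enhanced deadline guarantee, which uses Cloud services to strengthen the workflow management system's ability to meet application deadlines. For workflow scheduling, the core module of the architecture, the default risk of a task is used to quantify how well Grid resources guarantee workflow task deadlines, and a time series model is used to predict Cloud service response time. Experimental analysis of EDGESA's deadline guarantee capability shows that EDGESA effectively meets applications' execution time requirements, providing a reference for implementing next-generation workflow management systems.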
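
Contributions (2) and (3) both rest on modeling a resource as an M/M/C service system. The sketch below shows one standard way to turn such a model into numbers using the Erlang C formula: the expected queueing delay yields an execution-time estimate, and the tail of the waiting-time distribution yields a probability of finishing before a deadline, which is one possible reading of "execution reliability". It assumes local tasks arrive as a Poisson stream with rate `lam`, each of the `c` processors serves them at rate `mu`, and the Grid task waits first-come-first-served behind them and then runs for `task_length / speed`. All function and parameter names are hypothetical, and the thesis's own formulas may differ.

```python
import math


def erlang_c(c: int, lam: float, mu: float) -> float:
    """Probability that an arriving job has to wait in an M/M/c queue (Erlang C)."""
    a = lam / mu  # offered load
    if c < 1 or not 0 <= a < c:
        raise ValueError("need c >= 1 and lam < c * mu for a stable queue")
    top = a**c / math.factorial(c) * c / (c - a)
    bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
    return top / bottom


def expected_wait(c: int, lam: float, mu: float) -> float:
    """Mean queueing delay E[Wq] = C(c, a) / (c*mu - lam)."""
    return erlang_c(c, lam, mu) / (c * mu - lam)


def estimated_completion_time(task_length: float, speed: float,
                              c: int, lam: float, mu: float) -> float:
    """Hypothetical estimate: expected wait behind local tasks plus own run time."""
    return expected_wait(c, lam, mu) + task_length / speed


def execution_reliability(task_length: float, speed: float, deadline: float,
                          c: int, lam: float, mu: float) -> float:
    """Probability that waiting time leaves enough room to finish by a relative
    deadline, using P(Wq > t) = C(c, a) * exp(-(c*mu - lam) * t)."""
    slack = deadline - task_length / speed
    if slack < 0:
        return 0.0  # cannot finish in time even with no waiting
    return 1.0 - erlang_c(c, lam, mu) * math.exp(-(c * mu - lam) * slack)
```

For example, `estimated_completion_time(100, 50, 1, 0.5, 1.0)` adds an expected wait of 1.0 (the M/M/1 case with utilization 0.5) to a run time of 2.0. Treating the Grid task as one more customer of the same queue is the simplest assumption; a model that gives local tasks priority would change the waiting-time terms.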

【Abstract】 Grid technology remains an important supporting environment for scientific workflow management systems. In such a decentralized, dynamic and autonomous environment, providing non-trivial QoS for end users is a major challenge that has attracted increasing attention. As a core component of a scientific workflow management system, the scheduling strategy has a direct and important impact on system performance, resource utilization and QoS guarantee. However, some QoS metrics are interrelated and constrain one another, and how to balance them is still an active research topic. Because of the dynamicity and autonomy of Grid systems, existing scheduling strategies often cannot be applied effectively in real Grid environments or provide efficient QoS guarantees. Research on workflow scheduling strategies therefore has both theoretical and practical value for accelerating scientific progress. Based on a review of current scientific workflow management systems and scheduling strategies and their drawbacks, this thesis investigates efficient and effective workflow scheduling strategies in Grid environments. The main contributions are summarized as follows:

(1) Cost optimization for scientific workflows with a deadline constraint. Leveling is a popular technique for this problem and has been studied extensively. However, it sets a fixed time period for each workflow task and thus restricts the search scope. This thesis proposes CROTD, a cost optimization algorithm that combines the Chemical Reaction Optimization (CRO) algorithm with the heuristic GreedyCost-TD. For this problem, a method for constructing random initial molecules based on task dependency and detailed implementations of four molecular reaction operators are proposed, and orthogonal experiments are used to select the algorithm's parameters. Experimental results on Montage and LIGO workflows of different sizes show that CROTD performs well in cost optimization.

(2) Online workflow scheduling based on performance evaluation. Because Grid resources are autonomous and Grid tasks share them with local tasks, task execution times are difficult to predict in a dynamic Grid environment. Based on an analysis of task characteristics on Grid resources, an M/M/C stochastic service model is used to describe the service capacity and workload of Grid resources, and an approximate method for calculating task execution time is presented. To minimize workflow makespan under a cost constraint, a dynamic workflow scheduling algorithm based on performance evaluation, SSWC_PE, is proposed. Compared with GreedyTime-CD and LOSS, SSWC_PE achieves better makespan.

(3) Reliable workflow scheduling with a time constraint. To improve the execution reliability of workflows and enhance user satisfaction, a stochastic service model that considers the impact of local tasks is adopted to describe the dynamic workloads of Grid resources. The execution reliability of a task is defined as the probability that the task meets its deadline. Combining it with the traditional definition of resource reliability, a novel resource reliability evaluation model is introduced. Based on this model, RSA_TC, a reliable scheduling algorithm for scientific workflows with a time constraint, is presented. Extensive simulation experiments show that RSA_TC outperforms PRMS and HEFT in meeting deadlines and in adaptability to dynamic Grid environments.

(4) A deadline-guarantee-enhanced scientific workflow management system architecture and corresponding scheduling strategy. To cope with unpredictable task execution times, current workflow management systems usually adopt techniques such as resource reservation, rescheduling, task migration and task duplication, which cannot solve the problem efficiently. For time-sensitive scientific workflows, a novel workflow orchestration architecture, EDGESA, is presented; it enforces the deadlines of e-science applications by leasing reliable Cloud services. For scheduling deadline-sensitive scientific workflows, a metric called Default Risk of Task is provided to decide whether Cloud services should be used, and a time series model is adopted to predict the response time of Cloud services. Experimental results show that EDGESA outperforms other strategies in guaranteeing users' deadlines.
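
For contribution (4), the abstract states only that a Default Risk of Task metric decides whether Cloud services are used and that a time series model predicts Cloud response time. A minimal sketch of such a decision rule follows, under loudly stated assumptions: default risk is estimated empirically as the fraction of past Grid execution times that exceeded the task's sub-deadline, simple exponential smoothing stands in for the unspecified time series model, and `risk_threshold` and `alpha` are hypothetical parameters. It illustrates the idea and is not EDGESA's actual scheduler.

```python
from typing import Sequence


def default_risk(grid_history: Sequence[float], sub_deadline: float) -> float:
    """Hypothetical estimator: share of past Grid execution times that
    exceeded the task's sub-deadline."""
    if not grid_history:
        raise ValueError("need at least one Grid observation")
    misses = sum(1 for t in grid_history if t > sub_deadline)
    return misses / len(grid_history)


def forecast_response_time(history: Sequence[float], alpha: float = 0.3) -> float:
    """One-step-ahead forecast by simple exponential smoothing (a stand-in
    for the thesis's unspecified time series model)."""
    if not history:
        raise ValueError("need at least one Cloud observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def choose_target(grid_history: Sequence[float], cloud_history: Sequence[float],
                  sub_deadline: float, risk_threshold: float = 0.2,
                  alpha: float = 0.3) -> str:
    """Keep the task on the Grid unless its default risk exceeds the threshold
    AND the Cloud is predicted to respond within the sub-deadline."""
    if default_risk(grid_history, sub_deadline) <= risk_threshold:
        return "grid"
    if forecast_response_time(cloud_history, alpha) <= sub_deadline:
        return "cloud"
    return "grid"  # neither option looks safe; this sketch avoids paying for the Cloud
```

With past Grid times `[5, 8, 12, 15]` and a sub-deadline of 10, half of the observations miss, so a threshold of 0.2 sends the task to a Cloud service whose smoothed response time is below 10. Falling back to the Grid when the Cloud also looks late is one design choice among several; a cost-aware rule could instead pick whichever side has the lower predicted completion time.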

  • 【Online Publication Submitter】 Central South University
  • 【Online Publication Issue】 2014, Issue 02
  • 【CLC Number】 TP393.09; TP301.6
  • 【Times Cited】 2
  • 【Downloads】 580