Job and Task Scheduling in Hadoop


Question

I am a little confused about the terms "job scheduling" and "task scheduling" in Hadoop, which I came across while reading about delayed fair scheduling in this slide.

Please correct me if any of the following assumptions are wrong:

  1. The default scheduler, the Capacity Scheduler and the Fair Scheduler only take effect at the job level, when the user has submitted multiple jobs. They play no role if there is only a single job in the system. These scheduling algorithms form the basis for "job scheduling".

Each job can have multiple map and reduce tasks. How are they assigned to each machine? How are tasks scheduled for a single job? What is the basis for "task scheduling"?

Recommended answer

With the Fair Scheduler, when a single job is running, that job uses the entire cluster. When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time.
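
As a rough illustration of the slot hand-off described above, here is a minimal Python sketch. It is not Hadoop's implementation: the function names, the job names, and the "fewest running tasks wins" rule are assumptions chosen to keep the example small.

```python
def pick_job_for_free_slot(running, pending):
    """Return the job that should receive a freed slot, or None.

    running: dict mapping job name -> number of tasks currently running
    pending: dict mapping job name -> number of tasks still waiting
    (Hypothetical helper; the selection rule is an assumption.)
    """
    candidates = [job for job, waiting in pending.items() if waiting > 0]
    if not candidates:
        return None
    # Fewest running tasks first; the job name breaks ties deterministically.
    return min(candidates, key=lambda job: (running.get(job, 0), job))


def simulate_free_ups(running, pending, finishing_job, steps):
    """Free one slot of `finishing_job` per step and reassign it fairly.

    Returns a list with a snapshot of `running` after every step.
    """
    history = []
    for _ in range(steps):
        if running.get(finishing_job, 0) == 0:
            break
        running[finishing_job] -= 1  # a task finishes, freeing its slot
        winner = pick_job_for_free_slot(running, pending)
        if winner is not None:
            running[winner] = running.get(winner, 0) + 1
            pending[winner] -= 1
        history.append(dict(running))
    return history


if __name__ == "__main__":
    # One long job holds all 4 slots; a short job has just been submitted.
    running = {"long_job": 4, "short_job": 0}
    pending = {"long_job": 10, "short_job": 3}
    for snapshot in simulate_free_ups(running, pending, "long_job", steps=3):
        print(snapshot)
```

In this toy run the long job starts with the whole cluster, and the slots it frees go to the newly submitted job until both hold the same number of slots.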

Unlike the default Hadoop scheduler, which forms a queue of jobs, this lets short jobs finish in reasonable time while not starving long jobs. It is also an easy way to share a cluster between multiple users. Fair sharing can also work with job priorities: the priorities are used as weights to determine the fraction of total compute time that each job gets.
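
To make the role of weights concrete, the sketch below computes each job's share as its weight divided by the sum of all weights. The function name and the example weights are hypothetical; how Hadoop maps a job priority to a numeric weight is not stated here and is not modeled.

```python
def weighted_fair_shares(total_slots, weights):
    """Split `total_slots` among jobs in proportion to their weights.

    weights: dict mapping job name -> positive weight
    (Hypothetical helper; the weight values are assumptions.)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one job must have a positive weight")
    return {job: total_slots * w / total_weight for job, w in weights.items()}


if __name__ == "__main__":
    # A job with weight 2 is entitled to twice the share of a job with weight 1.
    shares = weighted_fair_shares(12, {"normal_job": 1, "high_job": 2})
    print(shares)  # {'normal_job': 4.0, 'high_job': 8.0}
```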

The CapacityScheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee. The central idea is that the available resources in the Hadoop Map-Reduce cluster are partitioned among multiple organizations that collectively fund the cluster based on their computing needs. An added benefit is that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.
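
The guarantee-plus-elasticity idea can be sketched as a two-step allocation: first each queue receives up to its guaranteed share, then any unused capacity is lent to queues that still have demand. This is only an illustration under assumptions. The function, the queue names, and the rule for handing out spare slots are hypothetical, and features of the real CapacityScheduler such as maximum-capacity limits or reclaiming lent capacity are not modeled.

```python
def allocate_capacity(total_slots, guarantees_percent, demand):
    """Allocate slots to queues with minimum guarantees and elastic sharing.

    guarantees_percent: dict mapping queue -> guaranteed percent of the cluster
    demand: dict mapping queue -> number of slots the queue currently wants
    (Hypothetical helper; the spare-slot policy is an assumption.)
    """
    # Step 1: every queue gets what it asks for, up to its guarantee.
    allocation = {}
    for queue, percent in guarantees_percent.items():
        guaranteed = total_slots * percent // 100
        allocation[queue] = min(guaranteed, demand.get(queue, 0))

    # Step 2: lend unused slots, one at a time, to queues with unmet demand.
    spare = total_slots - sum(allocation.values())
    while spare > 0:
        hungry = [q for q in allocation if demand.get(q, 0) > allocation[q]]
        if not hungry:
            break
        # Assumed policy: the hungry queue holding the fewest slots goes first.
        queue = min(hungry, key=lambda q: (allocation[q], q))
        allocation[queue] += 1
        spare -= 1
    return allocation


if __name__ == "__main__":
    guarantees = {"org_a": 50, "org_b": 30, "org_c": 20}
    demand = {"org_a": 80, "org_b": 10, "org_c": 20}
    # org_b uses only 10 of its 30 guaranteed slots, so org_a can borrow the rest.
    print(allocate_capacity(100, guarantees, demand))
```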
