Running steps of EMR in parallel


Question

I am running Spark jobs on an EMR cluster. The issue I am facing is that all the EMR jobs triggered are executed as steps, one after another (queued).

Is there any way to make them run in parallel? If not, is there any alternative for that?

Answer

Elastic MapReduce comes by default with a very "step"-oriented YARN setup: a single CapacityScheduler queue with 100% of the cluster resources assigned to it. Because of this configuration, any time you submit a job to an EMR cluster, YARN maximizes the cluster usage for that single job, granting all available resources to it until it finishes.
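As a minimal sketch of how that default could be overridden, assuming the cluster is created with boto3, the `capacity-scheduler` classification below splits the root queue into two queues. The queue names, the 50/50 split, the instance types and the release label are illustrative placeholders, not settings from the original answer.

```python
# A minimal sketch, assuming the cluster is created with boto3: override the
# default single-queue CapacityScheduler with two queues. Queue names, the
# 50/50 split, instance types and the release label are illustrative only.
import boto3

CONFIGURATIONS = [
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            # Declare two child queues under root instead of just "default".
            "yarn.scheduler.capacity.root.queues": "default,analytics",
            # Split the cluster capacity between them (children must sum to 100).
            "yarn.scheduler.capacity.root.default.capacity": "50",
            "yarn.scheduler.capacity.root.analytics.capacity": "50",
            # Let either queue borrow idle capacity from the other.
            "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
            "yarn.scheduler.capacity.root.analytics.maximum-capacity": "100",
            # Illustrative value so more than one ApplicationMaster can run at once.
            "yarn.scheduler.capacity.maximum-am-resource-percent": "0.5",
        },
    }
]

emr = boto3.client("emr")
emr.run_job_flow(
    Name="parallel-spark-jobs",      # placeholder cluster name
    ReleaseLabel="emr-5.30.0",       # placeholder release label
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Configurations=CONFIGURATIONS,
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```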

Running multiple concurrent jobs in an EMR cluster (or, in fact, any other YARN-based Hadoop cluster) requires a proper YARN setup with multiple queues so that resources are granted to each job appropriately. YARN's documentation covers all of the Capacity Scheduler features quite well, and it is simpler than it sounds.
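Once the queues exist, each Spark application can be pointed at one of them with spark-submit's `--queue` option. The sketch below, assumed to run on the EMR master node, launches two applications against different queues so the CapacityScheduler can schedule them side by side; the S3 paths and queue names are hypothetical.

```python
# A sketch, assumed to run on the EMR master node: launch two Spark
# applications against different YARN queues so the CapacityScheduler can
# run them concurrently. S3 paths and queue names are hypothetical.
import subprocess

jobs = [
    ("s3://my-bucket/jobs/etl_job.py", "default"),
    ("s3://my-bucket/jobs/report_job.py", "analytics"),
]

procs = []
for app, queue in jobs:
    procs.append(subprocess.Popen([
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,   # target one of the queues defined above
        app,
    ]))

# Each application now competes only for its own queue's share of resources.
for p in procs:
    p.wait()
```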

YARN's FairScheduler is quite popular, but it uses a different approach and may be a bit more difficult to configure depending on your needs. In the simplest scenario, where you have a single Fair queue, YARN tries to grant containers to waiting jobs as soon as they are freed by running jobs, ensuring that all jobs submitted to the cluster get at least a fraction of the compute resources as soon as they become available.
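For the FairScheduler alternative, a hedged sketch of the corresponding EMR configuration is the `yarn-site` classification below, which switches the ResourceManager's scheduler class; without an extra allocation file this gives the single fair queue described above.

```python
# A sketch of the alternative: switch YARN to the FairScheduler through the
# "yarn-site" classification (pass this as the Configurations argument of
# run_job_flow instead of the CapacityScheduler example above). With no extra
# allocation file, all applications share a single fair queue.
FAIR_SCHEDULER_CONFIGURATIONS = [
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.resourcemanager.scheduler.class":
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler."
                "fair.FairScheduler",
        },
    }
]
```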
