How to run 2 EMR Spark steps concurrently?


Question

I am trying to have 2 steps run concurrently in EMR. However, I always end up with the first step running and the second one pending.

Part of my YARN configuration is as follows:

{
  "Classification": "capacity-scheduler",
  "Properties": {
    "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
    "yarn.scheduler.capacity.maximum-am-resource-percent": "0.5"
  }
}

When I run on my local Mac, I am able to run the 2 applications on YARN with a similar configuration; the only changes are the spark-submit resource requests, which are adjusted to match the cluster capacity and the performance required.

In other words, my YARN is set up to run multiple applications.

Hence, before I dig deep into it, I wonder whether it is actually possible to have the steps run concurrently, or only serially?

Otherwise, are there any tips or anything specific needed to run the jobs concurrently?

My cluster has far more capacity than each job requests, so I don't understand why they can't run concurrently.

Recommended answer

  • Is it possible to have the steps run concurrently, or only serially?

    • Confirmed by AWS support: we cannot run multiple steps in parallel. Steps run serially, so what you are seeing (the second job sitting in a pending state) is expected.

  • Are there any tips or anything specific needed to run the jobs concurrently?

    • You can put both spark-submit commands into a bash script and run the bash script. However, you may lose some of the direct debugging information on the AWS web console (which is slow, IMO); you can still see that information on the Spark history server.

    On your local Mac, you are able to run multiple YARN applications in parallel because you submit them to YARN directly. On EMR, the YARN/Spark applications are submitted through AWS's internal `command-runner.jar`, which does a bunch of additional logging, bootstrapping, etc. so that the `emr step` info can be shown on the web console.
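
The bash-script workaround mentioned above could look roughly like the following. This is only a minimal sketch: `run_concurrently` is a hypothetical helper name, and the class names and jar paths in the usage comment are placeholders, not anything taken from the question. It assumes each `spark-submit` call blocks until its application finishes, which is the default behaviour in YARN cluster mode.

```shell
#!/usr/bin/env bash
# Sketch: start several commands at once and wait for all of them.
# run_concurrently is a hypothetical helper, not part of EMR or Spark.
run_concurrently() {
  local pids=() status=0 cmd pid
  for cmd in "$@"; do
    bash -c "$cmd" &          # start each command in the background
    pids+=("$!")              # remember its process id
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # record a failure if any command fails
  done
  return "$status"
}

# Hypothetical usage (class names and jar paths are placeholders):
# run_concurrently \
#   "spark-submit --deploy-mode cluster --class com.example.JobA /home/hadoop/job-a.jar" \
#   "spark-submit --deploy-mode cluster --class com.example.JobB /home/hadoop/job-b.jar"
```

With this approach EMR sees a single step, so the web console reports one status for both jobs. The per-application details remain available on the Spark history server, as noted in the answer.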
