Spark job with explicit setMaster("local"), passed to spark-submit with YARN


Problem description

If I have a Spark job (2.2.0) compiled with setMaster("local"), what happens if I send that job with spark-submit --master yarn --deploy-mode cluster?
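For concreteness, a minimal sketch of the kind of job in question (the class name, app name, and jar name are placeholders, not from the original post):

    import org.apache.spark.sql.SparkSession

    object LocalMasterJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LocalMasterJob")
          .master("local")  // explicit local master, compiled into the jar
          .getOrCreate()

        // Trivial action so the job actually does some work.
        println(spark.range(100).count())
        spark.stop()
      }
    }

submitted with:

    spark-submit --master yarn --deploy-mode cluster --class LocalMasterJob app.jar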

I tried this, and it looks like the job did get packaged up and executed on the YARN cluster rather than locally.

What's not clear to me:

  • why does this work? According to the docs, things that you set explicitly in SparkConf take precedence over things passed in from the command line or via spark-submit (see: https://spark.apache.org/docs/latest/configuration.html). Is this different because I'm using SparkSession.getBuilder?

setMaster("local")保留在代码中与删除它相比,有没有那么明显的影响?我想知道我所看到的是否是在群集内以本地模式运行的作业,而不是正确使用群集资源.

is there any less obvious impact of leaving setMaster("local") in code vs. removing it? I'm wondering if what I'm seeing is something like the job running in local mode, within the cluster, rather than properly using cluster resources.

Answer

It's because submitting your application to YARN happens before SparkConf.setMaster ever runs.

When you use --master yarn --deploy-mode cluster, spark-submit runs on your local machine and uploads the jar to run on YARN; at that point your application's main method has not executed yet, so --master yarn decides where the driver is launched. YARN allocates a container as the application master to run the Spark driver, i.e., your code. SparkConf.setMaster("local") then runs inside that YARN container: it creates a SparkContext running in local mode, which does not use the YARN cluster resources.
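One way to confirm this from inside the job is to log a couple of properties of the running context; in the scenario above they report a local master even though the process lives in a YARN container (a diagnostic sketch, not part of the original answer):

    // Inside main(), after getOrCreate():
    val sc = spark.sparkContext
    println(s"master  = ${sc.master}")   // prints "local" -- the hard-coded value won
    println(s"isLocal = ${sc.isLocal}")  // prints "true" despite running on YARN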

I recommend not setting the master in your code at all. Just use the --master command-line option or the MASTER environment variable to specify the Spark master.
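Following that advice, a sketch of the same job with the master left out of the code (again with placeholder names):

    import org.apache.spark.sql.SparkSession

    object YarnFriendlyJob {
      def main(args: Array[String]): Unit = {
        // No setMaster here: the master comes from spark-submit's --master
        // option or the MASTER environment variable, so the same jar can run
        // locally during development and on YARN in production.
        val spark = SparkSession.builder()
          .appName("YarnFriendlyJob")
          .getOrCreate()

        println(spark.range(100).count())
        spark.stop()
      }
    }

which can then be run either way without recompiling:

    spark-submit --master yarn --deploy-mode cluster --class YarnFriendlyJob app.jar
    spark-submit --master "local[*]" --class YarnFriendlyJob app.jar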
