How to set the number of Spark executors?

Question

How can I configure the number of executors from Java (or Scala) code, given a SparkConf and a SparkContext? I constantly see 2 executors. It looks like spark.default.parallelism does not work and is about something different.

I just need to set the number of executors to be equal to the cluster size, but there are always only 2 of them. I know my cluster size. I run on YARN, if that matters.
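For context, a minimal sketch of the kind of setup described above (the app name and the parallelism value are placeholders, not from the original post); the point is that spark.default.parallelism controls the default number of partitions for shuffle operations, not how many executors YARN allocates:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder configuration: app name and value are illustrative.
    // spark.default.parallelism only sets the default partition count
    // for operations like reduceByKey; it does not change the number
    // of executors YARN starts for the application.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)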

Answer

OK, got it. The number of executors is not actually a Spark property itself, but rather something the driver uses to place the job on YARN. Since I'm using the SparkSubmit class as the driver, it has the appropriate --num-executors parameter, which is exactly what I need.
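For example, a typical invocation might look like the following (the class name, jar, and counts are illustrative, not taken from the original answer); the executor count is passed on the command line:

    spark-submit \
      --master yarn \
      --num-executors 10 \
      --executor-cores 4 \
      --executor-memory 4g \
      --class com.example.MyApp \
      my-app.jar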

Update:

For some jobs I no longer follow the SparkSubmit approach. I mainly can't do it for applications where the Spark job is only one component of the application (and is even optional). For these cases I attach a spark-defaults.conf to the cluster configuration and set the spark.executor.instances property inside it. This approach is much more universal and lets me balance resources properly depending on the cluster (developer workstation, staging, production).
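A sketch of what such a spark-defaults.conf might contain (the specific values are illustrative and would differ per cluster):

    # spark-defaults.conf -- per-cluster values, illustrative only
    spark.executor.instances   10
    spark.executor.memory      4g
    spark.executor.cores       4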
