How to set the number of Spark executors?
Question
How can I configure the number of executors from Java (or Scala) code, given a SparkConfig and a SparkContext? I constantly see 2 executors. It looks like spark.default.parallelism does not work, and is about something different.
I just need to set the number of executors equal to the cluster size, but there are always only 2 of them. I know my cluster size. I run on YARN, if that matters.
Answer
OK, got it.

The number of executors is not actually a Spark property itself, but rather a setting of the driver used to place the job on YARN. Since I'm using the SparkSubmit class as the driver, it has the appropriate --num-executors parameter, which is exactly what I need.
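As a sketch of this approach (the master URL, class name, jar name, and executor count below are illustrative placeholders, not taken from the original answer):

```shell
# Submit to YARN and explicitly request 10 executors.
# com.example.MyApp and my-app.jar are hypothetical.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --class com.example.MyApp \
  my-app.jar
```

With this flag, YARN allocates the requested number of executor containers up front (subject to available cluster resources).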
Update:
For some jobs I no longer follow the SparkSubmit approach. I mainly can't use it for applications where the Spark job is only one of the application's components (and is even optional). For these cases I use a spark-defaults.conf attached to the cluster configuration, with the spark.executor.instances property inside it. This approach is much more universal, and lets me balance resources properly depending on the cluster (developer workstation, staging, production).