How to execute Spark programs with Dynamic Resource Allocation?


Problem description

I am using the spark-submit command to execute Spark jobs with parameters such as:

spark-submit --master yarn-cluster --driver-cores 2 \
 --driver-memory 2G --num-executors 10 \
 --executor-cores 5 --executor-memory 2G \
 --class com.spark.sql.jdbc.SparkDFtoOracle2 \
 Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Now I want to execute the same program using Spark's dynamic resource allocation. Could you please help with how to use dynamic resource allocation when executing Spark programs?

Recommended answer

For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.

This in turn requires spark.shuffle.service.enabled to be set to true, since the Spark application is running on YARN. See the Spark on YARN documentation for how to start the external shuffle service on each NodeManager.
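For reference, a minimal sketch of that NodeManager-side setup in yarn-site.xml, assuming Spark's yarn-shuffle jar has already been added to each NodeManager's classpath (property names as documented for Spark on YARN; restart all NodeManagers after the change):

<!-- Register Spark's external shuffle service as a NodeManager auxiliary service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>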

The following configurations are also relevant:

spark.dynamicAllocation.minExecutors, 
spark.dynamicAllocation.maxExecutors, and 
spark.dynamicAllocation.initialExecutors

These options can be configured for a Spark application in 3 ways:

1. From spark-submit with --conf <prop_name>=<prop_value>

spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Here spark.dynamicAllocation.initialExecutors=10 is the same as --num-executors 10, and the two enabling properties described above are passed explicitly, since the min/max/initial settings have no effect without them.

2. Inside the Spark program with SparkConf

Set the properties in a SparkConf, then create the SparkSession (or SparkContext) with it:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Enable dynamic allocation and bound the executor count
val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.enabled", "true")
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")
val spark = SparkSession.builder().config(conf).getOrCreate()

3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf; they then apply to all Spark applications for which no configuration is passed on the command line or in code.
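As a sketch, the equivalent spark-defaults.conf entries (whitespace-separated key/value pairs, using the same example values as above) would be:

spark.dynamicAllocation.enabled          true
spark.shuffle.service.enabled            true
spark.dynamicAllocation.minExecutors     5
spark.dynamicAllocation.maxExecutors     30
spark.dynamicAllocation.initialExecutors 10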

See also: Spark - Dynamic Allocation configuration properties
