How to execute Spark programs with Dynamic Resource Allocation?


Problem Description

I am using the spark-submit command to execute Spark jobs with parameters such as:

spark-submit --master yarn-cluster --driver-cores 2 \
 --driver-memory 2G --num-executors 10 \
 --executor-cores 5 --executor-memory 2G \
 --class com.spark.sql.jdbc.SparkDFtoOracle2 \
 Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Now I want to execute the same program using Spark's dynamic resource allocation. Could you please help with the usage of dynamic resource allocation in executing Spark programs?

Recommended Answer

For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.

This in turn requires spark.shuffle.service.enabled to be set to true, since the Spark application runs on YARN, and the external shuffle service must be started on each NodeManager in YARN (see the Spark on YARN documentation for the setup steps).
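For reference, configuring the external shuffle service on YARN typically amounts to adding the spark-<version>-yarn-shuffle.jar to each NodeManager's classpath and registering the service in yarn-site.xml, roughly as follows (a sketch of the standard setup described in the Spark on YARN docs; adjust names and values to your Hadoop distribution):

<!-- Register Spark's shuffle service alongside the default MapReduce one -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Restart all NodeManagers afterwards so the new service is picked up.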

The following settings are also relevant:

spark.dynamicAllocation.minExecutors      (lower bound on the number of executors)
spark.dynamicAllocation.maxExecutors      (upper bound on the number of executors)
spark.dynamicAllocation.initialExecutors  (starting count; defaults to minExecutors)

These options can be configured for a Spark application in 3 ways:

1. From spark-submit, using --conf <prop_name>=<prop_value>

spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Here spark.dynamicAllocation.initialExecutors=10 has the same effect as --num-executors 10; the two enabled flags are spelled out explicitly in case they are not already set in spark-defaults.conf.

2. Inside the Spark program, using SparkConf

Set the properties on a SparkConf, then create the SparkSession or SparkContext with it:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")

// Create the session with these settings applied
val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()
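Once the session is up, the effective values can be read back through the runtime-config API, which is a quick way to confirm the settings took effect (a minimal check, assuming the spark session from the snippet above):

// Print the effective dynamic-allocation bounds of the running application
println(spark.conf.get("spark.dynamicAllocation.minExecutors"))
println(spark.conf.get("spark.dynamicAllocation.maxExecutors"))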

3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf; they then apply to all Spark applications for which no value is passed on the command line or set in code (settings made in code or via --conf take precedence over spark-defaults.conf).
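For illustration, the relevant lines in spark-defaults.conf might look like this, reusing the values from the examples above:

spark.dynamicAllocation.enabled            true
spark.shuffle.service.enabled              true
spark.dynamicAllocation.minExecutors       5
spark.dynamicAllocation.maxExecutors       30
spark.dynamicAllocation.initialExecutors   10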

See also: Spark - Dynamic Allocation configuration

