How to execute Spark programs with Dynamic Resource Allocation?
Question
I am using the spark-submit command to execute Spark jobs with parameters such as:
spark-submit --master yarn-cluster --driver-cores 2 \
    --driver-memory 2G --num-executors 10 \
    --executor-cores 5 --executor-memory 2G \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Now I want to execute the same program using Spark's Dynamic Resource Allocation. Could you please help with the usage of Dynamic Resource Allocation when executing Spark programs?
Answer
For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.
This also requires spark.shuffle.service.enabled to be set to true, since the Spark application is running on YARN. Check this link to start the shuffle service on each NodeManager in YARN.
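As a sketch of that NodeManager-side setup (jar names and paths vary by Spark version and distribution), the external shuffle service is typically registered as a YARN auxiliary service by placing spark-&lt;version&gt;-yarn-shuffle.jar on the NodeManager classpath and adding entries like these to yarn-site.xml on every NodeManager, then restarting the NodeManagers:

```xml
<!-- yarn-site.xml: register Spark's external shuffle service as a YARN aux service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```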
The following configurations are also relevant:
spark.dynamicAllocation.minExecutors,
spark.dynamicAllocation.maxExecutors, and
spark.dynamicAllocation.initialExecutors
These options can be configured for a Spark application in 3 ways:
1. From spark-submit, with --conf <prop_name>=<prop_value>
spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

(initialExecutors=10 is the same as --num-executors 10.)
2. Inside the Spark program, using SparkConf

Set the properties in a SparkConf, then create the SparkSession or SparkContext with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.enabled", "true")
conf.set("spark.shuffle.service.enabled", "true")
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()
3. spark-defaults.conf, usually located in $SPARK_HOME/conf/
Place the same configurations in spark-defaults.conf; they apply to all Spark applications when no configuration is passed on the command line or in code.
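As an illustrative sketch (the values mirror the examples above; tune them for your cluster), the corresponding spark-defaults.conf entries would look like:

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10
```

Command-line --conf values and settings made in code take precedence over these defaults.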