How to execute Spark programs with Dynamic Resource Allocation?
Problem Description
I am using the spark-submit command to execute Spark jobs with parameters such as:
spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Now I want to execute the same program using Spark's Dynamic Resource Allocation. Could you please help with how to use Dynamic Resource Allocation when executing Spark programs?
Recommended Answer
For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.
This in turn requires spark.shuffle.service.enabled to be set to true, since the Spark application is running on YARN. See the Spark on YARN documentation for how to start the shuffle service on each NodeManager; a sketch of the required entries follows.
The following configurations are also relevant:
spark.dynamicAllocation.minExecutors,
spark.dynamicAllocation.maxExecutors, and
spark.dynamicAllocation.initialExecutors
These options can be configured for a Spark application in 3 ways:
1. From spark-submit with --conf <prop_name>=<prop_value>
spark-submit --master yarn-cluster \
--driver-cores 2 \
--driver-memory 2G \
--num-executors 10 \
--executor-cores 5 \
--executor-memory 2G \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.initialExecutors=10 \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Here spark.dynamicAllocation.initialExecutors=10 has the same effect as --num-executors 10.
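These flags only set the executor bounds; as noted at the start of this answer, dynamic allocation itself must also be enabled. If that is not already done in spark-defaults.conf, the same spark-submit call would additionally carry these two flags (a sketch using the properties named earlier):

--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \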
2. Inside the Spark program with SparkConf
Set the properties in SparkConf, then create the SparkSession or SparkContext with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")

// Build the SparkSession (or SparkContext) from this configuration
val spark = SparkSession.builder().config(conf).getOrCreate()
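Note that these properties take effect only if they are set before the SparkSession or SparkContext is created; dynamic-allocation settings cannot be changed on an already-running application.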
3. spark-defaults.conf, usually located in $SPARK_HOME/conf/
Place the same configurations in spark-defaults.conf so that they apply to all Spark applications when no configuration is passed from the command line or in code.
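As a sketch (reusing the values from the examples above), the corresponding spark-defaults.conf entries would look like this:

spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10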