What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?

Question

What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?

I have tried setting both of them in Spark SQL, but the number of tasks in the second stage is always 200.

Recommended answer

From the answer here, spark.sql.shuffle.partitions configures the number of partitions used when shuffling data for joins or aggregations. Its default value is 200, which matches the 200 tasks you see in the second stage.
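
The 200 tasks in the question can be illustrated with a simplified model of a shuffle. This Scala code is a sketch under stated assumptions, not Spark's source: `ShuffleModel`, `partitionFor`, and `shuffle` are hypothetical names, and the non-negative modulo of `hashCode` mirrors the idea behind the RDD `HashPartitioner` (Spark SQL itself hashes with Murmur3). The configured partition count fixes the number of output buckets, and therefore the number of tasks in the next stage, regardless of how the data is distributed.

```scala
// Simplified, hypothetical model of a shuffle: every row is routed to one of
// `numPartitions` output buckets by key hash, and each bucket becomes one task
// in the next stage. Not Spark's implementation.
object ShuffleModel {
  // Non-negative modulo of the key's hashCode, similar in spirit to the RDD
  // HashPartitioner (Spark SQL uses a Murmur3-based hash instead).
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Returns exactly `numPartitions` buckets; buckets that receive no keys are
  // still present (empty), just as empty shuffle partitions still get a task.
  def shuffle[K](keys: Seq[K], numPartitions: Int): Vector[Seq[K]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val buckets: Map[Int, Seq[K]] = keys.groupBy(k => partitionFor(k, numPartitions))
    Vector.tabulate(numPartitions)(i => buckets.getOrElse(i, Seq.empty[K]))
  }
}
```

With only three distinct keys, `ShuffleModel.shuffle(Seq(0, 1, 2, 0, 1), 200)` still yields 200 buckets, most of them empty, which is why even a small Spark SQL job can show 200 tasks after a shuffle.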

spark.default.parallelism is the default number of partitions in RDDs returned by transformations such as join and reduceByKey, and by parallelize, when the user does not set the number explicitly. Note that spark.default.parallelism seems to apply only to raw RDDs and is ignored when working with DataFrames, whose shuffles use spark.sql.shuffle.partitions instead.
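
For reduceByKey and join, the fallback can be sketched as a small Scala function. This is a simplified model and an assumption, not Spark's source: `RddPartitionCount.resolve` is a hypothetical name, and it ignores the case where an input RDD already carries a partitioner. When no count is passed, Spark uses spark.default.parallelism if it is set, and otherwise the largest partition count among the input RDDs. DataFrame shuffles do not go through this path.

```scala
// Hypothetical, simplified model of how an RDD operation such as reduceByKey
// or join picks its output partition count. Ignores existing partitioners.
object RddPartitionCount {
  def resolve(
      explicitNumPartitions: Option[Int], // e.g. rdd.reduceByKey(_ + _, 50)
      defaultParallelism: Option[Int],    // spark.default.parallelism, if set
      parentPartitionCounts: Seq[Int]     // partition counts of the input RDDs
  ): Int = {
    require(parentPartitionCounts.nonEmpty, "need at least one input RDD")
    explicitNumPartitions               // 1. an explicit argument always wins
      .orElse(defaultParallelism)       // 2. then spark.default.parallelism
      .getOrElse(parentPartitionCounts.max) // 3. else the largest parent
  }
}
```

For example, joining RDDs with 8 and 16 partitions gives 16 partitions when spark.default.parallelism is unset, and 300 when it is set to 300.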

If the operation you are performing is not a join or aggregation and you are working with DataFrames, setting these will have no effect. You can, however, set the number of partitions yourself by calling df.repartition(numOfPartitions) in your code. Because repartition returns a new DataFrame instead of modifying df, remember to assign the result to a new val.
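
Discarding the result of repartition is an easy mistake to make. The sketch below reproduces that pitfall with a hypothetical immutable stand-in, `FrameModel` (not a Spark class), so that it runs without Spark. In real code the same pattern is `val repartitioned = df.repartition(numOfPartitions)`.

```scala
// Hypothetical stand-in for a DataFrame. Like a Spark DataFrame it is
// immutable: repartition returns a new value and leaves the receiver unchanged.
final case class FrameModel(rows: Vector[Int], numPartitions: Int) {
  def repartition(n: Int): FrameModel = {
    require(n > 0, "number of partitions must be positive")
    copy(numPartitions = n)
  }
}

object RepartitionPitfall {
  def main(args: Array[String]): Unit = {
    val df = FrameModel((1 to 1000).toVector, numPartitions = 200)

    df.repartition(10)                     // result discarded: df is unchanged
    val repartitioned = df.repartition(10) // keep the new value instead

    println(df.numPartitions)              // prints 200
    println(repartitioned.numPartitions)   // prints 10
  }
}
```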

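To change the settings in your code, you can do:

import org.apache.spark.SparkConf
sqlContext.setConf("spark.sql.shuffle.partitions", "300")  // Spark 2.x+: spark.conf.set("spark.sql.shuffle.partitions", "300")
// spark.default.parallelism is read only when the SparkContext starts, so set it on the SparkConf used to create the context
val conf = new SparkConf().set("spark.default.parallelism", "300")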

Alternatively, you can make the change when submitting the job to a cluster with spark-submit:

./bin/spark-submit --conf spark.sql.shuffle.partitions=300 --conf spark.default.parallelism=300
