What exactly does spark.sql.shuffle.partitions refer to?


Question


What exactly does spark.sql.shuffle.partitions refer to? Is it the number of partitions that result from a wide transformation, or something that happens in between, such as an intermediate partitioning step before the wide transformation's result partitions?


As I understand it, a wide transformation goes:

Parent RDDs -> shuffle files -> Child RDDs


What does the spark.sql.shuffle.partitions parameter refer to here: the shuffle files, the child RDDs, or something else I have overlooked?

Answer

This is already described in the official documentation:


spark.sql.shuffle.partitions (default: 200): Configures the number of partitions to use when shuffling data for joins or aggregations.


In other words, it is the number of partitions of the child Dataset, i.e. of the shuffle's output.
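To make this concrete, here is a toy pure-Python model of the "Parent RDDs -> shuffle files -> Child RDDs" pipeline from the question. It is a sketch, not Spark code. `SHUFFLE_PARTITIONS` is a hypothetical stand-in for spark.sql.shuffle.partitions, and CRC32 replaces the Murmur3 hash that Spark SQL's hash partitioning actually uses. The model shows that the setting determines how many buckets each map task writes into its shuffle output and, therefore, how many child partitions exist after the shuffle, regardless of how many parent partitions there were.

```python
import zlib
from collections import defaultdict

# Hypothetical stand-in for spark.sql.shuffle.partitions (Spark's default is 200).
SHUFFLE_PARTITIONS = 4


def partition_for(key: str, num_partitions: int) -> int:
    """Pick the reduce-side (child) partition for a key.

    Simplified: Spark SQL hashes with Murmur3; CRC32 is used here only
    because it is stable across Python runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_side_write(parent_partitions, num_partitions):
    """Model the shuffle files: each parent (map) partition splits its
    records into num_partitions blocks, one block per child partition."""
    shuffle_files = []
    for records in parent_partitions:
        blocks = [[] for _ in range(num_partitions)]
        for key, value in records:
            blocks[partition_for(key, num_partitions)].append((key, value))
        shuffle_files.append(blocks)
    return shuffle_files


def reduce_side_read(shuffle_files, num_partitions):
    """Child partition i fetches block i from every map output and sums by key."""
    child_partitions = []
    for i in range(num_partitions):
        sums = defaultdict(int)
        for blocks in shuffle_files:
            for key, value in blocks[i]:
                sums[key] += value
        child_partitions.append(dict(sums))
    return child_partitions


# Three parent partitions feeding a groupBy(key).sum(value)-style aggregation.
parents = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 4), ("d", 5)],
    [("b", 6), ("e", 7), ("a", 8)],
]

files = map_side_write(parents, SHUFFLE_PARTITIONS)
children = reduce_side_read(files, SHUFFLE_PARTITIONS)

print(len(parents), "parent partitions ->", len(children), "child partitions")
for i, part in enumerate(children):
    print(i, part)
```

The first printed line is `3 parent partitions -> 4 child partitions`. Which keys land in which child partition depends on the hash. In real Spark 3.x, adaptive query execution may coalesce small post-shuffle partitions, so the observed partition count can be lower than the configured value.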
