Does coalesce(numPartitions) in Spark undergo shuffling or not?

Question

I have a simple question in spark transformation function.

coalesce(numPartitions) - Decrease the number of partitions in the RDD to numPartitions. Useful for running operations more efficiently after filtering down a large dataset.

val dataRDD = sc.textFile("/user/cloudera/inputfiles/records.txt")
// Assumption: records are comma-separated and the country is the first field.
val filterRDD = dataRDD.filter(record => record.split(",")(0) == "USA")
val resizeRDD = filterRDD.coalesce(50)
val result    = resizeRDD.collect()

My questions are:

  1. Is it true that coalesce(numPartitions) will remove the empty partitions from filterRDD?

  2. Does coalesce(numPartitions) undergo shuffling or not?

Recommended answer

The coalesce transformation is used to reduce the number of partitions, so use it when the number of output partitions is smaller than the number of input partitions. Whether it triggers a shuffle depends on its shuffle flag, which is disabled by default (false).

If the requested number of partitions is larger than the current number and you call coalesce without shuffle = true, the number of partitions remains unchanged. coalesce does not guarantee that empty partitions will be removed. For example, if you have 20 empty partitions and 10 partitions with data, there can still be empty partitions after you call rdd.coalesce(25). If you call coalesce with shuffle set to true, it is equivalent to the repartition method, and the data is redistributed roughly evenly across the partitions.
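
To make the behaviour above concrete, here is a minimal sketch that rebuilds the "10 partitions with data, 20 empty" situation on a local master. The app name, the local[2] master, the sample data and the partitionSizes helper are all assumptions made for illustration; in spark-shell the existing sc can be used and the context setup skipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Local context for illustration only; spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("coalesce-demo").setMaster("local[2]")
val sc = new SparkContext(conf)

// 30 partitions of 10 elements each; the filter keeps 1..100,
// so 10 partitions hold data and 20 become empty.
val rdd = sc.parallelize(1 to 300, 30).filter(_ <= 100)

// Hypothetical helper: number of elements in each partition.
def partitionSizes[T](r: RDD[T]): Array[Int] =
  r.mapPartitions(it => Iterator(it.size)).collect()

val narrowed = rdd.coalesce(25)                  // shuffle = false (default)
val widened  = rdd.coalesce(60)                  // cannot grow without a shuffle
val shuffled = rdd.coalesce(60, shuffle = true)  // same as rdd.repartition(60)

// Materialise the results before stopping the context.
val narrowedCount = narrowed.getNumPartitions    // 25
val widenedCount  = widened.getNumPartitions     // 30, unchanged
val shuffledCount = shuffled.getNumPartitions    // 60
val emptyAfterNarrowing = partitionSizes(narrowed).count(_ == 0)
val totalAfterShuffle   = partitionSizes(shuffled).sum

println(s"narrowed=$narrowedCount widened=$widenedCount shuffled=$shuffledCount")
println(s"empty partitions after coalesce(25): $emptyAfterNarrowing")

sc.stop()
```

Without a shuffle, coalesce only groups existing partitions together, so partitions that were empty before can stay empty after the merge. With shuffle = true every record is redistributed, which costs a full shuffle but allows both growing the partition count and rebalancing the data.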
