Does a flatMap in Spark cause a shuffle?


Question

Does flatMap in Spark behave like the map function and therefore cause no shuffling, or does it trigger a shuffle? I suspect it does cause shuffling. Can someone confirm this?

Recommended answer

Neither map nor flatMap causes a shuffle. The operations that can cause a shuffle are:


  • Repartition operations:

    • repartition

    • coalesce

  • ByKey operations (except for counting):

    • groupByKey

    • reduceByKey

  • Join operations:

    • cogroup

    • join
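To make the difference concrete, here is a minimal sketch in plain Python (not the Spark API) that models an RDD as a list of partitions. The helper names `flat_map`, `partition_for`, and `reduce_by_key` are hypothetical and exist only for illustration. The assumption is that a flatMap-style step reads one input partition per output partition, while a reduceByKey-style step must first move records so that equal keys land in the same partition.

```python
from collections import defaultdict
from functools import reduce

def flat_map(partitions, fn):
    # Narrow step: every output partition is built from exactly one input partition.
    return [[out for record in part for out in fn(record)] for part in partitions]

def partition_for(key, num_partitions):
    # Deterministic stand-in for a hash partitioner (illustrative only).
    return sum(key.encode("utf-8")) % num_partitions

def reduce_by_key(partitions, fn):
    # Wide step: records are regrouped by key across partitions (the shuffle),
    # then the values of each key are combined.
    n = len(partitions)
    shuffled = [defaultdict(list) for _ in range(n)]
    moved = 0  # number of records that changed partition
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = partition_for(key, n)
            moved += dst != src
            shuffled[dst][key].append(value)
    reduced = [[(k, reduce(fn, vs)) for k, vs in bucket.items()] for bucket in shuffled]
    return reduced, moved

lines = [["a b", "b c"], ["c a", "a"]]                  # two partitions of text lines
words = flat_map(lines, lambda line: line.split(" "))   # records stay in their partition
pairs = flat_map(words, lambda w: [(w, 1)])
counts, moved = reduce_by_key(pairs, lambda x, y: x + y)
print(words)
print(counts, "records moved:", moved)
```

In this model the flatMap step never looks outside a single partition, which is why no data movement is needed. In a real Spark job, one way to check the same thing is to inspect the lineage printed by `toDebugString` on the resulting RDD and look for a shuffle stage.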

Although the set of elements in each partition of newly shuffled data is deterministic, and so is the ordering of the partitions themselves, the ordering of the elements within a partition is not. If you need predictably ordered data after a shuffle, you can use:


    • mapPartitions to sort each partition using, for example, .sorted

    • repartitionAndSortWithinPartitions to efficiently sort partitions while simultaneously repartitioning

    • sortBy to make a globally ordered RDD
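The three options differ in what kind of ordering they give. The sketch below is a plain-Python model under simplifying assumptions, not Spark code: partitions are lists, the helper names mirror the Spark operations but are hypothetical, and the global sort simply splits one sorted list into contiguous chunks rather than sampling range boundaries.

```python
def map_partitions(partitions, fn):
    # Apply fn to each partition independently; no data leaves its partition.
    return [list(fn(iter(part))) for part in partitions]

def repartition_and_sort_within_partitions(partitions, partitioner, num_partitions):
    # Move each (key, value) pair to its target partition, then sort each partition by key.
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[partitioner(key) % num_partitions].append((key, value))
    return [sorted(p, key=lambda kv: kv[0]) for p in out]

def sort_by(partitions, key_fn, num_partitions):
    # Global order: partition i holds only records that sort before those in partition i + 1.
    ordered = sorted((r for part in partitions for r in part), key=key_fn)
    size = -(-len(ordered) // num_partitions) or 1  # ceiling division, at least 1
    return [ordered[i * size:(i + 1) * size] for i in range(num_partitions)]

numbers = [[5, 1, 4], [3, 6, 2]]
pairs = [[(5, "e"), (1, "a"), (4, "d")], [(3, "c"), (6, "f"), (2, "b")]]

locally_sorted = map_partitions(numbers, sorted)
resorted = repartition_and_sort_within_partitions(pairs, lambda k: k, 2)
globally_sorted = sort_by(numbers, lambda x: x, 2)

print(locally_sorted)   # each partition sorted, but no order across partitions
print(resorted)         # keys regrouped by partitioner, sorted inside each partition
print(globally_sorted)  # reading partitions in order yields a fully sorted sequence
```

Only the last option gives an order that holds across partitions. The first two guarantee order only inside each partition, which is often enough when downstream processing works one partition at a time.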

More info here: http://spark.apache.org/docs/latest/programming-guide.html#shuffle-operations
