AggregateByKey Partitioning?

Views: 111 | Posted: 2020/6/17 19:21:54 | Tags: scala apache-spark hadoop-partitioning


Question

I have:

A_RDD = anRDD.map()

B_RDD = A_RDD.aggregateByKey()

OK, my question is:

If I put partitionBy(new HashPartitioner) after A_RDD, like this:

A_RDD = anRDD.map().partitionBy(new HashPartitioner(2))

B_RDD = A_RDD.aggregateByKey()

1) Will this be as efficient as leaving it as it was in the first place? aggregateByKey() will use the HashPartitioner from A_RDD, right?

2) Or, if I leave it as in the first example, will aggregateByKey() aggregate every partition by key first and then send each "aggregated" (key, value) pair to the right partition in a more efficient way?

3) Why can't map, flatMap, and other transformations on RDDs take an argument specifying how to partition the (key, value) pairs on the fly? What I mean is, for example, during the map() operation on each tuple, also sending that tuple to a specific partition designated by a partitioner argument on map, e.g. map( , Partitioner).

I am trying to grasp how aggregateByKey() works, but every time I think I have got it, a new question arises... Thanks in advance.
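
The two variants described in the question can be compared side by side with a small local sketch. This is not an answer from the original thread, only a way to inspect what Spark does. The sample data, the `local[2]` master, the object name, and the use of a sum as the aggregation are all assumptions made for illustration. The sketch prints each result's `partitioner` and its lineage via `toDebugString`, so the shuffle boundaries in each variant can be read off directly.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical driver object; requires spark-core on the classpath.
object AggregateByKeyPartitioning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("aggregateByKey-sketch")
      .setMaster("local[2]") // assumption: run locally with two threads
    val sc = new SparkContext(conf)
    try {
      // Stand-in for anRDD.map(...): any RDD of (key, value) pairs will do.
      val pairs = sc.parallelize(
        Seq("a" -> 1, "b" -> 2, "a" -> 3, "b" -> 4, "c" -> 5), 4)

      // Variant 1: aggregate directly, as in the first example.
      val direct = pairs.aggregateByKey(0)(_ + _, _ + _)

      // Variant 2: partition explicitly first, then aggregate.
      val prePartitioned = pairs.partitionBy(new HashPartitioner(2))
      val afterPartitionBy = prePartitioned.aggregateByKey(0)(_ + _, _ + _)

      // Which partitioner each result carries.
      println(s"direct:           ${direct.partitioner}")
      println(s"after partitionBy: ${afterPartitionBy.partitioner}")

      // The lineage shows where shuffle stages are introduced.
      println(direct.toDebugString)
      println(afterPartitionBy.toDebugString)

      // Both variants should produce the same per-key aggregates.
      println(direct.collect().sortBy(_._1).mkString(", "))
      println(afterPartitionBy.collect().sortBy(_._1).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

When reading the output, the point to look for is where the `ShuffledRDD` appears in each lineage: in the second variant the shuffle is introduced by `partitionBy`, which moves the raw, not yet aggregated records.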
