How to use the function mapPartitionsWithIndex in Spark?
Question
mapPartitionsWithIndex has a parameter, preservesPartitioning, and I don't know how to set it.
I ran a test:
// partitionedRDD's type is RDD[(String, String)]
partitionedRDD.mapPartitionsWithIndex((index, iter) => {
  iter.map(_._1)
}, args(2).toBoolean).saveAsTextFile(args(3))
Whether I set preservesPartitioning to false or true, the RDD's partitions do not change. Why?
If I don't want the partitions to change, what value should I set for preservesPartitioning?
Recommended answer
I think you are confused about what preservesPartitioning means. Setting it to true does not tell Spark "please preserve the partitions"; it tells Spark "my function preserves the keys, and the RDD is a pair RDD". With either value, mapPartitionsWithIndex processes each partition in place without a shuffle, so the number of partitions and the records in each one stay the same. The flag only controls whether the resulting RDD keeps the parent's partitioner.
From the Spark documentation:
preservesPartitioning indicates whether the input function preserves the partitioner, which should be false unless this is a pair RDD and the input function doesn't modify the keys.
In your case, you have a pair RDD and the function does not modify the keys, so the flag should be true.
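To make the effect of the flag visible, here is a minimal sketch that compares the partitioner and the partition count of the result for both settings. The object name, the sample data, the HashPartitioner with 4 partitions, and the local[2] master are assumptions chosen for illustration; they do not come from the question.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PreservesPartitioningSketch {
  // Returns (partitioner kept with flag = true,
  //          partitioner kept with flag = false,
  //          partition counts equal in both cases).
  def run(sc: SparkContext): (Boolean, Boolean, Boolean) = {
    // Hypothetical pair RDD with an explicit partitioner, standing in for partitionedRDD.
    val partitionedRDD = sc
      .parallelize(Seq(("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")))
      .partitionBy(new HashPartitioner(4))

    // Rewrites the values only; the keys are left untouched.
    val tagValues = (index: Int, iter: Iterator[(String, String)]) =>
      iter.map { case (k, v) => (k, s"$v@$index") }

    val kept = partitionedRDD.mapPartitionsWithIndex(tagValues, preservesPartitioning = true)
    val dropped = partitionedRDD.mapPartitionsWithIndex(tagValues, preservesPartitioning = false)

    (kept.partitioner.isDefined,
     dropped.partitioner.isDefined,
     kept.getNumPartitions == dropped.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("preservesPartitioning-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

With the flag set to true the result still reports the HashPartitioner, so a later key-based operation such as reduceByKey can reuse that layout instead of shuffling. With false, the partitioner is dropped even though the data sits in exactly the same partitions.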