How to use function mapPartitionsWithIndex in Spark?

Views: 457 · Published: 2020/9/4 5:19:21 · Tag: apache-spark


Problem description

mapPartitionsWithIndex has a parameter, preservesPartitioning, and I don't know how to set it.

I ran a test:

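// partitionedRDD's type is RDD[(String, String)]
// args(2): value passed as preservesPartitioning; args(3): output path
partitionedRDD.mapPartitionsWithIndex((index, iter) => {
  iter.map(_._1) // keep only the key of each pair
}, args(2).toBoolean).saveAsTextFile(args(3))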

Whether I set preservesPartitioning to false or true, the RDD's partitions do not change. Why?

If I don't want the partitions to change, what value should I set for preservesPartitioning?

Recommended answer

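I think you are confused about what preservesPartitioning means. Setting it to true does not tell Spark 'please preserve the partitions'; it tells Spark 'I have a function that preserves keys, and the RDD is a pair RDD'. That is why your test looks the same either way: mapPartitionsWithIndex processes each existing partition in place, so the partitions themselves do not change whichever value you pass. The flag only decides whether the result keeps the parent's partitioner.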

From the Spark documentation:

preservesPartitioning indicates whether the input function preserves the partitioner, which should be false unless this is a pair RDD and the input function doesn't modify the keys.
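
To make the documentation's wording concrete, here is a minimal sketch of what the flag changes. It assumes a local Spark session and a small hand-made pair RDD partitioned with a HashPartitioner; the object name, the sample data and the choice of 4 partitions are illustrative and not taken from the question.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical demo object; names and data are made up for illustration.
object PreservesPartitioningDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("preservesPartitioning-demo")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A pair RDD with an explicit partitioner, like partitionedRDD in the question.
      val partitionedRDD = sc
        .parallelize(Seq(("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")))
        .partitionBy(new HashPartitioner(4))

      // Same key-preserving function, run once with each flag value.
      val kept = partitionedRDD.mapPartitionsWithIndex(
        (index, iter) => iter.map { case (k, v) => (k, s"$index:$v") },
        preservesPartitioning = true)
      val dropped = partitionedRDD.mapPartitionsWithIndex(
        (index, iter) => iter.map { case (k, v) => (k, s"$index:$v") },
        preservesPartitioning = false)

      // The number of partitions is the same in both cases.
      println(s"partitions: ${kept.getNumPartitions} vs ${dropped.getNumPartitions}")
      // Only the partitioner metadata differs: defined when the flag is true.
      println(s"partitioner kept:    ${kept.partitioner.isDefined}")
      println(s"partitioner dropped: ${dropped.partitioner.isDefined}")
    } finally {
      sc.stop()
    }
  }
}
```

Both results should report the same partition count, while only the RDD built with the flag set to true should still carry a partitioner. Saving either one with saveAsTextFile would therefore produce the same number of part files.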

In your case, you have a pair RDD and the function does not modify the key, so the flag should be true.
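
As a rough illustration of why a correct flag value matters, the sketch below checks whether a later key-based operation needs a shuffle. It is not part of the original answer. It assumes a local Spark session, integer values so they can be summed, and a hypothetical helper named shuffles that inspects the direct dependencies of an RDD.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; names and data are made up for illustration.
object ShuffleCheck {
  // True if one of the RDD's direct dependencies is a shuffle.
  def shuffles(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shuffle-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val partitioner = new HashPartitioner(4)
      val pairs = sc
        .parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4)))
        .partitionBy(partitioner)

      // Key-preserving transformation, parameterised by the flag value.
      def scaled(flag: Boolean): RDD[(String, Int)] =
        pairs.mapPartitionsWithIndex(
          (_, iter) => iter.map { case (k, v) => (k, v * 10) },
          flag)

      // Reduce with the same partitioner the data was originally partitioned by.
      val sumKept = scaled(true).reduceByKey(partitioner, _ + _)
      val sumDropped = scaled(false).reduceByKey(partitioner, _ + _)

      println(s"shuffle needed after flag = true:  ${shuffles(sumKept)}")
      println(s"shuffle needed after flag = false: ${shuffles(sumDropped)}")
    } finally {
      sc.stop()
    }
  }
}
```

With the flag set to true, Spark can see that the data is already partitioned by key and should be able to reduce without another shuffle. With false, the partitioner is forgotten and the same reduce has to repartition the data first.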
