mapPartitions returns empty array

Question

I have the following RDD, which has 4 partitions:

val rdd=sc.parallelize(1 to 20,4)

Now I call mapPartitions on it:

scala> rdd.mapPartitions(x=> { println(x.size); x }).collect
5
5
5
5
res98: Array[Int] = Array()

Why does it return an empty array? The anonymous function simply returns the same iterator it received, so how can the result be empty? Interestingly, if I remove the println statement, it does return a non-empty array:

scala> rdd.mapPartitions(x=> { x }).collect
res101: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)

I don't understand this. How does the presence of println (which simply prints the size of the iterator) affect the final outcome of the function?

Answer

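That's because x is an Iterator, which is a TraversableOnce: it can be traversed only once. Calling size walks through every element to count them, so the iterator you then return is already exhausted, i.e. empty. Without the println, nothing consumes x before it is returned, which is why all 20 elements come back.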
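
To see this outside Spark, here is a minimal plain-Scala sketch using only the standard library. It assumes the 20 elements are split into 4 partitions of 5 (as the printed sizes in the question suggest) and uses `grouped` plus `toList` as a rough stand-in for Spark's partitioning and `collect`; the names `IteratorDemo`, `sizeThenReturn` and `simulateCollect` are hypothetical, for illustration only.

```scala
object IteratorDemo {
  // Same shape as the question's function: print the size, then return the iterator.
  // `size` has to walk the iterator to the end to count it, so nothing is left to return.
  def sizeThenReturn(x: Iterator[Int]): Iterator[Int] = {
    println(x.size)
    x
  }

  // Hypothetical stand-in for `rdd.mapPartitions(sizeThenReturn).collect`: split the data
  // into partitions of `partSize` elements and concatenate whatever each call returns.
  def simulateCollect(data: Seq[Int], partSize: Int): List[Int] =
    data.grouped(partSize).flatMap(p => sizeThenReturn(p.iterator)).toList

  def main(args: Array[String]): Unit = {
    // Prints 5 four times, then List(): the same shape as the question's REPL output.
    println(simulateCollect(1 to 20, 5))
  }
}
```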

You could work around it in a number of ways; here is one:

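rdd.mapPartitions(x => {
  // Materialize the partition once so it can be both measured and returned.
  val list = x.toList
  println(list.size)
  // Return a fresh iterator over the buffered list, not the exhausted x.
  list.iterator
}).collect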
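
Another option, sketched here with only the Scala standard library, is `Iterator.duplicate`, which returns two iterators over the same elements: one can be counted and the other returned. Note that it buffers internally whatever the first iterator has read and the second has not, so counting first still holds the whole partition in memory, much like `toList`. The names `DuplicateDemo` and `sizeViaDuplicate` are hypothetical.

```scala
object DuplicateDemo {
  // Count one copy, return the other. The original iterator `x` must not be used
  // after calling `duplicate`. Counting `forCounting` to the end makes `duplicate`
  // buffer every element for `forReturning`.
  def sizeViaDuplicate[A](x: Iterator[A]): Iterator[A] = {
    val (forCounting, forReturning) = x.duplicate
    println(forCounting.size)
    forReturning
  }

  def main(args: Array[String]): Unit = {
    // Prints 5, then List(1, 2, 3, 4, 5).
    println(sizeViaDuplicate(Iterator(1, 2, 3, 4, 5)).toList)
  }
}
```

In Spark it would presumably plug in the same way as the function above, e.g. `rdd.mapPartitions(x => DuplicateDemo.sizeViaDuplicate(x)).collect`.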
