Why in an RDD, map gives NotSerializableException while foreach doesn't?

Question

I understand the basic difference between map and foreach (lazy versus eager), and I also understand why this code snippet

sc.makeRDD(Seq("a", "b")).map(s => new java.io.ByteArrayInputStream(s.getBytes)).collect

should give

java.io.NotSerializableException: java.io.ByteArrayInputStream

By the same reasoning, I would expect the following code snippet to fail as well:

sc.makeRDD(Seq("a", "b")).foreach(s => {
  val is = new java.io.ByteArrayInputStream(s.getBytes)
  println("is = " + is)
})

But this code runs fine. Why so?

Recommended answer

Actually, the fundamental difference between map and foreach is not the evaluation strategy. Let's take a look at the signatures (I've omitted the implicit part of map for brevity):

def map[U](f: (T) ⇒ U): RDD[U]
def foreach(f: (T) ⇒ Unit): Unit 

map takes a function from T to U, applies it to each element of the existing RDD[T], and returns an RDD[U]. To allow operations like shuffling, or sending the results back to the driver as collect does in your snippet, U has to be serializable.
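The serializability requirement can be checked without a cluster. The following is a minimal sketch, assuming Spark's default Java serialization is what handles the elements; `SerializationCheck` and `isJavaSerializable` are hypothetical names introduced here for illustration and are not part of Spark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper (not a Spark API): tries plain Java serialization
  // and reports whether the value could be written.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val bytes = "a".getBytes("UTF-8")
    // A byte array can be written by Java serialization.
    println(isJavaSerializable(bytes)) // prints true
    // ByteArrayInputStream does not implement java.io.Serializable.
    println(isJavaSerializable(new ByteArrayInputStream(bytes))) // prints false
  }
}
```

This suggests one possible workaround: have map return the bytes (or the String) and build the stream only where it is consumed.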

foreach takes a function from T to Unit (which is analogous to Java's void) and by itself returns nothing. Everything happens locally on the executor: the ByteArrayInputStream is created and used inside the function and never has to be sent over the network, so it never needs to be serialized. Unlike map, foreach should be used when you want some kind of side effect, like in your previous question (http://stackoverflow.com/questions/31489985/upload-each-element-of-an-rdd-to-a-different-file-in-s3).

On a side note, these two functions are actually different. The anonymous function you use in map is a function:

(s: String) => java.io.ByteArrayInputStream

and the one you use in foreach looks like this:

(s: String) => Unit

If you use the second function with map, your code will compile, although the result will be far from what you want (RDD[Unit]).
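The same effect can be seen with ordinary Scala collections. In this small sketch a local Seq stands in for the RDD (an assumption made only so the example runs without Spark), and `MapWithUnitFunction`, `describe` and `mapped` are hypothetical names.

```scala
object MapWithUnitFunction {
  // The body of the foreach lambda from the question, as a named function.
  // Its last expression is println, so its result type is Unit.
  def describe(s: String): Unit = {
    val is = new java.io.ByteArrayInputStream(s.getBytes)
    println("is = " + is)
  }

  // A local Seq stands in for the RDD here (assumption for illustration).
  def mapped: Seq[Unit] = Seq("a", "b").map(describe)

  def main(args: Array[String]): Unit = {
    val result = mapped
    // Every element is the Unit value; the streams themselves are gone.
    println(result) // prints List((), ())
  }
}
```

The mapped collection holds only Unit values, which is why passing a side-effecting function to map compiles but rarely gives a useful result.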
