Why in an RDD does map give NotSerializableException while foreach doesn't?
Question
I understand the basic difference between map and foreach (lazy and eager), and I also understand why this code snippet
sc.makeRDD(Seq("a", "b")).map(s => new java.io.ByteArrayInputStream(s.getBytes)).collect
should give
java.io.NotSerializableException: java.io.ByteArrayInputStream
I would expect the following code snippet to fail in the same way:
sc.makeRDD(Seq("a", "b")).foreach(s => {
val is = new java.io.ByteArrayInputStream(s.getBytes)
println("is = " + is)
})
But this code runs fine. Why?
Recommended answer
Actually, the fundamental difference between map and foreach is not the evaluation strategy. Let's take a look at the signatures (I've omitted the implicit part of map for brevity):
def map[U](f: (T) ⇒ U): RDD[U]
def foreach(f: (T) ⇒ Unit): Unit
map takes a function from T to U, applies it to each element of the existing RDD[T], and returns an RDD[U]. To allow operations like shuffling, or sending the results back to the driver as collect does, U has to be serializable.
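The serializability requirement can be checked without a cluster. The following is a minimal sketch, assuming results are shipped with plain Java serialization (Spark's default, unless configured otherwise); SerializationCheck and isJavaSerializable are hypothetical names introduced here for illustration and are not part of Spark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper (not a Spark API): attempts plain Java serialization
  // and reports whether the value could be written.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // The element type produced by map in the question: not java.io.Serializable.
    val stream = new ByteArrayInputStream("a".getBytes)
    println("stream serializable? " + isJavaSerializable(stream))

    // The underlying bytes are an Array[Byte], which Java can serialize.
    println("bytes serializable?  " + isJavaSerializable("a".getBytes))
  }
}
```

If the stream itself is not needed on the driver, one option is to have map return the byte array and build the stream wherever it is actually consumed.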
foreach takes a function from T to Unit (which is analogous to Java's void) and by itself returns nothing. Everything happens locally on the worker that holds each element, and no result is sent over the network, so the stream never needs to be serialized. Unlike map, foreach should be used when you want some kind of side effect, like in your previous question (http://stackoverflow.com/questions/31489985/upload-each-element-of-an-rdd-to-a-different-file-in-s3).
On a side note, these two are actually different. The anonymous function you use in map has the type:
(s: String) => java.io.ByteArrayInputStream
and the one you use in foreach has the type:
(s: String) => Unit
If you use the second function with map, your code will compile, although the result would be far from what you want (RDD[Unit]).
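The type difference can be seen with an ordinary Scala Seq standing in for the RDD. This is only an analogy sketched under that assumption (no Spark involved), and MapVersusForeachTypes and printStream are hypothetical names used for illustration.

```scala
import java.io.ByteArrayInputStream

object MapVersusForeachTypes {
  // String => Unit: the shape of the function passed to foreach in the question.
  val printStream: String => Unit = s => {
    val is = new ByteArrayInputStream(s.getBytes)
    println("is = " + is)
  }

  def main(args: Array[String]): Unit = {
    val letters = Seq("a", "b")

    // String => ByteArrayInputStream: map yields a collection of streams.
    val streams: Seq[ByteArrayInputStream] =
      letters.map(s => new ByteArrayInputStream(s.getBytes))

    // The same Unit-returning function compiles with map too,
    // but the result is a collection of unit values, not streams.
    val units: Seq[Unit] = letters.map(printStream)

    println("streams: " + streams.length + ", units: " + units.length)
  }
}
```

The Unit values carry no information, which is why map with a side-effecting function is rarely what is wanted.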