Spark: Cannot add RDD elements into a mutable HashMap inside a closure
Question
I have the following code, where rddMap is of type org.apache.spark.rdd.RDD[(String, (String, String))] and myHashMap is a scala.collection.mutable.HashMap.
I did .saveAsTextFile("temp_out") to force the evaluation of rddMap.map. However, even though println(" t " + t) is printing things, myHashMap afterwards still contains only the one element I manually put in at the beginning, ("test1", ("10", "20")). Nothing from rddMap is put into myHashMap.
Code snippet
import scala.collection.mutable.HashMap

val myHashMap = new HashMap[String, (String, String)]
myHashMap.put("test1", ("10", "20"))
rddMap.map { t =>
  println(" t " + t)
  myHashMap.put(t._1, t._2)
}.saveAsTextFile("temp_out")
println(rddMap.count)
println(myHashMap.toString)
Why can't I put the elements from rddMap into myHashMap?
Answer
Here is a working example of what you want to accomplish.
val rddMap = sc.parallelize(Map("A" -> ("v", "v"), "B" -> ("d","d")).toSeq)
// Collects all the data in the RDD and converts the data to a Map
val myMap = rddMap.collect().toMap
myMap.foreach(println)
Output:
(A,(v,v))
(B,(d,d))
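If the goal is just a driver-side map, pair RDDs also provide collectAsMap, which goes straight from the RDD to a Map (note that when a key appears more than once, only one of its values is kept). A minimal sketch, assuming the same rddMap and an active SparkContext sc as in the Spark shell:

// collectAsMap is defined on pair RDDs and returns a
// scala.collection.Map[String, (String, String)] on the driver
val collected = rddMap.collectAsMap()
collected.foreach(println)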
This is similar to the code you posted:
// newHashMap is pre-populated the same way as myHashMap in the question
val newHashMap = new HashMap[String, (String, String)]
newHashMap.put("test1", ("10", "20"))
rddMap.map { t =>
  println("t" + t)
  newHashMap.put(t._1, t._2)
  println(newHashMap.toString)
}.collect
Here is the output of the above code in the Spark shell:
t(A,(v,v))
Map(A -> (v,v), test1 -> (10,20))
t(B,(d,d))
Map(test1 -> (10,20), B -> (d,d))
To me it looks like Spark copies your HashMap and adds the elements to the copied map. The closure passed to map is serialized and shipped to each task, so every task mutates its own deserialized copy of newHashMap, and those copies are never merged back into the map on the driver.
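This is Spark's documented closure behavior: mutating driver-side state from inside a transformation never affects the driver. The supported mechanism for pushing simple aggregates back to the driver is an accumulator. A minimal sketch of both behaviors, assuming Spark 2.x's longAccumulator API and an active sc (variable names are illustrative):

var counter = 0
val nums = sc.parallelize(1 to 100)
nums.foreach(x => counter += x)     // each task increments its own copy
println(counter)                    // still 0 on the driver

val acc = sc.longAccumulator("sum") // accumulator updates are merged on the driver
nums.foreach(x => acc.add(x))
println(acc.value)                  // 5050

For building up a whole map, though, accumulators are not the right fit; collecting the RDD as shown above is the straightforward fix.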