Spark: Cannot add RDD elements into a mutable HashMap inside a closure


Question


I have the following code, where rddMap is of type org.apache.spark.rdd.RDD[(String, (String, String))] and myHashMap is a scala.collection.mutable.HashMap.

I do .saveAsTextFile("temp_out") to force the evaluation of rddMap.map.


However, even though println(" t " + t) prints the elements, afterwards myHashMap still contains only the one element I put in manually at the beginning, ("test1", ("10", "20")). Nothing from rddMap is put into myHashMap.

Code snippet

import scala.collection.mutable.HashMap

val myHashMap = new HashMap[String, (String, String)]
myHashMap.put("test1", ("10", "20"))
rddMap.map { t =>
  println(" t " + t)
  myHashMap.put(t._1, t._2)
}.saveAsTextFile("temp_out")

println(rddMap.count)
println(myHashMap.toString)


Why can't I put the elements from rddMap into myHashMap?

Answer


Here is a working example of what you want to accomplish.

val rddMap = sc.parallelize(Map("A" -> ("v", "v"), "B" -> ("d","d")).toSeq)
// Collects all the data in the RDD and converts the data to a Map
val myMap = rddMap.collect().toMap
myMap.foreach(println)

Output:

(A,(v,v))  
(B,(d,d))
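
For an RDD of pairs, Spark also provides collectAsMap(), which does the collect-and-convert in one step. Like collect().toMap, it pulls the entire RDD into driver memory, so both are only safe for small data. A minimal sketch, assuming the same rddMap as above:

// collectAsMap() is equivalent to collect().toMap for a pair RDD;
// it returns a scala.collection.Map built on the driver.
val myMap2 = rddMap.collectAsMap()
myMap2.foreach(println)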

This is similar to the code you posted:

val newHashMap = new HashMap[String, (String, String)]
newHashMap.put("test1", ("10", "20"))

rddMap.map { t =>
  println("t" + t)
  newHashMap.put(t._1, t._2)
  println(newHashMap.toString)
}.collect


Here is the output of the above code in the Spark shell:

t(A,(v,v))  
Map(A -> (v,v), test1 -> (10,20))  
t(B,(d,d))  
Map(test1 -> (10,20), B -> (d,d))


To me it looks like Spark copies your HashMap and adds the elements to the copied map: the closure passed to map is serialized and shipped to the executors, so each task mutates its own deserialized copy, and the driver's map is never updated.
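
The practical fix is to do the mutation on the driver, as in the collect().toMap example above. A minimal sketch of that approach, assuming the same rddMap and a map seeded with ("test1", ("10", "20")) as in your code:

import scala.collection.mutable.HashMap

val myHashMap = new HashMap[String, (String, String)]
myHashMap.put("test1", ("10", "20"))

// collect() runs the job and returns the elements to the driver;
// this foreach executes locally, so the mutations are visible afterwards.
rddMap.collect().foreach { case (k, v) => myHashMap.put(k, v) }

println(myHashMap)
// e.g. Map(test1 -> (10,20), A -> (v,v), B -> (d,d))

If the data were too large to collect, aggregating on the RDD itself (e.g. with reduceByKey) or using an accumulator would be the usual alternatives.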
