LinkedHashMap variable is not accessible outside the foreach loop
Question
Below is my code.
var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.map(s => (s(8), s(9))))
fieldTypeMapRDD.foreach { i =>
  println(i)
  link.put(i._1, i._2)
}
println(link.size) // here size is zero
I want to access link outside the loop. Please help.
Recommended answer
Why your code is not supposed to work:
- Before your foreach task starts, the whole closure of the function inside the foreach block is serialized on the driver and sent to each of the workers. This means each of them gets its own instance of mutable.LinkedHashMap as a copy of link.
- During the foreach block, each worker puts each of its items into its own copy of link.
- After your task is done, you still have an empty local link and several non-empty former copies on the worker nodes.
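The copy semantics can be reproduced without a cluster. The following is a minimal sketch in plain Scala, not Spark code: it assumes Java serialization is a fair stand-in for how a captured variable reaches a worker, and the names ClosureCopyDemo, roundTrip and workerCopy are made up for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Serialize a value to bytes and read it back, which yields an
  // independent copy (hypothetical helper, not part of Spark).
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // The map as it exists in the local (driver-side) program.
    val link = mutable.LinkedHashMap[String, String]()

    // What a worker would receive: a deserialized copy, not a reference.
    val workerCopy = roundTrip(link)
    workerCopy.put("name", "string")

    println(workerCopy.size) // the copy was modified
    println(link.size)       // the local map was never touched
  }
}
```

Writing to workerCopy leaves link empty, which is the same effect as the println(link.size) in the question reporting zero.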
The moral is clear: don't use local mutable collections with an RDD. It's just not going to work.
One way to get the whole collection to the local machine is the collect method. You can use it as:
val link = fieldTypeMapRDD.collect.toMap
or, if you need to preserve the order:
import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
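If the goal is specifically a LinkedHashMap, as in the question, the collected pairs can also be passed to its constructor on the local machine. This is a sketch under assumptions: the collected array below is hypothetical sample data standing in for the result of fieldTypeMapRDD.collect, and CollectedPairsDemo is a name invented for the example.

```scala
import scala.collection.immutable.ListMap
import scala.collection.mutable

object CollectedPairsDemo {
  // Hypothetical stand-in for the Array[(String, String)] that
  // fieldTypeMapRDD.collect would return on the local machine.
  val collected: Array[(String, String)] = Array(
    "id" -> "int", "name" -> "string", "price" -> "double",
    "qty" -> "int", "created" -> "date", "active" -> "boolean")

  def main(args: Array[String]): Unit = {
    // Immutable map built from the collected pairs, as in the answer.
    val ordered = ListMap(collected: _*)

    // Mutable alternative: LinkedHashMap iterates in insertion order.
    val linked = mutable.LinkedHashMap(collected: _*)

    println(ordered.size)
    println(linked.keys.mkString(", "))
  }
}
```

Both maps are built after the data has arrived locally, so nothing is mutated inside a closure that runs on the workers. If a key occurs more than once, the later value replaces the earlier one.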
But if you are really into mutable collections, you can modify your code a bit. Just change
fieldTypeMapRDD.foreach {
to
fieldTypeMapRDD.toLocalIterator.foreach {
See also this question: http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine