Modify collection inside a Spark RDD foreach


Problem description

I'm trying to add elements to a map while iterating the elements of an RDD. I'm not getting any errors, but the modifications are not happening.

It all works fine when adding entries directly or when iterating over other collections:

scala> val myMap = new collection.mutable.HashMap[String,String]
myMap: scala.collection.mutable.HashMap[String,String] = Map()

scala> myMap("test1")="test1"

scala> myMap
res44: scala.collection.mutable.HashMap[String,String] = Map(test1 -> test1)

scala> List("test2", "test3").foreach(w => myMap(w) = w)

scala> myMap
res46: scala.collection.mutable.HashMap[String,String] = Map(test2 -> test2, test1 -> test1, test3 -> test3)

But when I try to do the same from an RDD:

scala> val fromFile = sc.textFile("tests.txt")
...
scala> fromFile.take(3)
...
res48: Array[String] = Array(test4, test5, test6)

scala> fromFile.foreach(w => myMap(w) = w)
scala> myMap
res50: scala.collection.mutable.HashMap[String,String] = Map(test2 -> test2, test1 -> test1, test3 -> test3)

I've tried printing an entry that was already in the map from inside the foreach, to make sure the closure sees the same variable, and it prints correctly:

scala> fromFile.foreach(w => println(myMap("test1")))
...
test1
test1
test1
...

I've also printed the modified element of the map inside the foreach code and it prints as modified, but when the operation is completed, the map seems unmodified.

scala> fromFile.foreach({w => myMap(w) = w; println(myMap(w))})
...
test4
test5
test6
...
scala> myMap
res55: scala.collection.mutable.HashMap[String,String] = Map(test2 -> test2, test1 -> test1, test3 -> test3)

Converting the RDD to an array (collect) also works fine:

scala> fromFile.collect.foreach(w => myMap(w) = w)
scala> myMap
res89: scala.collection.mutable.HashMap[String,String] = Map(test2 -> test2, test5 -> test5, test1 -> test1, test4 -> test4, test6 -> test6, test3 -> test3)

Is this a context problem? Am I accessing a copy of the data that is being modified somewhere else?

Answer

It becomes clearer when running on a Spark cluster (not a single machine). The RDD is spread over several machines. When you call foreach, you tell each machine what to do with the piece of the RDD that it holds. Any local variables you refer to (like myMap) are serialized and sent to the machines along with the closure, so each machine works on its own copy. Nothing comes back, so your original myMap on the driver is unaffected.
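If the goal is simply a word-to-word map, the usual fix is to build the pairs on the executors and bring the result back to the driver, rather than mutating driver state from inside foreach. A minimal sketch, reusing the question's variable names (collectAsMap is part of Spark's standard pair-RDD API):

```scala
// Build (key, value) pairs on the executors, then collect them
// into a Map on the driver. This avoids mutating driver-side state
// from inside a closure that runs on remote machines.
val fromFile = sc.textFile("tests.txt")
val myMap = fromFile.map(w => (w, w)).collectAsMap()

// Equivalent driver-side alternative: collect the elements first,
// then mutate a local map on the driver only.
val localMap = collection.mutable.HashMap[String, String]()
fromFile.collect().foreach(w => localMap(w) = w)
```

Both variants move the data to the driver, so for RDDs too large to fit in driver memory a reduce-style aggregation is the safer pattern.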

I think this answers your question, but obviously you are trying to accomplish something and you will not be able to get there this way. Feel free to explain here or in a separate question what you are trying to do, and I will try to help.
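If what you actually need from the foreach is an aggregate value visible on the driver, Spark's accumulators exist for exactly this pattern: executors may only add to them, and the driver reads the final value after the action completes. A minimal sketch, assuming a Spark 2.x SparkContext named sc:

```scala
// Accumulators are Spark's supported way to send values *back*
// from executors to the driver.
val matches = sc.longAccumulator("matches")
fromFile.foreach { w =>
  if (w.startsWith("test")) matches.add(1)  // runs on the executors
}
println(matches.value)  // read on the driver after the action completes
```

Note that accumulators carry simple aggregates, not arbitrary collections; for map-like results, collecting key/value pairs back to the driver (as the collect example in the question shows) is the usual route.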

