Update the internal state of RDD elements

Views: 65 | Published: 2021/4/8 20:21:41 | Tags: apache-spark, rdd


Question

I'm new to Spark, and I want to update the internal state of my RDD's elements with the rdd.foreach method, but it doesn't work. Here is my code example:

class Test extends Serializable{
  var foo = 0.0
  var bar = 0.0

  def updateFooBar() = {
    foo = Math.random()
    bar = Math.random()
  }
}

var testList = Array.fill(5)(new Test())
var testRDD = sc.parallelize(testList)
testRDD.foreach{ x => x.updateFooBar() }
testRDD.collect().foreach { x=> println(x.foo+"~"+x.bar) }

The result is:

0.0~0.0
0.0~0.0
0.0~0.0
0.0~0.0
0.0~0.0

Recommended answer

RDDs are immutable by design. This design choice makes them more robust, as mutation is a common source of bugs, and it supports the "resilient" part of the RDD name (resilient distributed dataset): if a partition in a downstream RDD is lost, Spark can reconstruct it from its parents. That is also why your updates disappear: foreach runs on copies of the Test objects inside the tasks, and collect() then reads the elements again from the original, unmodified collection. So it's best to think of Spark programming as the construction of dataflows, even when you're not doing streaming.
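One way to express the update as a dataflow is to derive a new RDD with map instead of mutating elements in place. The following is only a minimal sketch, not the asker's code: the case class version of Test, the updated() helper, the updatedRDD name, and the local[*] session are assumptions made for illustration. In spark-shell, where sc is predefined, the two lines that create spark and sc can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical immutable variant of Test: an update returns a new value
// instead of mutating fields in place.
case class Test(foo: Double = 0.0, bar: Double = 0.0) {
  def updated(): Test = Test(Math.random(), Math.random())
}

// Assumed local setup for a standalone run.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-update-sketch")
  .getOrCreate()
val sc = spark.sparkContext

val testRDD = sc.parallelize(Seq.fill(5)(Test()))

// map is a transformation: it returns a new RDD and leaves testRDD unchanged.
// cache() asks Spark to keep the computed values, because Math.random() would
// otherwise produce different numbers each time an action recomputes the RDD.
val updatedRDD = testRDD.map(_.updated()).cache()

updatedRDD.collect().foreach(x => println(s"${x.foo}~${x.bar}"))
```

Caching is best effort: if a cached partition is evicted or lost, Spark recomputes it and the random values change. When stable values matter, a deterministic function of each element is a safer choice than Math.random().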

As for foreach, it is designed for "pure side effect" operations, such as writing to disk, a database, or the console.
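For contrast, here is a rough sketch of foreach used for side effects only. The numbers RDD, the elementsSeen accumulator name, and the local[*] session are assumptions made for illustration; an accumulator is shown because it is a supported way for tasks to report a value back to the driver, unlike mutating the elements themselves.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local setup; in spark-shell, sc already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-foreach-sketch")
  .getOrCreate()
val sc = spark.sparkContext

val numbers = sc.parallelize(1 to 5)

// Accumulators are updated inside tasks and read on the driver.
val elementsSeen = sc.longAccumulator("elementsSeen")

numbers.foreach { n =>
  // Side effect 1: console output. On a cluster this goes to the
  // executors' stdout, not to the driver's console.
  println(s"processing $n")
  // Side effect 2: report back to the driver through the accumulator.
  elementsSeen.add(1)
}

println(s"elements seen: ${elementsSeen.value}")
```

The RDD itself is untouched by this job: foreach returns Unit, so anything that should survive the job has to be written somewhere external or sent back through an accumulator.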
