How to define a global Scala variable in Spark which will be shared by all workers?

Problem description

In a Spark program, I want to define a variable, such as an immutable map, that will be accessed by all worker programs synchronously. What can I do? Should I define a Scala object?

Beyond an immutable map, what if I want a variable that can be shared and updated synchronously? For example, a 'mutable map', a 'var Int', a 'var String', or something else? What can I do? Is a variable inside a Scala object OK? For example:

object SparkObj {
  // Initial values are required for vars in a concrete object.
  var x: Int = 0
  var y: String = ""
}

  1. Are x and y maintained by the driver rather than the workers, and shared by all workers?
  2. Is there only one copy of x and y, rather than several copies?
  3. Are updates to x and y synchronous?

Solution

If you refer to a variable inside a closure that runs on the workers, it will be captured, serialized and sent to the workers. For example:

val i = 5
rdd.map(_ + i) // "i" is sent to the workers, they add 5 to each element.

Changes to captured variables are not sent back from the workers, however. If you add something to a mutable.Seq inside a worker, the change will not be visible from anywhere else. You'll be modifying a deserialized copy that is discarded after the closure is executed.
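The copy semantics described above can be imitated without a cluster. The following is a minimal plain-Scala sketch, not Spark itself: the hypothetical helper shipToWorker round-trips a value through Java serialization, which is roughly what happens to a captured variable when a closure is shipped to a worker. The names CopySemanticsSketch, shipToWorker, driverSide and workerSide are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

object CopySemanticsSketch {
  // Hypothetical stand-in for "send to a worker": serialize the value,
  // then deserialize it into a brand-new, independent object.
  def shipToWorker[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverSide = ArrayBuffer(1, 2, 3)
    val workerSide = shipToWorker(driverSide) // the "worker" gets its own copy

    workerSide += 4 // mutation happens on the copy only

    println(s"driver sees: $driverSide")
    println(s"worker sees: $workerSide")
  }
}
```

The buffer on the "driver" side is unchanged after the "worker" appends to its copy, which is why mutating captured state inside a closure cannot serve as shared state.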

Apache Spark provides a number of primitives for performing distributed computing. Synchronized mutable state is not one of these.
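Two of the primitives Spark does provide come close to what the question asks for: broadcast variables (read-only data shipped once to each worker) and accumulators (values that workers can only add to and that only the driver reads). Neither gives synchronized mutable state. The sketch below is an illustration under assumptions, not the answer's own code: the object name, the lookupTotal helper, the sample keys, the app name and the local[2] master are all made up, and it assumes Spark 2.x or later on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object SharedStateSketch {
  // Hypothetical helper: sums looked-up values and counts missing keys.
  def lookupTotal(sc: SparkContext): (Int, Long) = {
    // Read-only map made available to every worker.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
    // Counter that workers add to; its value is read on the driver.
    val misses = sc.longAccumulator("misses")

    val total = sc.parallelize(Seq("a", "b", "c", "a"))
      .map { key =>
        lookup.value.get(key) match {
          case Some(v) => v
          case None    => misses.add(1); 0
        }
      }
      .reduce(_ + _)

    // Read the accumulator only after the action above has finished.
    (total, misses.value.longValue)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; names are placeholders.
    val spark = SparkSession.builder()
      .appName("shared-state-sketch")
      .master("local[2]")
      .getOrCreate()
    try println(lookupTotal(spark.sparkContext))
    finally spark.stop()
  }
}
```

Note that an accumulator updated inside a transformation such as map can be incremented more than once if a task is re-executed, so it is better suited to diagnostics than to exact bookkeeping. For an immutable map, the broadcast variable is the closer fit to the question.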
