How to define a global read\write variable in Spark

Question

Spark has broadcast variables, which are read-only, and accumulator variables, which can be updated by the nodes but not read. Is there a way - or a workaround - to define a variable which is both updatable and readable?

One requirement for such a read\write global variable would be to implement a cache. As files are loaded and processed as RDDs, calculations are performed. The results of these calculations - happening on several nodes running in parallel - need to be placed into a map which has as its key some of the attributes of the entity being processed. As subsequent entities within the RDDs are processed, the cache is queried.
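As a concrete sketch of that requirement (all names here - `Entity`, `EntityCache`, the attribute choice - are hypothetical illustrations, not from the question): a thread-safe map keyed by some of the entity's attributes, where each result is computed once and reused for later entities with the same key.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical entity; the cache key is derived from a subset of its attributes.
case class Entity(category: String, region: String, payload: String)

object EntityCache {
  // Thread-safe map keyed by (category, region).
  private val cache = new ConcurrentHashMap[(String, String), Int]()

  // Compute once per key; later lookups for the same key reuse the cached value.
  def getOrCompute(e: Entity)(compute: Entity => Int): Int =
    cache.computeIfAbsent((e.category, e.region), _ => compute(e))
}

object EntityCacheDemo extends App {
  var computations = 0
  def expensive(e: Entity): Int = { computations += 1; e.payload.length }

  val a = EntityCache.getOrCompute(Entity("books", "eu", "hello"))(expensive)
  val b = EntityCache.getOrCompute(Entity("books", "eu", "ignored"))(expensive)
  println(s"$a $b $computations") // prints: 5 5 1 - the second call is a cache hit
}
```

This works within a single JVM; the question is precisely how (or whether) such a structure can be shared across Spark nodes.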

Scala does have ScalaCache, which is a facade for cache implementations such as Google Guava. But how would such a cache be included and accessed within a Spark application?

The cache could be defined as a variable in the driver application which creates the SparkContext. But then there would be two issues:


  • Performance would presumably be bad because of the network overhead
    between the nodes and the driver application.

  • To my understanding, each RDD will be passed a copy of the variable
    (in this case, the cache) when the variable is first accessed by the
    function passed to the RDD. Each RDD would have its own copy rather than access to a shared global variable.

What is the best way to implement and store such a cache?

Thanks

Answer

Well, the best way of doing this is not doing it at all. In general, the Spark processing model doesn't provide any guarantees regarding


  • where,

  • when,

  • in what order (excluding of course the order of transformations defined by the lineage / DAG),

  • how many times

a given piece of code is executed. Moreover, any updates that depend directly on the Spark architecture are not granular.

These are the properties that make Spark scalable and resilient, but at the same time they are what makes keeping shared mutable state very hard to implement and, most of the time, completely useless.

If all you want is a simple cache then you have multiple options:


  • use one of the methods described by Tzach Zohar in Caching in Spark (stackoverflow.com/q/36305422/1560062)
  • use local caching (per JVM or executor thread) combined with application specific partitioning to keep things local
  • for communication with external systems use node local cache independent of Spark (for example Nginx proxy for http requests)
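The second option above can be sketched as follows (a minimal illustration under assumed names - `LocalCache`, `keyOf`, `process`, and the Spark usage shown in the comment are hypothetical, not from the answer). The cache lives in a singleton `object`, so each executor JVM lazily gets its own independent instance when the closure first runs there; combining this with key-based partitioning means all entities sharing a key land on the same executor, so lookups stay local:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// One instance per JVM: the driver when running locally, and one per
// executor JVM when this object is referenced inside a Spark closure.
object LocalCache {
  private val map = new ConcurrentHashMap[String, String]()
  val misses = new AtomicInteger(0)

  // `compute` is by-name, so it is only evaluated on a cache miss.
  def getOrCompute(key: String)(compute: => String): String =
    map.computeIfAbsent(key, _ => { misses.incrementAndGet(); compute })
}

// Inside a Spark job this would be used roughly like (hypothetical sketch):
//   rdd.map(e => (keyOf(e), e))
//      .partitionBy(new HashPartitioner(n)) // same key -> same executor
//      .map { case (k, e) => LocalCache.getOrCompute(k)(process(e)) }

object LocalCacheDemo extends App {
  val r1 = LocalCache.getOrCompute("k1")("computed-1")
  val r2 = LocalCache.getOrCompute("k1")("never-evaluated") // cache hit
  println(s"$r1 $r2 misses=${LocalCache.misses.get}") // prints: computed-1 computed-1 misses=1
}
```

Note that each executor's cache is independent and is lost if the executor is restarted, which is exactly why this only works when the partitioning keeps related entities together and the cached values are recomputable.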

If the application requires much more complex communication, you may try different message-passing tools to keep the state synchronized, but in general this requires complex and potentially fragile code.

