Understanding Spark's caching


Question

I'm trying to understand how Spark's caching works.

Here is my naive understanding; please let me know if I'm missing something:

val rdd1 = sc.textFile("some data")
rdd1.cache() // marks rdd1 as cached
val rdd2 = rdd1.filter(...)
val rdd3 = rdd1.map(...)
rdd2.saveAsTextFile("...")
rdd3.saveAsTextFile("...")

In the above, rdd1 will be loaded from disk (e.g. HDFS) only once (when rdd2 is saved, I assume), and then served from the cache (assuming there is enough RAM) when rdd3 is saved.
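One rough way to see this in practice (a sketch under assumptions: a live SparkContext named sc, illustrative filter/map functions, and enough memory for the cache) is to print rdd1's lineage after the first save; once the cached partitions have been materialized, toDebugString should report them, and the second save should then be served from memory:

val rdd1 = sc.textFile("some data")
rdd1.cache()                              // mark rdd1 for caching
val rdd2 = rdd1.filter(_.nonEmpty)        // illustrative predicate
val rdd3 = rdd1.map(_.toUpperCase)        // illustrative mapping

rdd2.saveAsTextFile("out2")               // first action: reads HDFS and fills the cache
println(rdd1.toDebugString)               // lineage should now mention the cached partitions
rdd3.saveAsTextFile("out3")               // second action: reads rdd1 from the cache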

Now here is my question. Let's say I want to cache rdd2 and rdd3, as they will both be used later on, but I don't need rdd1 after creating them.

Basically there is duplication, isn't there? Since once rdd2 and rdd3 are computed I don't need rdd1 anymore, I should probably unpersist it, right? The question is when.

Will this work (Option A)?

val rdd1 = sc.textFile("some data")
rdd1.cache() // marks rdd1 as cached
val rdd2 = rdd1.filter(...)
val rdd3 = rdd1.map(...)
rdd2.cache()
rdd3.cache()
rdd1.unpersist()

Does Spark add the unpersist call to the DAG, or is it executed immediately? If it is executed immediately, then rdd1 will effectively not be cached when I read from rdd2 and rdd3, right?

Or should I do it this way instead (Option B)?

val rdd1 = sc.textFile("some data")
rdd1.cache() // marks rdd1 as cached
val rdd2 = rdd1.filter(...)
val rdd3 = rdd1.map(...)

rdd2.cache()
rdd3.cache()

rdd2.saveAsTextFile("...")
rdd3.saveAsTextFile("...")

rdd1.unpersist()

So the question is this: is Option A good enough? E.g. will rdd1 still access the file only once? Or do I need to go with Option B?

Answer

It would seem that Option B is required. The reason relates to how persist/cache and unpersist are executed by Spark. Since RDD transformations merely build DAG descriptions without executing anything, in Option A, by the time you call unpersist you still only have job descriptions, not a running execution.
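A minimal laziness check (illustrative names, run against a local SparkContext named sc) shows why: the side effect inside map only fires when an action runs, so in Option A the unpersist call happens before anything was ever materialized into the cache.

val nums   = sc.parallelize(1 to 3)
val traced = nums.map { n => println(s"computing $n"); n * 2 }
// nothing has been printed yet -- map only extended the DAG description
traced.count()                            // action: the map closure actually executes now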

This matters because a cache or persist call just adds the RDD to a map of RDDs that have marked themselves to be persisted during job execution. unpersist, however, directly tells the blockManager to evict the RDD from storage and removes the reference from the map of persistent RDDs.
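That bookkeeping can be observed from user code. The sketch below assumes a live SparkContext named sc and uses getPersistentRDDs, which exposes the map of persisted RDDs referred to above:

val rdd1 = sc.textFile("some data").cache()
println(sc.getPersistentRDDs.keys)        // rdd1's id is listed: it is marked for caching
rdd1.count()                              // action: the blocks are now actually stored
rdd1.unpersist()                          // evicts the blocks and drops the map entry right away
println(sc.getPersistentRDDs.keys)        // rdd1's id is gone
println(rdd1.getStorageLevel)             // reports that rdd1 is no longer persisted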

persist function: https://github.com/apache/spark/blob/b0d884f044fea1c954da77073f3556cd9ab1e922/core/src/main/scala/org/apache/spark/SparkContext.scala#L1306

unpersist function: https://github.com/apache/spark/blob/b0d884f044fea1c954da77073f3556cd9ab1e922/core/src/main/scala/org/apache/spark/SparkContext.scala#L1313

So you would need to call unpersist after Spark has actually executed the job and stored the RDD with the block manager.

The comments on the RDD.persist method hint at this: https://github.com/apache/spark/blob/b0d884f044fea1c954da77073f3556cd9ab1e922/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L156

