What happens if I cache the same RDD twice in Spark
Question
I'm building a generic function which receives an RDD and does some calculations on it. Since I run more than one calculation on the input RDD, I would like to cache it. For example:
public JavaRDD<String> foo(JavaRDD<String> r) {
    r.cache();
    JavaRDD<String> t1 = r... // Some calculations
    JavaRDD<String> t2 = r... // Other calculations
    return t1.union(t2);
}
My question is: since r is given to me, it may or may not already be cached. If it is cached and I call cache on it again, will Spark create a new layer of cache, meaning that while t1 and t2 are calculated I will have two instances of r in the cache? Or will Spark be aware of the fact that r is already cached and ignore the second call?
Answer
Nothing. If you call cache on a cached RDD, nothing happens; the RDD will be cached (once). Caching, like many other transformations, is lazy:
- When you call cache, the RDD's storageLevel is set to MEMORY_ONLY.
- When you call cache again, it's set to the same value (no change).
- Upon evaluation, when the underlying RDD is materialized, Spark checks the RDD's storageLevel and, if it requires caching, caches it.
So you are safe.
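The three steps above can be sketched with a toy model. Note this is NOT Spark's real implementation — ToyRDD, blockStore, and materialize() are invented names used only to illustrate why a repeated cache() call cannot produce a second cached copy: cache() merely records the desired storage level, and the single copy is created lazily at evaluation time.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an RDD's storage-level bookkeeping (illustrative only).
class ToyRDD {
    // Shared "block store": at most one cached copy per RDD instance.
    static final Map<ToyRDD, String> blockStore = new HashMap<>();

    private String storageLevel = "NONE";

    ToyRDD cache() {
        // Lazy and idempotent: repeated calls just (re)assign the same level.
        storageLevel = "MEMORY_ONLY";
        return this;
    }

    int materialize() {
        // On evaluation, cache the data once if the level requires it.
        if (!storageLevel.equals("NONE") && !blockStore.containsKey(this)) {
            blockStore.put(this, "data"); // a single cached copy
        }
        return blockStore.size();         // total cached copies
    }
}

public class CacheTwiceDemo {
    public static void main(String[] args) {
        ToyRDD r = new ToyRDD();
        r.cache();
        r.cache(); // second call changes nothing
        System.out.println(r.materialize()); // prints 1
        System.out.println(r.materialize()); // still 1: no second copy
    }
}
```

One related detail worth knowing: in real Spark, persist() with a *different* StorageLevel on an already-persisted RDD raises an error, whereas repeating cache() (same level) is harmless.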