Scope of cached RDDs


Question

I was wondering what the scope of a cached RDD is. For example:

// Cache an RDD.
rdd.cache
// Pass the RDD to a method of another class.
otherClass.calculate(rdd) // This method performs various actions.
// Pass the RDD to a method of the same class.
calculate(rdd)            // This method also performs some actions.
// Perform an action in the same method where the RDD was cached.
rdd.count

In the example above, will the RDD be materialized once? (It won't have to be recreated?) What is the scope of caching?

And should I always unpersist the RDD after I have used it, if I don't need it anymore?

Recommended answer

Whether an RDD is cached or not is part of the mutable state of the RDD object. If you call rdd.cache it will be marked for caching from then on. It does not matter what scope you access it from.
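To make this concrete, here is a minimal sketch, assuming a local Spark setup. The object name `CacheScopeSketch`, the `calculate` helper, and the sample data are hypothetical stand-ins for the code in the question. It prints the RDD's storage level before and after `cache()`, then runs actions from two different scopes against the same RDD object. Note that `cache()` only marks the RDD; the partitions are actually stored when the first action computes them, and later actions reuse whatever is still held in the cache.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CacheScopeSketch {
  // Hypothetical helper standing in for otherClass.calculate / calculate
  // from the question. It receives the same RDD object, so it sees the
  // same caching state as the caller.
  def calculate(rdd: RDD[Int]): Long = {
    println(s"inside calculate: ${rdd.getStorageLevel.description}")
    rdd.filter(_ % 2 == 0).count()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for illustration.
    val conf = new SparkConf().setAppName("cache-scope-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000).map(_ * 2)

      println(s"before cache: ${rdd.getStorageLevel.description}")
      rdd.cache() // marks the RDD for caching; nothing is computed yet
      println(s"after cache:  ${rdd.getStorageLevel.description}")

      calculate(rdd) // first action: computes the partitions and stores them
      rdd.count()    // later action: reads the cached partitions if still present
    } finally {
      sc.stop()
    }
  }
}
```

If the executors do not have enough memory to hold every partition, some partitions may be dropped and recomputed on the next action, so "materialized once" holds only as long as the cached data fits.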

As to whether you should unpersist the RDD: The RDD will be unpersisted automatically if it is garbage collected. It is for you to decide whether this is soon enough. The cache takes up space on the executors, while the automatic cleanup happens in response to memory pressure on the driver.
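If waiting for garbage collection on the driver is not soon enough, the cache can be released explicitly. Below is a minimal sketch of one way to do that; `UnpersistSketch` and the `withCached` helper are hypothetical names introduced here for illustration and are not part of Spark. Only `cache()` and `unpersist()` are actual RDD methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnpersistSketch {
  // Hypothetical helper: caches the RDD while `body` runs, then releases
  // the cached blocks on the executors even if `body` throws.
  def withCached[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
    rdd.cache()
    try body(rdd)
    finally rdd.unpersist()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for illustration.
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val total = withCached(sc.parallelize(1 to 1000)) { cached =>
        // Both actions run while the RDD is marked as cached.
        cached.count() + cached.filter(_ % 2 == 0).count()
      }
      println(total)
    } finally {
      sc.stop()
    }
  }
}
```

The helper is only safe when `body` finishes all of its actions before returning. If it returns another lazily evaluated RDD derived from the cached one, that RDD will be computed after `unpersist()` has already run and will not benefit from the cache.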
