What are the drawbacks of caching RDDs?

Question

We recently started caching RDDs that are reused multiple times, even when those RDDs don't take long to compute.

According to the docs, Spark will automatically evict unused cached data using an LRU strategy.

So is there any drawback to over-caching RDDs? I was thinking that keeping all that deserialized data in memory might put more pressure on the GC, but is this something we should worry about?

Recommended answer

The main drawback of caching a large number of RDDs is (obviously) that it uses memory. If the cache is limited in size, the LRU strategy doesn't necessarily mean that the least valuable items are the ones evicted. If you cache everything without regard to its value, you may find that items which are costly to compute but infrequently accessed get evicted when you don't want them to be.
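To make the eviction point concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's actual block manager: the cache capacity, the item names (`expensive_join`, `cheap_a`, and so on), the cost numbers, and the access pattern are all made up for illustration, and the cache counts whole items rather than partitions or bytes. It compares caching everything against caching only the expensive item.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that records the recomputation cost paid on misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # keys ordered from least to most recently used
        self.recompute_cost = 0
        self.evicted = []

    def get(self, key, cost):
        if key in self.entries:
            # Hit: mark the key as most recently used, nothing to recompute.
            self.entries.move_to_end(key)
            return
        # Miss: pay the recomputation cost and store the result.
        self.recompute_cost += cost
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            # Evict the least recently used key, regardless of what it cost.
            oldest, _ = self.entries.popitem(last=False)
            self.evicted.append(oldest)


# Hypothetical recomputation costs (arbitrary units).
COSTS = {"expensive_join": 100, "cheap_a": 1, "cheap_b": 1, "cheap_c": 1}

# Hypothetical access pattern: the expensive item is used less often
# than the cheap ones taken together.
ACCESSES = [
    "expensive_join", "cheap_a", "cheap_b", "cheap_c",
    "expensive_join", "cheap_a", "cheap_b", "cheap_c",
    "expensive_join",
]


def run(cached_keys, capacity=3):
    """Return (total compute cost, eviction order) for a caching choice."""
    cache = LRUCache(capacity)
    uncached_cost = 0
    for key in ACCESSES:
        if key in cached_keys:
            cache.get(key, COSTS[key])
        else:
            # Not cached at all: recomputed on every access.
            uncached_cost += COSTS[key]
    return cache.recompute_cost + uncached_cost, cache.evicted


if __name__ == "__main__":
    print("cache everything:    ", run(set(COSTS)))
    print("cache expensive only:", run({"expensive_join"}))
```

In this model, caching everything costs 306 units because the cheap items keep pushing `expensive_join` out of a three-slot cache, while caching only `expensive_join` costs 106 units and evicts nothing. In Spark itself, the equivalent of being selective is calling `cache()` or `persist()` only on the RDDs worth keeping and calling `unpersist()` on ones you no longer need, rather than relying on LRU eviction alone.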
