How long does an RDD remain in memory?
Question
Given that memory is limited, I have a feeling that Spark automatically removes RDDs from each node. Is this time configurable? How does Spark decide when to evict an RDD from memory?
Note: I'm not talking about `rdd.cache()`.
Recommended answer
> Is this time configurable? How does Spark decide when to evict an RDD from memory?
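An RDD is an object just like any other. If you don't persist/cache it, it behaves like any other object in a managed language: it is garbage-collected once no live root objects point to it. Keep in mind that the RDD object on the driver only describes a computation; unless it is persisted, its partitions are computed on demand and are not kept in executor memory afterwards.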
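As @Jacek points out, the "how" part is the responsibility of an object called `ContextCleaner`. It keeps weak references to driver-side objects such as persisted RDDs, shuffles and broadcast variables. Once the driver's garbage collector finds one of them unreachable, the JVM places its weak reference on a `ReferenceQueue`, and a background thread picks it up and removes the corresponding data (for an RDD, its cached blocks) from the cluster. If you want the details, this is what the cleaning method looks like: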
```scala
private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
  while (!stopped) {
    try {
      val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
        .map(_.asInstanceOf[CleanupTaskWeakReference])
      // Synchronize here to avoid being interrupted on stop()
      synchronized {
        reference.foreach { ref =>
          logDebug("Got cleaning task " + ref.task)
          referenceBuffer.remove(ref)
          ref.task match {
            case CleanRDD(rddId) =>
              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
            case CleanShuffle(shuffleId) =>
              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
            case CleanBroadcast(broadcastId) =>
              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
            case CleanAccum(accId) =>
              doCleanupAccum(accId, blocking = blockOnCleanupTasks)
            case CleanCheckpoint(rddId) =>
              doCleanCheckpoint(rddId)
          }
        }
      }
    } catch {
      case ie: InterruptedException if stopped => // ignore
      case e: Exception => logError("Error in cleaning thread", e)
    }
  }
}
```
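The same weak-reference pattern can be reproduced outside Spark with plain JVM classes. The sketch below is a minimal, single-threaded approximation under stated assumptions: `MiniCleaner`, `TaskWeakRef`, `CleanRdd` and `CleanShuffle` are hypothetical names loosely modelled on Spark's `CleanupTaskWeakReference` and task classes, and the "cleanup" only records a message instead of removing blocks or files. Each tracked object is wrapped in a `WeakReference` registered with a `ReferenceQueue`; after the GC finds the object unreachable, the reference is enqueued, and `cleanOnce` dequeues it and dispatches on the task type, much like one iteration of `keepCleaning()`.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical task types, loosely modelled on Spark's CleanRDD / CleanShuffle.
sealed trait CleanupTask
final case class CleanRdd(rddId: Int) extends CleanupTask
final case class CleanShuffle(shuffleId: Int) extends CleanupTask

// A weak reference that also remembers which cleanup task to run.
// `referent` is only passed to the superclass constructor, so Scala does not
// keep it as a field: the task holds IDs, never a strong reference to the object.
final class TaskWeakRef(val task: CleanupTask, referent: AnyRef, queue: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, queue)

// Hypothetical single-threaded sketch of the ContextCleaner pattern (not Spark's API).
final class MiniCleaner {
  private val queue = new ReferenceQueue[AnyRef]
  // Keeps the weak-reference objects themselves alive until they are processed.
  private val buffer = mutable.Set.empty[TaskWeakRef]
  val log = mutable.ArrayBuffer.empty[String]

  def register(obj: AnyRef, task: CleanupTask): TaskWeakRef = {
    val ref = new TaskWeakRef(task, obj, queue)
    buffer += ref
    ref
  }

  // One iteration of the loop: wait up to timeoutMs for a reference whose
  // referent was collected, then dispatch on its task.
  def cleanOnce(timeoutMs: Long): Boolean =
    Option(queue.remove(timeoutMs)).map(_.asInstanceOf[TaskWeakRef]) match {
      case Some(ref) =>
        buffer -= ref
        ref.task match {
          case CleanRdd(id)     => log += s"remove cached blocks of RDD $id"
          case CleanShuffle(id) => log += s"delete shuffle files of shuffle $id"
        }
        true
      case None => false
    }
}

object MiniCleanerDemo {
  def main(args: Array[String]): Unit = {
    val cleaner = new MiniCleaner
    var rdd: AnyRef = new Array[Byte](1 << 20) // stand-in for a driver-side RDD object
    cleaner.register(rdd, CleanRdd(42))
    rdd = null // drop the only strong reference
    // System.gc() is only a hint, so retry a few times.
    var attempts = 0
    while (cleaner.log.isEmpty && attempts < 10) {
      System.gc()
      cleaner.cleanOnce(100)
      attempts += 1
    }
    println(cleaner.log.mkString("\n"))
  }
}
```

The buffer of weak references matters: if nothing strongly referenced the `TaskWeakRef` objects themselves, they could be collected together with their referents and never reach the queue, which is why Spark keeps a `referenceBuffer`. It also shows why, under this mechanism, cleanup timing follows the driver's garbage collection rather than a fixed timer: an object that stays reachable on the driver is never enqueued.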
If you want to learn more, I suggest browsing Spark's source or, even better, reading @Jacek's book "Mastering Apache Spark", which has a section explaining `ContextCleaner`.