How long does RDD remain in memory?


Problem description

Considering that memory is limited, I had a feeling that Spark automatically removes RDDs from each node. I'd like to know: is this time configurable? How does Spark decide when to evict an RDD from memory?

Note: I am not talking about rdd.cache().

Recommended answer

I'd like to know: is this time configurable? How does Spark decide when to evict an RDD from memory?

An RDD is an object just like any other. If you don't persist/cache it, it behaves like any other object in a managed language: it becomes eligible for collection once no live root references point to it.
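As for the "is this configurable" part of the question, Spark does expose a few cleaner-related settings. The names and defaults below are taken from Spark's configuration documentation; verify them against the version you run:

```
# Force a periodic GC on the driver so weak references get enqueued
# even when the driver heap is otherwise quiet (default: 30min)
spark.cleaner.periodicGC.interval          30min

# Master switch for ContextCleaner's reference tracking (default: true)
spark.cleaner.referenceTracking            true

# Whether cleanup tasks block the cleaning thread (default: true)
spark.cleaner.referenceTracking.blocking   true
```

These control when cleanup of unreferenced RDD, shuffle, and broadcast state is triggered, not a per-RDD time-to-live; there is no setting that evicts a live, strongly-referenced RDD after a fixed interval.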

The "how" part, as @Jacek points out, is the responsibility of an object called ContextCleaner. If you want the details, this is what its cleaning loop looks like:

private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
  while (!stopped) {
    try {
      val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
        .map(_.asInstanceOf[CleanupTaskWeakReference])
      // Synchronize here to avoid being interrupted on stop()
      synchronized {
        reference.foreach { ref =>
          logDebug("Got cleaning task " + ref.task)
          referenceBuffer.remove(ref)
          ref.task match {
            case CleanRDD(rddId) =>
              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
            case CleanShuffle(shuffleId) =>
              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
            case CleanBroadcast(broadcastId) =>
              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
            case CleanAccum(accId) =>
              doCleanupAccum(accId, blocking = blockOnCleanupTasks)
            case CleanCheckpoint(rddId) =>
              doCleanCheckpoint(rddId)
          }
        }
      }
    } catch {
      case ie: InterruptedException if stopped => // ignore
      case e: Exception => logError("Error in cleaning thread", e)
    }
  }
}
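The mechanism underneath is the JVM's weak-reference machinery: ContextCleaner registers a weak reference for each tracked object in a ReferenceQueue, and the loop above polls that queue for references whose targets have been garbage-collected. The following is a minimal, standalone sketch of that pattern in plain Scala (no Spark; `FakeRdd` and all names are hypothetical stand-ins, not Spark APIs):

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}

object CleanerSketch {
  // Hypothetical stand-in for an RDD: any heap object will do.
  final class FakeRdd(val id: Int)

  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[FakeRdd]()

    // Register a weak reference, as ContextCleaner does per tracked object.
    var rdd: FakeRdd = new FakeRdd(42)
    val ref = new WeakReference(rdd, queue)

    // Drop the only strong reference; the object is now collectible.
    rdd = null

    // The real cleaning thread blocks on remove() with a timeout;
    // here we nudge the collector and poll until the ref is enqueued.
    var cleaned: Option[AnyRef] = None
    var attempts = 0
    while (cleaned.isEmpty && attempts < 10) {
      System.gc()
      cleaned = Option(queue.remove(500))
      attempts += 1
    }

    // Once enqueued, the cleaner knows the object is gone and can
    // release the associated off-heap/executor-side state.
    println(s"enqueued=${cleaned.isDefined}, same ref=${cleaned.contains(ref)}")
  }
}
```

The key point the sketch illustrates: eviction of an un-persisted RDD is driven by ordinary garbage collection on the driver, not by a Spark-side timer, which is why there is no single "how long" knob to turn.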

If you want to learn more, I suggest browsing Spark's source, or even better, reading @Jacek's book "Mastering Apache Spark" (which includes an explanation of ContextCleaner).

