Can anyone explain RDD blocks in executors?


Question

Can anyone explain why the number of RDD blocks increases when I run my Spark code a second time, even though the blocks were already stored in Spark memory during the first run? I am supplying the input from a thread. What exactly are RDD blocks?

Recommended answer

I have been researching this today, and it seems that the "RDD Blocks" figure shown for executors is actually the sum of RDD blocks and non-RDD blocks. Check out the code at: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala

 val rddBlocks = status.numBlocks

If you then go to the following file in the Apache Spark repo on GitHub: https://github.com/apache/spark/blob/d5b1d5fc80153571c308130833d0c0774de62c92/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala

you will find these lines of code:

  /**
   * Return the number of blocks stored in this block manager in O(RDDs) time.
   *
   * @note This is much faster than `this.blocks.size`, which is O(blocks) time.
   */
  def numBlocks: Int = _nonRddBlocks.size + numRddBlocks

Non-RDD blocks are the ones created by broadcast variables, since these are stored as cached blocks in memory. The driver sends tasks to the executors through broadcast variables. These system-created broadcast variables are deleted by the ContextCleaner service, and the corresponding non-RDD blocks are removed along with them. RDD blocks are removed by calling rdd.unpersist().
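To make the counting concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not Spark's implementation: `ToyBlocks`, `ToyBlockStore`, `RddBlock`, `BroadcastBlock`, `cleanBroadcast` and `demo` are hypothetical names invented for illustration. It only mirrors the `numBlocks` formula quoted above and the two removal paths described in this answer.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's own classes.
object ToyBlocks {
  sealed trait BlockId
  final case class RddBlock(rddId: Int, partition: Int) extends BlockId
  final case class BroadcastBlock(broadcastId: Long) extends BlockId

  final class ToyBlockStore {
    // RDD blocks are grouped by RDD id, so counting them walks RDDs, not blocks.
    private val rddBlocks = scala.collection.mutable.Map.empty[Int, Set[Int]]
    private val nonRddBlocks = scala.collection.mutable.Set.empty[BlockId]

    def add(id: BlockId): Unit = id match {
      case RddBlock(rddId, partition) =>
        rddBlocks(rddId) = rddBlocks.getOrElse(rddId, Set.empty[Int]) + partition
      case other =>
        nonRddBlocks += other
    }

    def numRddBlocks: Int = rddBlocks.valuesIterator.map(_.size).sum

    // Same shape as the quoted line: non-RDD blocks plus RDD blocks.
    def numBlocks: Int = nonRddBlocks.size + numRddBlocks

    // Stand-in for rdd.unpersist(): drops every block of one RDD.
    def unpersist(rddId: Int): Unit = { rddBlocks -= rddId; () }

    // Stand-in for the cleanup of a broadcast variable's block.
    def cleanBroadcast(broadcastId: Long): Unit = {
      nonRddBlocks -= BroadcastBlock(broadcastId); ()
    }
  }

  // Returns the block count after caching, after broadcast cleanup, and after unpersist.
  def demo(): (Int, Int, Int) = {
    val store = new ToyBlockStore
    (0 until 4).foreach(p => store.add(RddBlock(rddId = 1, partition = p)))
    store.add(BroadcastBlock(0L))
    val afterCaching = store.numBlocks   // 4 RDD blocks + 1 broadcast block
    store.cleanBroadcast(0L)
    val afterCleanup = store.numBlocks   // only the 4 RDD blocks remain
    store.unpersist(1)
    val afterUnpersist = store.numBlocks // nothing left
    (afterCaching, afterCleanup, afterUnpersist)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In this model the reported count rises whenever either kind of block is added, which fits the observation that the displayed number can grow even when no new RDD has been cached.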
