Spark: list all cached RDD names


Problem Description



I am new to Apache Spark. I created several RDDs and DataFrames and cached them, and now I want to unpersist some of them using the command below:

rddName.unpersist()

but I can't remember their names. I used sc.getPersistentRDDs, but the output does not include the names. I also viewed the cached RDDs in the browser (the Spark web UI), but again there is no name information. Am I missing something?

Solution

@Dikei's answer is actually correct, but I believe what you are looking for is sc.getPersistentRDDs:

scala> val rdd1 = sc.makeRDD(1 to 100)
# rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> val rdd2 = sc.makeRDD(10 to 1000)
# rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> rdd2.cache.setName("rdd_2")
# res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
# res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27)

scala> rdd1.cache.setName("foo")
# res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
# res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
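
As a side note, if you want to list the cached RDDs programmatically instead of reading the REPL output, you can iterate over the returned map (a small sketch; the getStorageLevel call is only there to show the storage level alongside the name):

// Print the id, name and storage level of every currently persisted RDD.
// Unnamed RDDs have name == null.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"$id -> ${rdd.name} (${rdd.getStorageLevel.description})")
}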

Now let's add one more RDD (created with sc.makeRDD like the ones above) and name it as well, but without caching it:

scala> rdd3.setName("bar")
# res4: rdd3.type = bar ParallelCollectionRDD[2] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
# res5: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)

We notice that "bar" does not show up in the map: setName only names the RDD, and since rdd3 was never cached, it isn't actually persisted.
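
Coming back to the original question: once the cached RDDs carry meaningful names, you can unpersist selected ones through the same map (a minimal sketch; "rdd_2" is just the name used in the example above):

// Unpersist every cached RDD whose name matches the given one.
sc.getPersistentRDDs
  .filter { case (_, rdd) => rdd.name == "rdd_2" }
  .foreach { case (_, rdd) => rdd.unpersist() }

Calling sc.getPersistentRDDs.values.foreach(_.unpersist()) would unpersist everything at once.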

I hope this helps.
