Spark list all cached RDD names and unpersist


Problem description

I am new to Apache Spark. I created several RDDs and DataFrames, cached them, and now I want to unpersist some of them using the command below:

rddName.unpersist()

but I can't remember their names. I used sc.getPersistentRDDs, but the output does not include the names. I also used the browser to view the cached RDDs, but again there was no name information. Am I missing something?
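
For context, a minimal sketch of the situation described (the variable names here are made up for illustration): caching an RDD without calling setName leaves nothing human-readable to match against, because sc.getPersistentRDDs returns a map keyed only by the internal RDD id.

// Hypothetical setup illustrating the problem (names are made up):
val ordersRdd = sc.makeRDD(1 to 100).cache()   // cached, but never given a name
val usersRdd  = sc.makeRDD(1 to 10).cache()    // same here

// Returns scala.collection.Map[Int, RDD[_]] keyed by the internal RDD id,
// e.g. Map(0 -> ..., 1 -> ...), with no obvious way to tell which entry is which.
sc.getPersistentRDDs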

Recommended answer

@Dikei's answer is actually correct, but I believe what you are looking for is sc.getPersistentRDDs:

scala> val rdd1 = sc.makeRDD(1 to 100)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> val rdd2 = sc.makeRDD(10 to 1000)
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> rdd2.cache.setName("rdd_2")
res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27)

scala> rdd1.cache.setName("foo")
res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
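
As a side note, the keys of that map are the internal RDD ids (the numbers in ParallelCollectionRDD[0] and ParallelCollectionRDD[1]), so an individual cached RDD can also be fetched by id; a small sketch, assuming the ids from the transcript above:

// Look up a cached RDD by its internal id (the map key) and unpersist it.
// Id 0 corresponds to rdd1 ("foo") in the transcript above.
sc.getPersistentRDDs(0).unpersist()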

Now let's add another RDD and name it as well:

scala> val rdd3 = sc.makeRDD(1 to 100)
rdd3: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:27

scala> rdd3.setName("bar")
res4: rdd3.type = bar ParallelCollectionRDD[2] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res5: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)

Notice that rdd3 does not show up in the map: it was only named, never cached, so it isn't actually persisted.
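
Once the cached RDDs carry names, unpersisting a specific one (or all of them) follows directly; a minimal sketch, assuming the names set above and a live SparkContext sc:

// Unpersist a specific cached RDD by the name assigned with setName.
sc.getPersistentRDDs
  .values
  .filter(_.name == "rdd_2")
  .foreach(_.unpersist())

// Or unpersist every RDD that is still cached.
sc.getPersistentRDDs.values.foreach(_.unpersist())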

