How can I check whether my RDD or DataFrame is cached?

Question

I have created a DataFrame, say df1, and cached it with df1.cache(). How can I check whether it has actually been cached? Also, is there a way to see all of my cached RDDs or DataFrames?

Answer

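You can inspect the dataset's storage level. On a DataFrame call storageLevel.useMemory (Dataset.storageLevel is available since Spark 2.1); on an RDD call getStorageLevel.useMemory. Either one returns true once the dataset has been marked for caching in memory. Keep in mind that cache() is lazy: it only marks the dataset, and the data is actually stored the first time an action computes it. The storage level therefore tells you that the dataset will be cached, not that its partitions are already in memory.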

For a DataFrame:

scala> val df = Seq(1, 2).toDF()
df: org.apache.spark.sql.DataFrame = [value: int]

scala> df.storageLevel.useMemory
res0: Boolean = false

scala> df.cache()
res1: df.type = [value: int]

scala> df.storageLevel.useMemory
res2: Boolean = true
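Note that useMemory only tells you whether the storage level includes memory. A DataFrame persisted with a disk-only level is still cached, yet useMemory returns false for it. Below is a minimal sketch of a broader check that treats any level other than StorageLevel.NONE as "marked for caching". It assumes Spark 2.1 or later (for Dataset.storageLevel), and the object and method names are hypothetical, not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, not part of Spark's API.
object CacheCheck {
  // A DataFrame that was never cached or persisted reports StorageLevel.NONE.
  // Any other level (MEMORY_AND_DISK, DISK_ONLY, ...) means it is marked for
  // caching, whether or not an action has materialized the data yet.
  def isMarkedCached(df: DataFrame): Boolean =
    df.storageLevel != StorageLevel.NONE

  // Narrower check, equivalent to the REPL example: marked for caching
  // with a storage level that is allowed to use memory.
  def isMarkedInMemory(df: DataFrame): Boolean =
    df.storageLevel.useMemory
}
```

For example, after df.persist(StorageLevel.DISK_ONLY), isMarkedCached(df) is true while df.storageLevel.useMemory is false.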

For an RDD:

scala> val rdd = sc.parallelize(Seq(1,2))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:21

scala> rdd.getStorageLevel.useMemory
res9: Boolean = false

scala> rdd.cache()
res10: rdd.type = ParallelCollectionRDD[1] at parallelize at <console>:21

scala> rdd.getStorageLevel.useMemory
res11: Boolean = true
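For the second part of the question, SparkContext.getPersistentRDDs returns every RDD currently marked as persistent on that context, keyed by RDD id. Because cache() is lazy, this includes RDDs that no action has computed yet. SparkContext.getRDDStorageInfo (a developer API) reports storage details such as the number of cached partitions. In recent Spark versions these figures come from Spark's status tracking, so they may lag slightly behind the action that filled the cache. The following is a minimal sketch for RDDs only. The object and method names are hypothetical, and it does not try to enumerate cached DataFrames; for those, check storageLevel on each DataFrame.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper object, not part of Spark's API.
object CachedRdds {
  // Sorted names of all RDDs marked persistent on this SparkContext.
  // RDDs that were never given a name via setName appear as "<unnamed>".
  def persisted(sc: SparkContext): Seq[String] =
    sc.getPersistentRDDs.values
      .map(rdd => Option(rdd.name).getOrElse("<unnamed>"))
      .toSeq
      .sorted

  // One line per RDD that getRDDStorageInfo reports:
  // id, name, cached vs. total partitions, and bytes held in memory.
  def storageSummary(sc: SparkContext): Seq[String] =
    sc.getRDDStorageInfo.toSeq.map { info =>
      s"${info.id} ${info.name}: ${info.numCachedPartitions}/${info.numPartitions} " +
        s"partitions cached, ${info.memSize} bytes in memory"
    }
}
```

In spark-shell you could call CachedRdds.persisted(sc) right after rdd.cache() to see the RDD listed, then run an action such as rdd.count() before looking at CachedRdds.storageSummary(sc).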
