Drop Spark dataframe from cache
Question
I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DFs for faster execution:
df1.cache()
df2.cache()
Once a certain dataframe is no longer needed, how can I drop the DF from memory (or un-cache it)?
For example, df1 is used throughout the code, while df2 is used for only a few transformations and is never needed after that. I want to forcefully drop df2 to release more memory.
Answer
Simply do the following:
df1.unpersist()
df2.unpersist()
From the Spark documentation: Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.