Drop Spark DataFrame from cache

Views: 24 · Published: 2021/11/14 21:50:00 · Tags: apache-spark, apache-spark-sql, spark-streaming


Problem description

I am using Spark 1.3.0 with the Python API. While transforming huge DataFrames, I cache many DFs for faster execution:

df1.cache()
df2.cache()

Once I am done with a certain DataFrame and no longer need it, how can I drop it from memory (that is, un-cache it)?

For example, df1 is used throughout the code, while df2 is needed for only a few transformations and never again afterwards. I want to forcefully drop df2 to free up memory.

Solution

Just call unpersist() on each DataFrame you no longer need:

df1.unpersist()
df2.unpersist()

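As the Spark programming guide puts it:

"Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method."

The passage is about RDDs. Cached DataFrames are stored the same way, and DataFrame.unpersist() plays the same role for them.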
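The difference between automatic LRU eviction and a manual unpersist() can be modeled in a few lines of plain Python. This is a toy sketch and not Spark's implementation. The class ToyPartitionCache, its methods, and the capacity parameter are all hypothetical. Capacity is counted in partitions for simplicity, whereas Spark budgets cache memory in bytes. The model shows that an unused dataset's partitions stay resident until new data needs the room. Unpersisting frees that room immediately.

```python
from collections import OrderedDict


class ToyPartitionCache:
    """Hypothetical toy model of per-node LRU partition caching (not Spark's code).

    Capacity is counted in partitions; real Spark tracks memory in bytes.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are (dataset_name, partition_index), ordered least -> most recently used.
        self._blocks = OrderedDict()

    def cache(self, name, num_partitions):
        """Store every partition of a dataset, evicting LRU partitions when full."""
        for i in range(num_partitions):
            self._put((name, i))

    def read(self, name, index):
        """Access a cached partition. Returns False on a cache miss."""
        key = (name, index)
        if key not in self._blocks:
            # Cache miss: Spark would recompute the partition from its lineage
            # (or read it from disk if a disk-backed storage level was used).
            return False
        self._blocks.move_to_end(key)  # mark as most recently used
        return True

    def unpersist(self, name):
        """Manually drop all cached partitions of a dataset right away."""
        for key in [k for k in self._blocks if k[0] == name]:
            del self._blocks[key]

    def cached(self, name):
        """Sorted indices of the partitions of `name` that are currently cached."""
        return sorted(i for (n, i) in self._blocks if n == name)

    def _put(self, key):
        self._blocks[key] = True
        self._blocks.move_to_end(key)
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used partition


cache = ToyPartitionCache(capacity=4)
cache.cache("df1", 2)
cache.cache("df2", 2)
cache.read("df1", 0)  # df1 is used throughout, so it stays "hot"
cache.read("df1", 1)

# Automatic LRU: caching df3 pushes out df2, the least recently used data.
cache.cache("df3", 2)
print(cache.cached("df1"), cache.cached("df2"), cache.cached("df3"))  # [0, 1] [] [0, 1]

# Manual removal: free df3's space now instead of waiting for eviction.
cache.unpersist("df3")
print(cache.cached("df3"))  # []
```

In real PySpark code, the manual step is simply df2.unpersist() once the last action that needs df2 has run.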
