How to unpersist in Sparklyr?

Question
I am using Sparklyr for a project and have understood that persisting is very useful. I am using sdf_persist for this, with the following syntax (correct me if I am wrong):
data_frame <- sdf_persist(data_frame)
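For context, sdf_persist also accepts a storage.level argument (the default is "MEMORY_AND_DISK"), so the storage level can be chosen explicitly. A minimal sketch, assuming an open connection sc:

```r
library(sparklyr)
library(dplyr)

# Assumes an existing Spark connection `sc`.
# Persist with an explicit storage level instead of the default.
data_frame <- copy_to(sc, iris, overwrite = TRUE) %>%
  sdf_persist(storage.level = "MEMORY_ONLY")
```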
Now I am reaching a point where I have too many RDDs stored in memory, so I need to unpersist some. However, I cannot seem to find the function to do this in Sparklyr. Note that I have tried:
dplyr::db_drop_table(sc, "data_frame")
dplyr::db_drop_table(sc, data_frame)
unpersist(data_frame)
sdf_unpersist(data_frame)
But none of these work.
Also, I am trying to avoid using tbl_cache (in which case it seems that db_drop_table works), as sdf_persist seems to offer more freedom over the storage level. It might be that I am missing the big picture of how to use persistence here, in which case I'll be happy to learn more.
Answer
If you don't care about granularity, then the simplest solution is to invoke Catalog.clearCache:
spark_session(sc) %>% invoke("catalog") %>% invoke("clearCache")
Uncaching a specific object is much less straightforward due to sparklyr's indirection. If you check the object returned by sdf_cache, you'll see that the persisted table is not exposed directly:
df <- copy_to(sc, iris, memory = FALSE, overwrite = TRUE) %>% sdf_persist()

spark_dataframe(df) %>%
  invoke("storageLevel") %>%
  invoke("equals", invoke_static(sc, "org.apache.spark.storage.StorageLevel", "NONE"))
[1] TRUE
That's because you don't get the registered table directly, but rather the result of a subquery like SELECT * FROM ....
This means you cannot simply call unpersist:
spark_dataframe(df) %>% invoke("unpersist")
as you would in one of the official APIs.
Instead, you can try to retrieve the name of the source table, for example like this:
src_name <- as.character(df$ops$x)
and then invoke Catalog.uncacheTable:
spark_session(sc) %>% invoke("catalog") %>% invoke("uncacheTable", src_name)
This is likely not the most robust solution, so please use it with caution.
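Putting the steps above together, a small helper could wrap the lookup and the uncache call. The function name unpersist_tbl is hypothetical, and the sketch assumes df$ops$x still holds the source table name in your sparklyr version (it is internal structure and may change between releases):

```r
library(sparklyr)
library(dplyr)

# Hypothetical helper: uncache the source table behind a sparklyr tbl.
# Assumes `df` was created directly from a registered table, so that
# df$ops$x contains the table name -- an internal detail, use with caution.
unpersist_tbl <- function(df, sc) {
  src_name <- as.character(df$ops$x)
  spark_session(sc) %>%
    invoke("catalog") %>%
    invoke("uncacheTable", src_name)
  invisible(df)
}

# Usage (with an open connection `sc`):
# df <- copy_to(sc, iris, memory = FALSE, overwrite = TRUE) %>% sdf_persist()
# unpersist_tbl(df, sc)
```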