Cache a DataFrame in PySpark

Views: 47 · Published: 2021/4/21 18:35:55 · Tags: caching, pyspark

Question

I want to understand more precisely how the cache method works for DataFrames in PySpark.

When I run df.cache(), it returns a DataFrame. So if I write df2 = df.cache(), which DataFrame is cached? Is it df, df2, or both?

Recommended answer

I found the source code of DataFrame.cache:

def cache(self):
    """Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`).

    .. note:: The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0.
    """
    self.is_cached = True
    self._jdf.cache()
    return self

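So the answer is: both. cache() marks the DataFrame as cached and then returns self, so df and df2 are two names for the same DataFrame object, and that single object is what gets cached.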
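To make the `return self` behaviour concrete without needing a Spark installation, here is a minimal sketch. `FakeDataFrame` is a hypothetical stand-in written for this illustration, not a PySpark class; it copies only the two relevant lines of the quoted source (setting `is_cached` and returning `self`) and does not cache any data.

```python
class FakeDataFrame:
    """Hypothetical stand-in (not part of PySpark) that mimics only the
    `return self` pattern of the DataFrame.cache source quoted above."""

    def __init__(self):
        self.is_cached = False

    def cache(self):
        # Mirrors the quoted source: flag the object as cached...
        self.is_cached = True
        # ...and hand back the very same object, not a copy.
        return self


df = FakeDataFrame()
df2 = df.cache()

# Both names point at one object, so there is only one thing to cache.
same_object = df2 is df
print(same_object)
print(df.is_cached, df2.is_cached)
```

Assuming the PySpark version in use matches the quoted source, the same check on a real DataFrame (`df2 is df`, then `df.is_cached`) should behave the same way. It also follows that calling `unpersist()` through either name would affect both, since they are the same object.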
