spark cache only keeps a fraction of RDD


Question

When I explicitly call rdd.cache, I can see from the Spark console's Storage tab that only a fraction of the RDD is actually cached. My question is: where are the remaining parts? How does Spark decide which part to keep in the cache?

The same question applies to the initial raw data read in by sc.textFile(). I understand these RDDs are automatically cached, even though the Spark console's Storage tab does not display any information about their cache status. Do we know how much of that data is cached versus missing?

Answer

cache() is the same as persist(StorageLevel.MEMORY_ONLY), and your amount of data probably exceeds the available memory. Spark then evicts cached blocks on a least-recently-used (LRU) basis. With MEMORY_ONLY, partitions that were evicted or never fit are not stored anywhere; they are recomputed from lineage the next time they are needed.
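As a minimal sketch of that equivalence (the input path is made up, and sc.getRDDStorageInfo is a developer API used here only to inspect how much of the RDD actually made it into the cache):

```scala
import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs:///big/dataset")   // hypothetical input path
rdd.persist(StorageLevel.MEMORY_ONLY)          // identical to rdd.cache()
rdd.count()                                    // an action materializes the cache

// Partitions that did not fit in memory are simply not stored; with
// MEMORY_ONLY they are recomputed from lineage when accessed again.
sc.getRDDStorageInfo.foreach { info =>
  println(s"cached ${info.numCachedPartitions} of ${info.numPartitions} partitions, " +
    s"${info.memSize} bytes in memory")
}
```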

You can tweak the memory reserved for caching by setting configuration options. See the Spark documentation for details, and look out for: spark.driver.memory, spark.executor.memory, spark.storage.memoryFraction.
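A hedged sketch of setting those options programmatically; the values are arbitrary, and spark.storage.memoryFraction belongs to the legacy (pre-1.6) memory manager:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cache-tuning-example")          // hypothetical application name
  .set("spark.executor.memory", "4g")          // heap per executor (illustrative value)
  .set("spark.storage.memoryFraction", "0.6")  // fraction of heap for cached RDDs
                                               // (legacy setting, pre Spark 1.6)
val sc = new SparkContext(conf)
```

In practice, executor memory is often passed on the command line instead, e.g. spark-submit --executor-memory 4g.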

Not an expert, but I do not think that textFile() automatically caches anything; the Spark Quick Start explicitly caches a text file RDD: sc.textFile(logFile, 2).cache()
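The Quick Start pattern looks roughly like this; logFile and the filter predicates are placeholders, and without the explicit cache() call every action would re-read the file from disk:

```scala
val logFile = "hdfs:///logs/app.log"           // placeholder path
val logData = sc.textFile(logFile, 2).cache()  // caching must be requested explicitly
val errors = logData.filter(_.contains("ERROR")).count()  // first action fills the cache
val warns  = logData.filter(_.contains("WARN")).count()   // served from cache where it fits
```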

