Why does a SPARK cached RDD spill to disk?
Problem description
I have the following code, where I repartition the filtered input data and persist it:
val df = sparkSession.sqlContext.read
  .parquet(path)
  .as[struct1]                                   // typed Dataset[struct1]
  .filter(dateRange(_, lowerBound, upperBound))  // keep only rows in the date range
  .repartition(nrInputPartitions)
  .persist()                                     // no level given: the Dataset default applies

df.count  // action that materializes the cache
I expect all the data to be stored in memory, but instead I get the following in the Spark UI:
Storage
Size in Memory 424.2 GB
Size on Disk 44.1 GB
Is it because some partitions didn't have enough memory, and Spark automatically switched to the MEMORY_AND_DISK storage level?
Answer
Is it because some partitions didn't have enough memory, and Spark automatically switched to the MEMORY_AND_DISK storage level?
Almost. It is because it is not an RDD but a Dataset, and the default storage level for Datasets is MEMORY_AND_DISK. Otherwise your suspicion is correct: if there is not enough memory, or cache eviction is required, data goes to disk (although technically speaking it is not a spill).