What will Spark do if I don't have enough memory?


Problem Description

I'm new to Spark, and I found that the documentation says Spark will load data into memory to make iterative algorithms faster.

But what if I have a 10 GB log file and only 2 GB of memory? Will Spark load the log file into memory as usual?

Recommended Answer

I think this question is well answered in the FAQ section of the Spark website (https://spark.apache.org/faq.html):

  • What happens if my dataset does not fit in memory? Often each partition of data is small and does fit in memory, and these partitions are processed a few at a time. For very large partitions that do not fit in memory, Spark's built-in operators perform external operations on datasets.
  • What happens when a cached dataset does not fit in memory? Spark can either spill it to disk or recompute the partitions that don't fit in RAM each time they are requested. By default, it uses recomputation, but you can set a dataset's storage level to MEMORY_AND_DISK to avoid this.
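
To make the second point concrete, here is a minimal sketch in Scala against the standard Spark API, showing a dataset persisted with the MEMORY_AND_DISK storage level so that partitions that don't fit in RAM are spilled to disk instead of recomputed. The file path, application name, and filter strings are placeholders made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object LargeLogExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("LargeLogExample").getOrCreate()

        // A large log file read as an RDD of lines; the path is a placeholder.
        val logs = spark.sparkContext.textFile("hdfs:///logs/app.log")

        // MEMORY_AND_DISK keeps the partitions that fit in RAM and writes the
        // rest to local disk, so the repeated actions below neither re-read
        // the file nor recompute partitions evicted from memory.
        logs.persist(StorageLevel.MEMORY_AND_DISK)

        // Two passes over the cached data, as in an iterative-style workload.
        val errorCount = logs.filter(_.contains("ERROR")).count()
        val warnCount  = logs.filter(_.contains("WARN")).count()
        println(s"errors=$errorCount warnings=$warnCount")

        spark.stop()
      }
    }

Without the persist call (or with the default MEMORY_ONLY level on an explicit cache), the second count would recompute the evicted partitions, which here means re-reading part of the 10 GB file from storage.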
