What will Spark do if I don't have enough memory?


Problem description

I'm new to Spark, and I found that the documentation says Spark will load data into memory to make iterative algorithms faster.

But what if I have a 10 GB log file and only 2 GB of memory? Will Spark still load the log file into memory?

Recommended answer

I think this question is answered well in the FAQ on the Spark website (https://spark.apache.org/faq.html):

  • What happens if my dataset does not fit in memory? Often each partition of data is small and does fit in memory, and these partitions are processed a few at a time. For very large partitions that do not fit in memory, Spark's built-in operators perform external operations on the dataset.
  • What happens when a cached dataset does not fit in memory? Spark will either spill it to disk or recompute the partitions that don't fit in RAM each time they are requested. By default it uses recomputation, but you can set a dataset's storage level to MEMORY_AND_DISK to avoid this.

