What is the difference between the MEMORY_ONLY and MEMORY_AND_DISK caching levels in Spark?

Question

How does the behavior of the MEMORY_ONLY and MEMORY_AND_DISK caching levels in Spark differ?

Recommended answer

The documentation lists each storage level with its meaning:

Storage Level

Meaning

MEMORY_ONLY

Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.

MEMORY_AND_DISK

Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.

MEMORY_ONLY_SER

Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.

MEMORY_AND_DISK_SER

Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.

DISK_ONLY

Store the RDD partitions only on disk.

MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

Same as the levels above, but replicate each partition on two cluster nodes.

OFF_HEAP (experimental)

Store RDD in serialized format in Tachyon. Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors to be smaller and to share a pool of memory, making it attractive in environments with large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon, the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts from memory.
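One way to read the table above is as a small set of flags per level: whether memory is used, whether disk is used, whether the data is kept deserialized, and how many replicas are held. The following is only an illustrative sketch in plain Python. `LevelFlags`, `LEVELS`, and `on_memory_pressure` are hypothetical names made up for this example and are not Spark's API; OFF_HEAP is left out because its behavior depends on the external store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelFlags:
    """Hypothetical stand-in for a storage level (not Spark's StorageLevel class)."""
    use_disk: bool
    use_memory: bool
    deserialized: bool
    replication: int = 1


# Flags as implied by the descriptions in the table above.
LEVELS = {
    "MEMORY_ONLY":         LevelFlags(use_disk=False, use_memory=True, deserialized=True),
    "MEMORY_AND_DISK":     LevelFlags(use_disk=True, use_memory=True, deserialized=True),
    "MEMORY_ONLY_SER":     LevelFlags(use_disk=False, use_memory=True, deserialized=False),
    "MEMORY_AND_DISK_SER": LevelFlags(use_disk=True, use_memory=True, deserialized=False),
    # Data written to disk is serialized, so DISK_ONLY is not deserialized.
    "DISK_ONLY":           LevelFlags(use_disk=True, use_memory=False, deserialized=False),
    "MEMORY_ONLY_2":       LevelFlags(use_disk=False, use_memory=True, deserialized=True, replication=2),
    "MEMORY_AND_DISK_2":   LevelFlags(use_disk=True, use_memory=True, deserialized=True, replication=2),
}


def on_memory_pressure(level: LevelFlags) -> str:
    """Describe what happens to a partition that does not fit in memory."""
    if not level.use_memory:
        return "always on disk"
    return "spill to disk" if level.use_disk else "recompute from lineage"


for name, flags in LEVELS.items():
    print(f"{name:20s} -> {on_memory_pressure(flags)}")
```

Seen this way, MEMORY_ONLY and MEMORY_AND_DISK differ in a single flag, and that flag only matters for partitions that do not fit in memory.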

This means that with MEMORY_ONLY, Spark tries to keep partitions in memory at all times. If some partitions cannot be kept in memory, or if partitions are removed from RAM because a node is lost, Spark recomputes them using lineage information. With MEMORY_AND_DISK, Spark always keeps computed partitions cached: it tries to hold them in RAM, and any partitions that do not fit are spilled to disk.
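The difference can be made concrete with a toy model. The sketch below is plain Python, not Spark code: `ToyCache`, `memory_slots`, and `run` are hypothetical names, memory is modeled as a fixed number of partition slots, and eviction is ignored entirely. It only counts how often a partition has to be computed under each policy.

```python
class ToyCache:
    """Toy partition cache; an assumption-laden model, not Spark's block manager."""

    def __init__(self, memory_slots, use_disk):
        self.memory_slots = memory_slots  # how many partitions fit in "RAM"
        self.use_disk = use_disk          # True models MEMORY_AND_DISK
        self.memory = {}
        self.disk = {}
        self.computations = 0

    def _compute(self, partition_id):
        # Stand-in for replaying the lineage of one partition.
        self.computations += 1
        return [partition_id * 10 + i for i in range(3)]

    def get(self, partition_id):
        if partition_id in self.memory:
            return self.memory[partition_id]
        if partition_id in self.disk:
            return self.disk[partition_id]  # read back the spilled partition
        data = self._compute(partition_id)
        if len(self.memory) < self.memory_slots:
            self.memory[partition_id] = data
        elif self.use_disk:
            self.disk[partition_id] = data  # spill instead of dropping
        # Otherwise the partition is not cached and is recomputed next time.
        return data


def run(use_disk):
    """Make three passes over four partitions with room for two in memory."""
    cache = ToyCache(memory_slots=2, use_disk=use_disk)
    for _ in range(3):
        for partition_id in range(4):
            cache.get(partition_id)
    return cache.computations


print("MEMORY_ONLY-like computations:    ", run(use_disk=False))
print("MEMORY_AND_DISK-like computations:", run(use_disk=True))
```

In this toy run the memory-only policy computes partitions 8 times (4 on the first pass, then the 2 uncached partitions again on each later pass), while the memory-and-disk policy computes each partition once, 4 in total, and pays disk reads instead. Which trade-off is better in a real job depends on how expensive recomputation is compared with disk I/O.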
