Spark off-heap memory config and Tungsten

Question

I thought that with the integration of Project Tungsten, Spark would automatically use off-heap memory.

What are spark.memory.offHeap.size and spark.memory.offHeap.enabled for? Do I need to manually specify the amount of off-heap memory for Tungsten here?

Recommended answer

Spark/Tungsten uses Encoders/Decoders to represent JVM objects as highly specialized Spark SQL type objects, which can then be serialized and operated on very efficiently. The internal format is compact and friendly to GC.

Thus, even in the default on-heap mode, Tungsten alleviates much of the overhead of the JVM object memory layout and of GC time. In that mode Tungsten still allocates objects on the heap for its internal purposes, and the allocated memory chunks may be huge, but allocation happens much less frequently and the chunks survive GC generation transitions smoothly. This almost eliminates the need to consider moving this internal structure off-heap.

In our experiments with this mode on and off, we did not see a considerable run-time improvement. What you do get with off-heap mode is the need to plan carefully for memory allocated outside of your JVM process. This can cause some difficulties with container managers such as YARN and Mesos, where you have to allow and plan for additional memory chunks on top of your JVM process configuration.
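To make the planning concern concrete, here is a minimal sketch in plain Python (no Spark dependency). The helper names off_heap_conf and estimated_container_mib are hypothetical. The sizing rule is an assumption based on my reading of the Spark 3.x documentation for YARN, where the container request is roughly executor memory plus memory overhead plus the off-heap size, with overhead defaulting to max(384 MiB, 10% of executor memory). On older Spark versions you would likely need to fold the off-heap size into the overhead setting yourself, so check the docs for your version.

```python
from typing import Dict, Optional

MIB = 1024 * 1024


def off_heap_conf(off_heap_bytes: int) -> Dict[str, str]:
    """Build the two settings from the question (hypothetical helper)."""
    if off_heap_bytes <= 0:
        # The size must be positive when off-heap mode is enabled.
        raise ValueError("spark.memory.offHeap.size must be positive")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(off_heap_bytes),  # plain number = bytes
    }


def estimated_container_mib(
    executor_memory_mib: int,
    off_heap_mib: int = 0,
    overhead_mib: Optional[int] = None,
) -> int:
    """Rough per-executor container size in MiB (assumed Spark 3.x rule)."""
    if overhead_mib is None:
        # Assumed default: max(384 MiB, 10% of executor memory).
        overhead_mib = max(384, int(executor_memory_mib * 0.10))
    return executor_memory_mib + overhead_mib + off_heap_mib


# Example: 8 GiB heap per executor plus 2 GiB of off-heap memory.
conf = off_heap_conf(2048 * MIB)
submit_args = [arg for k, v in conf.items() for arg in ("--conf", f"{k}={v}")]
total_mib = estimated_container_mib(8192, off_heap_mib=2048)
```

The point of the sketch is only that the off-heap amount is extra memory the container manager has to grant on top of the JVM heap, so it has to be set explicitly and budgeted for; it is not picked up automatically.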

Also, in off-heap mode Tungsten uses sun.misc.Unsafe, which might not be desirable or even possible in your deployment scenario (with a restrictive Java security manager configuration, for example).

I am also sharing a time-tagged video of a conference talk by Josh Rosen, in which he is asked a similar question.
