Why increase spark.yarn.executor.memoryOverhead?


Question

I am trying to join two large spark dataframes and keep running into this error:

Container killed by YARN for exceeding memory limits. 24 GB of 22 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

This seems like a common issue among Spark users, but I can't find any solid description of what spark.yarn.executor.memoryOverhead actually is. In some cases it sounds like a kind of memory buffer before YARN kills the container (e.g. 10 GB was requested, but YARN won't kill the container until it uses 10.2 GB). In other cases it sounds like it is used for some kind of data accounting tasks that are completely separate from the analysis I want to perform. My questions are:

  • What is spark.yarn.executor.memoryOverhead being used for?
  • What is the benefit of increasing this kind of memory instead of executor memory (or the number of executors)?
  • In general, are there steps I can take to reduce my spark.yarn.executor.memoryOverhead usage (e.g. particular data structures, limiting the width of the dataframes, using fewer executors with more memory, etc.)?

Answer

This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
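A minimal sketch of how the YARN container size comes about, assuming the Spark 2.x defaults (overhead = max(384 MB, 10% of executor memory)); the numbers are illustrative, but they show why a 20 GB executor request lands in roughly the 22 GB container mentioned in the error above:

    # Sketch, assuming Spark 2.x defaults: overhead = max(384 MB, 10% of executor memory).
    def yarn_container_size_mb(executor_memory_mb, overhead_mb=None):
        default_overhead = max(384, int(executor_memory_mb * 0.10))
        return executor_memory_mb + (overhead_mb if overhead_mb is not None else default_overhead)

    print(yarn_container_size_mb(20 * 1024))        # 22528 MB, about 22 GB (default overhead)
    print(yarn_container_size_mb(20 * 1024, 4096))  # 24576 MB, about 24 GB (overhead raised to 4 GiB)

Raising the overhead therefore enlarges the container that YARN enforces, rather than giving the JVM heap itself more room.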

This also includes user objects if you use one of the non-JVM guest languages (Python, R, etc.).
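For reference, a hedged example of actually raising the setting: the property name is the Spark 2.x one from the error message (newer Spark versions use spark.executor.memoryOverhead instead), and the sizes are placeholders to tune for your cluster, not recommendations:

    # Illustrative PySpark session config; values are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-join")
        .config("spark.executor.memory", "20g")
        .config("spark.yarn.executor.memoryOverhead", "4096")  # interpreted as MiB in Spark 2.x
        .getOrCreate()
    )

    # Command-line equivalent:
    #   spark-submit --executor-memory 20g \
    #     --conf spark.yarn.executor.memoryOverhead=4096 my_job.py

Note that executor memory settings only take effect if they are set before the SparkContext is created (or passed to spark-submit), since YARN sizes the containers at application launch.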
