What kind of data is spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead used to store?

Views: 3371 Published: 2016/5/22 16:30:27 Tags: apache-spark yarn

Question

I am wondering:

What kind of data does Spark store in spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?
And in which cases should I increase the value of spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?

Answer

In YARN terminology, executors and application masters run inside "containers". Spark offers YARN-specific properties so you can size those containers for your application:

spark.yarn.executor.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per executor. This memory accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%).
spark.yarn.driver.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. It serves the same purpose for the driver as memoryOverhead does for the executor.

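So it's not about storing your application's data. It is the extra memory, on top of the JVM heap, that the process inside the YARN container needs in order to run properly. If YARN kills your containers for exceeding their memory limit, that is the usual sign that the overhead should be increased.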
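
To make the sizing concrete, here is a minimal sketch of how the heap and the overhead add up to the memory requested for one container. The 10% fraction is the upper end of the range quoted above, and the 384 MB floor is an assumption about the default used when the property is not set; check the documentation for your Spark version. The function names are made up for this illustration and are not Spark APIs.

```python
# Illustrative only: these helpers are hypothetical, not part of Spark.
OVERHEAD_FRACTION = 0.10  # assumption: upper end of the 6-10% range
MIN_OVERHEAD_MB = 384     # assumption: floor applied to the default overhead


def default_overhead_mb(heap_mb, fraction=OVERHEAD_FRACTION, floor_mb=MIN_OVERHEAD_MB):
    """Estimate the off-heap overhead for a given heap size, in megabytes."""
    return max(floor_mb, int(heap_mb * fraction))


def container_request_mb(heap_mb, overhead_mb=None):
    """Memory asked of YARN for one container: heap plus overhead."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(heap_mb)
    return heap_mb + overhead_mb


if __name__ == "__main__":
    # An 8 GB executor with the estimated overhead.
    print(container_request_mb(8192))
    # The same executor with the overhead set explicitly to 1 GB.
    print(container_request_mb(8192, overhead_mb=1024))
```

The point of the sketch is that the container must be large enough for both parts: raising the overhead grows the container without growing the heap.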

In some cases, e.g. if you enable dynamicAllocation, you might want to set these properties explicitly, along with the maximum number of executors (spark.dynamicAllocation.maxExecutors) that can be created during the job. Otherwise the application can easily overwhelm YARN by asking for thousands of executors, and you can lose the executors that are already running.
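
As a rough illustration of setting these properties together, the sketch below builds the `--conf` arguments one could pass to spark-submit. The property names are the ones discussed in this answer, plus spark.dynamicAllocation.enabled; the values (200 executors, 1024 MB and 512 MB of overhead) and the helper names are placeholders invented for this example, not recommendations.

```python
# Illustrative only: helper names and values are hypothetical.
def dynamic_allocation_conf(max_executors, executor_overhead_mb, driver_overhead_mb):
    """Collect the properties from this answer into a single dict."""
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.yarn.executor.memoryOverhead": str(executor_overhead_mb),
        "spark.yarn.driver.memoryOverhead": str(driver_overhead_mb),
    }


def to_submit_args(conf):
    """Turn a property dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = dynamic_allocation_conf(200, 1024, 512)
    print(" ".join(["spark-submit"] + to_submit_args(conf)))
```

Dynamic allocation usually needs further settings on the cluster side, which are outside the scope of this sketch.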

spark.dynamicAllocation.maxExecutors sets the upper bound for the number of executors when dynamic allocation is enabled, and it defaults to infinity. [ref. http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation]
According to the code documentation [ref. https://github.com/apache/spark/blob/8ef3399aff04bf8b7ab294c0f55bcf195995842b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L43]:

Increasing the target number of executors happens in response to backlogged tasks waiting to be scheduled. If the scheduler queue is not drained in N seconds, then new executors are added. If the queue persists for another M seconds, then more executors are added and so on. The number added in each round increases exponentially from the previous round until an upper bound has been reached. The upper bound is based both on a configured property and on the current number of running and pending tasks, as described above.
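
The ramp-up described in that comment can be sketched as a small simulation. This is a simplified model, not Spark's actual code: it assumes the first round adds one executor, that each later round doubles the number added, and that the upper bound is the smaller of the configured maximum and the number of executors the backlog needs. The function and parameter names are made up for the example.

```python
# Simplified model of the doubling ramp-up; not Spark's implementation.
def ramp_up(max_executors, needed_executors, start=0):
    """Return the executor target after each round until the bound is hit."""
    upper_bound = min(max_executors, needed_executors)
    target = start
    to_add = 1  # assumption: the first round asks for a single executor
    targets = []
    while target < upper_bound:
        target = min(target + to_add, upper_bound)
        targets.append(target)
        to_add *= 2  # each round asks for twice as many as the previous one
    return targets


if __name__ == "__main__":
    # With a configured cap of 100, the target stops growing at 100.
    print(ramp_up(max_executors=100, needed_executors=40000))
    # Without a cap, the target keeps doubling until the backlog is covered.
    print(ramp_up(max_executors=float("inf"), needed_executors=40000))
```

In this model a backlog that needs 40000 executors is fully requested after 16 rounds when there is no cap, which is why setting spark.dynamicAllocation.maxExecutors matters.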

This can lead to an exponential increase in the number of executors, which in some cases can break the YARN resource manager. In my case:

16/03/31 07:15:44 INFO ExecutorAllocationManager: Requesting 8000 new executors because tasks are backlogged (new desired total will be 40000)

This doesn't cover all the use cases for these properties, but it gives a general idea of how they are used.
