Resource Allocation with Spark and Yarn

Problem Description

I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:

spark.driver.memory                 4096m
spark.driver.memoryOverhead         3072m
spark.executor.memory               4096m
spark.executor.memoryOverhead       3072m
spark.executor.cores                3
spark.executor.instances            3

Yarn:

Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:

Running Containers          4
Allocated CPU VCores        4
Allocated Memory MB         22528

  1. I don't quite understand the amount of memory allocated by yarn. Given the settings, I would assume yarn would reserve (4096+3072)*4m = 28672m. However, it looks like the spark.executor.memoryOverhead option is ignored (I also tried spark.yarn.executor.memoryOverhead with no effect). Therefore, the minimum of 384m is allocated as overhead. As the minimum allocation is set to 1024m, we end up with (4096+3072)*1m + (4096+1024)*3m=22528m, where the first term is the driver and the second term sums up the executor memory.

  2. Why are only 4 CPU VCores allocated, even though I requested more cores and the minimum allocation is set to 2 vCores? When looking at the Application Master, I find the following executors:

Here, the executors indeed have 3 cores each. How do I know which value is the correct one or what am I missing?

  3. I played around with the settings, since in yarn-client mode I am supposed to use options such as spark.yarn.am.memory or spark.yarn.am.cores. However, these seem to be ignored by yarn. Why is that? Furthermore, in yarn-client mode the driver is supposed to run outside of yarn. Why are its resources still allocated inside yarn? My Zeppelin instance runs on the same machine as one of the workers.
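
For reference, the application-master properties just mentioned would typically go into spark-defaults.conf or into Zeppelin's Spark interpreter settings. The values below are only illustrative of what such a configuration might look like; whether YARN honours them in this setup is exactly what is being asked:

# Illustrative values only (spark-defaults.conf or Zeppelin interpreter properties)
spark.yarn.am.memory            1024m
spark.yarn.am.memoryOverhead    384m
spark.yarn.am.cores             1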

Recommended Answer

A Spark application has three roles: the driver, the application master, and the executors.

  1. In client mode (one of the deploy modes), the driver itself does not ask YARN for resources, so YARN only has to allocate one application master and three executors. So I think Spark asks for (4G + 3G) * 3 for the three executors and 1G for the AM, which is why the Allocated Memory is 22 GB (22528 MB).
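
A quick numeric check of this, as a minimal Python sketch: the first breakdown follows this answer, the second follows the hypothesis in question 1 above; both are back-of-the-envelope sums under the stated assumptions, not an official YARN formula.

# Back-of-the-envelope check: both decompositions reproduce the 22528 MB
# that the ResourceManager reports as "Allocated Memory MB".
MIN_ALLOC_MB = 1024  # yarn.scheduler.minimum-allocation-mb

# Breakdown from this answer: three executor containers of
# 4096 MB heap + 3072 MB overhead each, plus a ~1 GB application master
# (in yarn-client mode the driver itself runs outside YARN).
answer_total = 3 * (4096 + 3072) + MIN_ALLOC_MB

# Breakdown from question 1: one 4096 + 3072 = 7168 MB container, plus three
# executors whose requests fell back to the default ~384 MB overhead and were
# rounded up to the next multiple of the 1024 MB minimum allocation
# (4096 + 384 -> 5120).
question_total = (4096 + 3072) + 3 * (4096 + MIN_ALLOC_MB)

print(answer_total, question_total)  # 22528 22528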

As for the core count, in my experience the Spark UI gives the correct answer. A likely reason for the discrepancy is that YARN's CapacityScheduler uses the DefaultResourceCalculator by default, which accounts only for memory, so the ResourceManager UI reports one vCore per container regardless of what was requested.
