Resource Allocation with Spark and Yarn

Problem Description

I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:

spark.driver.memory             4096m
spark.driver.memoryOverhead         3072m
spark.executor.memory           4096m
spark.executor.memoryOverhead           3072m
spark.executor.cores                3
spark.executor.instances            3

YARN:

Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:

Running Containers      4
Allocated CPU VCores    4
Allocated Memory MB     22528

  1. I don't quite understand the amount of memory allocated by YARN. Given the settings, I would assume YARN would reserve (4096+3072)*4m = 28672m. However, it looks like the spark.executor.memoryOverhead option is ignored (I also tried spark.yarn.executor.memoryOverhead with no effect). Therefore, the minimum of 384m is allocated as overhead. As the minimum allocation is set to 1024m, we end up with (4096+3072)*1m + (4096+1024)*3m = 22528m, where the first term is the driver and the second term sums up the executor memory.

  2. Why are only 4 CPU VCores allocated, even though I requested more cores and the minimum allocation is set to 2 vCores? When looking at the Application Master, I find the following executors:

[screenshot of the executors table from the Application Master UI not reproduced here]

Here, the executors indeed have 3 cores each. How do I know which value is the correct one, or what am I missing?

  3. I tried a couple of settings, and in yarn-client mode I am supposed to use options such as spark.yarn.am.memory or spark.yarn.am.cores. However, it seems like those are ignored by YARN. Why is this the case? Additionally, in yarn-client mode the driver is supposed to run outside of YARN, so why are resources for it still allocated within YARN? My Zeppelin instance runs on the same machine as one of the workers.
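
For context, here is a sketch of where such AM properties are normally supplied; the values are illustrative only, and this assumes they were not already set this way. In yarn-client mode, spark.yarn.am.memory, spark.yarn.am.memoryOverhead and spark.yarn.am.cores size the application master's container and have to reach the launcher before the YARN application is submitted, for example via spark-defaults.conf or Zeppelin's Spark interpreter settings followed by an interpreter restart; once the AM container has been allocated, changing them from a running notebook has no effect.

spark.yarn.am.memory            1024m
spark.yarn.am.memoryOverhead    384m
spark.yarn.am.cores             2

These spark.yarn.am.* properties apply only in yarn-client mode; in yarn-cluster mode the AM hosts the driver and is sized through spark.driver.memory and spark.driver.cores instead.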

Answer

A Spark application has three roles: the driver, the application master (AM), and the executors.

  1. In client mode (one of the deploy modes), the driver itself does not ask YARN for resources, so we have one application master and three executors whose resources must be allocated by YARN. So I think Spark will ask for (4G + 3G) * 3 for the three executors and 1G for the AM, which is why the Allocated Memory comes to 22GB (22528 MB). A rough check of this arithmetic is sketched below.
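
A minimal back-of-the-envelope sketch of that arithmetic (not actual YARN scheduler code), assuming the scheduler rounds every container request up to a multiple of the 1024 MB minimum allocation and that spark.yarn.am.memory was left at its 512m default:

import math

MIN_ALLOC_MB = 1024

def container_mb(heap_mb, overhead_mb):
    # Round a (heap + overhead) request up to the next minimum-allocation step.
    return math.ceil((heap_mb + overhead_mb) / MIN_ALLOC_MB) * MIN_ALLOC_MB

# Reading 1 (this answer): the overhead is applied to the executors, and the
# application master gets the 1024 MB minimum since the driver runs outside YARN.
print(container_mb(512, 384) + 3 * container_mb(4096, 3072))   # 1024 + 21504 = 22528

# Reading 2 (the question): the executor overhead falls back to the 384 MB default
# and one 7168 MB container is attributed to the driver side.
print(container_mb(4096, 3072) + 3 * container_mb(4096, 384))  # 7168 + 15360 = 22528

Both readings happen to land on the same 22528 MB total, so the Allocated Memory figure alone cannot tell whether the executor overhead setting was honoured.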

  2. As for the core number, I think the Spark UI gives the correct answer, based on my experience. See the scheduler note below.
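
One common reason for the mismatch, assuming the cluster runs the Capacity Scheduler with its defaults: the default DefaultResourceCalculator sizes containers by memory only, so the ResourceManager UI reports 1 vCore per container (4 containers, hence 4 vCores), even though each executor JVM really runs with the 3 cores Spark requested, which is what the Spark UI shows. To make YARN account for and display CPU as requested, the cluster would have to use the DominantResourceCalculator, e.g. in capacity-scheduler.xml:

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>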
