How to tune spark executor number, cores and executor memory?


Problem description


Where do you start when tuning the above-mentioned params? Do we start with executor memory and derive the number of executors, or do we start with cores and derive the executor number? I followed the link, and it gave me a high-level idea, but I am still not sure how or where to start and how to arrive at a final conclusion.

Solution

The following answer covers the 3 main aspects mentioned in the title: number of executors, executor memory and number of cores. There may be other parameters, such as driver memory, that I have not addressed as of this answer but would like to add in the near future.

Case 1 Hardware - 6 nodes, each with 16 cores and 64 GB RAM

Each executor is a JVM instance, so we can have multiple executors in a single node.

The first 1 core and 1 GB are needed for the OS and Hadoop daemons, so 15 cores and 63 GB RAM are available on each node.

Start with how to choose the number of cores:

Number of cores = concurrent tasks an executor can run

So we might think that more concurrent tasks per executor would give better performance. But research shows that any application with more than 5 concurrent tasks per executor performs poorly, so stick to 5.

This number comes from an executor's ability to run concurrent tasks, not from how many cores the system has. So the number 5 stays the same even if you have double the cores (32) in the CPU.

Number of executors:

Coming back to the next step: with 5 cores per executor and 15 total available cores per node (CPU), we arrive at 3 executors per node.

So with 6 nodes and 3 executors per node, we get 18 executors. Out of those 18 we need 1 executor (Java process) for the YARN Application Master (AM), which leaves 17 executors.

This 17 is the number we give to Spark with --num-executors when running the spark-submit shell command.

Memory for each executor:

From the step above, we have 3 executors per node, and the available RAM per node is 63 GB.

So memory for each executor is 63/3 = 21 GB.

However, a small amount of overhead memory is also needed when determining the full memory request to YARN for each executor. The formula for that overhead is max(384 MB, 0.07 * spark.executor.memory).

Calculating that overhead: 0.07 * 21 GB (where 21 is the 63/3 from above) = 1.47 GB

Since 1.47 GB > 384 MB, the overhead is 1.47 GB.
Subtracting it from the 21 GB above gives 21 - 1.47 ~ 19 GB.

So executor memory - 19 GB

Final numbers - Executors - 17, Cores 5, Executor Memory - 19 GB
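
As a rough sketch of how these Case 1 numbers are passed to spark-submit (the main class and JAR below are placeholders, and --deploy-mode depends on how you run the driver):

# Case 1: 17 executors, 5 cores each, 19 GB heap each
# com.example.MyApp and my-app.jar are placeholders for your application
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 17 \
  --executor-cores 5 \
  --executor-memory 19G \
  --class com.example.MyApp \
  my-app.jar

On top of the 19 GB heap, YARN also reserves the overhead discussed above for each container, so the actual per-container request is a bit over 20 GB.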


Case 2 Hardware - same 6 nodes, but with 32 cores and 64 GB RAM each

5 cores per executor stays the same, for good concurrency.

Number of executors for each node = 32/5 ~ 6

So total executors = 6 executors * 6 nodes = 36. Then, subtracting 1 for the AM, the final number is 36 - 1 = 35.

Executor memory: with 6 executors per node, 63/6 ~ 10 GB. Overhead is 0.07 * 10 GB = 700 MB. Rounding the overhead up to 1 GB, we get 10 - 1 = 9 GB.

Final numbers - Executors - 35, Cores 5, Executor Memory - 9 GB
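
The same sketch for Case 2, with the same placeholder class and JAR caveats as in Case 1:

spark-submit --master yarn --num-executors 35 --executor-cores 5 --executor-memory 9G --class com.example.MyApp my-app.jar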


Case 3

The above scenarios start by accepting the number of cores per executor as fixed and then deriving the number of executors and the memory.

Now for the first case, if we think we don't need 19 GB and just 10 GB is sufficient, then the numbers are as follows:

cores = 5
# of executors per node = 3

At this stage, our first calculation would give 21 GB and then 19 GB per executor. But since we think 10 GB is okay (assuming a small overhead), we cannot simply switch the number of executors per node to 6 (i.e. 63/10), because with 6 executors per node and 5 cores each, that comes to 30 cores per node when we only have 16 cores. So we also need to change the number of cores for each executor.

So calculating again,

The magic number 5 comes down to 3 (any number less than or equal to 5 works). So with 3 cores per executor and 15 available cores per node, we get 5 executors per node, and (5 * 6) - 1 = 29 executors in total.

So memory is 63/5 ~ 12 GB. Overhead is 12 * 0.07 = 0.84 GB, so executor memory is 12 - 1 = 11 GB.

Final Numbers are 29 executors, 3 cores, executor memory is 11 GB
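
And the corresponding sketch for Case 3, again with placeholder class and JAR names:

spark-submit --master yarn --num-executors 29 --executor-cores 3 --executor-memory 11G --class com.example.MyApp my-app.jar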


Dynamic Allocation:

Note: if dynamic allocation is enabled, the executor count is only capped by an upper bound, which means a Spark application can eat away all the cluster resources if needed. So in a cluster where other applications are also running and need cores for their tasks, make sure this is handled at the cluster level. That is, you can allocate a specific number of YARN cores based on user access: for example, create a spark_user and give that user minimum/maximum cores. These limits govern the sharing between Spark and the other applications that run on YARN.
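
On the submit side this usually just means targeting the capped queue. A minimal sketch, assuming the cluster admin has already created a YARN queue (here hypothetically named spark_queue) with the desired minimum/maximum capacity; the limits themselves live in the YARN scheduler configuration, not in spark-submit:

spark-submit --master yarn --queue spark_queue --class com.example.MyApp my-app.jar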

spark.dynamicAllocation.enabled - when this is set to true, we need not specify the number of executors. The reason is below:

The static numbers we give at spark-submit apply for the entire job duration. However, if dynamic allocation comes into the picture, there are different stages, such as:

What to start with:

Initial number of executors (spark.dynamicAllocation.initialExecutors) to start with

How many:

Then, based on load (pending tasks), how many executors to request. This would eventually be the number we would otherwise give at spark-submit in the static way. Once the initial executor number is set, we move between the min (spark.dynamicAllocation.minExecutors) and max (spark.dynamicAllocation.maxExecutors) numbers.

When to ask or give:

When do we request new executors (spark.dynamicAllocation.schedulerBacklogTimeout) - when there have been pending tasks for this much time, new executors are requested. The number of executors requested in each round increases exponentially over the previous round: for instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on in subsequent rounds. At some point, the max above comes into the picture.

When do we give away an executor (spark.dynamicAllocation.executorIdleTimeout) - when an executor has been idle for this much time, it is removed and its resources are released back to the cluster.
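
Putting the dynamic allocation settings together, here is a sketch of a spark-submit call. The specific executor size and initial/min/max values are illustrative only, the class and JAR are placeholders, and note that classic dynamic allocation on YARN also requires the external shuffle service:

spark-submit \
  --master yarn \
  --executor-cores 5 \
  --executor-memory 19G \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=17 \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --class com.example.MyApp \
  my-app.jar

With these settings Spark starts with 2 executors, grows toward 17 while tasks stay backlogged for more than 1 second, and releases executors that sit idle for 60 seconds.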

Please correct me if I missed anything. The above is my understanding based on the blog I shared in the question and some online resources. Thank you.

