Number of executors and cores


Question

I am new to Spark and would like to know how many cores and executors should be used in a Spark job on AWS if we have 2 slave c4.8xlarge nodes and 1 c4.8xlarge master node. I have tried different combinations but cannot understand the concept.

Thanks.

Answer

The Cloudera folks gave a good explanation of this:

https://www.youtube.com/watch?v=vfiJQ7wg81Y

Say you have 16 cores on your node (I think this is exactly your case). You give 1 core to YARN to manage the node, then divide the remaining 15 by 3, so each executor gets 5 cores. You also have JVM overhead of max(384 MB, 0.07 * spark.executor.memory). So with 3 executors per node, the JVMs consume 3 * max(384 MB, 0.07 * spark.executor.memory) of overhead, and the rest can be used for the memory containers.
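The arithmetic above can be sketched as a small helper. This is a hypothetical sizing function, not part of any Spark API; the 16-core / 60 GB node figures follow the example in the answer, and `size_executors` and `cores_per_executor` are names invented for illustration.

```python
def size_executors(cores_per_node, mem_per_node_gb, cores_per_executor=5):
    """Apply the YARN sizing heuristic from the answer:
    reserve 1 core for YARN, split the rest into executors,
    then subtract the JVM overhead max(384 MB, 7% of executor memory)."""
    usable_cores = cores_per_node - 1              # 1 core left for YARN
    executors_per_node = usable_cores // cores_per_executor
    mem_per_executor = mem_per_node_gb / executors_per_node
    overhead = max(0.384, 0.07 * mem_per_executor)  # JVM overhead, in GB
    executor_memory = mem_per_executor - overhead
    return executors_per_node, round(executor_memory, 1)

# The answer's example: 16 cores, 60 GB -> 3 executors of 5 cores,
# each with roughly 18-19 GB after overhead.
print(size_executors(16, 60))
```

With these inputs the heuristic lands close to the `--executor-memory 18Gb` suggested later in the answer.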

However, on a cluster with many users working simultaneously, YARN can evict your Spark session from some of its containers, forcing Spark to go all the way back through the DAG and rebuild the lost RDDs to their current state, which is bad. That is why you should set --num-executors, --executor-memory, and --executor-cores slightly lower, to leave some room for other users in advance. But this doesn't apply on AWS, where you are the only user.

--executor-memory 18G should work for you, by the way.
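Put together, a submission following the answer's numbers (3 executors of 5 cores per node, 2 worker nodes, 18G per executor) might look like the sketch below. This is a config fragment, not a tested tuning; `your_job.py` is a placeholder.

```shell
# 3 executors per node * 2 worker nodes = 6 executors total;
# 18G per executor leaves headroom for the max(384M, 7%) JVM overhead.
spark-submit \
  --master yarn \
  --num-executors 6 \
  --executor-cores 5 \
  --executor-memory 18G \
  your_job.py
```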

More details on tuning your cluster parameters: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

