How are multiple executors managed on the worker nodes with a Spark standalone cluster?


Problem description

Until now, I have only used Spark on a Hadoop cluster with YARN as the resource manager. In that type of cluster, I know exactly how many executors will run and how the resource management works. However, now that I am trying to use a standalone Spark cluster, I have become a little confused. Correct me where I am wrong.

From this article, by default, a worker node uses all the memory of the node minus 1 GB. But I understand that by using SPARK_WORKER_MEMORY we can make it use less memory. For example, if the total memory of the node is 32 GB but I specify 16 GB, will the Spark worker use no more than 16 GB on that node?
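To be concrete, I am assuming that cap would be set in conf/spark-env.sh on each worker node, something like this (the 16g value is just the example figure above):

SPARK_WORKER_MEMORY=16g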

But what about executors? Let us say I want to run 2 executors per node; can I do that by specifying the executor memory during spark-submit to be half of SPARK_WORKER_MEMORY? And if I want to run 4 executors per node, by specifying the executor memory to be a quarter of SPARK_WORKER_MEMORY?
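In other words, something like this during spark-submit (a sketch; the 8g figure assumes the 16 GB worker memory from the example above):

spark-submit <other parameters> --executor-memory 8g <other parameters>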

If so, besides executor memory, I think I would also have to specify executor cores correctly. For example, if I want to run 4 executors on a worker, would I have to specify executor cores to be a quarter of SPARK_WORKER_CORES? What happens if I specify a bigger number than that? I mean, if I specify executor memory to be a quarter of SPARK_WORKER_MEMORY but executor cores to be only half of SPARK_WORKER_CORES, would I get 2 or 4 executors running on that node in that case?

Recommended answer

So, I experimented a bit with the Spark standalone cluster myself, and this is what I noticed.

  1. My intuition that multiple executors can be run inside a worker by tuning executor cores was indeed correct. Let us say your worker has 16 cores. Now, if you specify 8 cores per executor, Spark will run 2 executors per worker.

How many executors run inside a worker also depends on the executor memory you specify. For example, if the worker memory is 24 GB and you want to run 2 executors per worker, you cannot specify the executor memory to be more than 12 GB.
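As a concrete sketch (assuming the 16-core, 24 GB worker from the examples above), a submit like this should end up with 2 executors on that worker:

spark-submit <other parameters> --executor-cores 8 --executor-memory 12g <other parameters>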

A worker's memory can be limited when starting the worker (slave) by specifying a value for the optional --memory parameter, or by changing the value of SPARK_WORKER_MEMORY. The same goes for the number of cores (--cores / SPARK_WORKER_CORES).
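For example, a sketch of starting a worker with both limits passed on the command line (spark://<master-host>:7077 is a placeholder for your master URL):

./sbin/start-slave.sh spark://<master-host>:7077 --cores 8 --memory 16g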

If you want to be able to run multiple jobs on the standalone Spark cluster, you can use the spark.cores.max configuration property while doing spark-submit. For example, like this:

spark-submit <other parameters> --conf="spark.cores.max=16" <other parameters>

So, if your standalone Spark cluster allows 64 cores in total and you give only 16 cores to your program, other Spark jobs can use the remaining 48 cores.
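The same cap can also be set per application in conf/spark-defaults.conf instead of on the command line (a sketch, assuming the standard defaults file is used):

spark.cores.max    16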
