Understanding resource allocation for Spark jobs on Mesos


Question

I'm working on a project in Spark, and recently switched from using Spark Standalone to Mesos for cluster management. I now find myself confused about how to allocate resources when submitting a job under the new system.

In standalone mode, I was using something like this (following some recommendations from a Cloudera blog post):

/opt/spark/bin/spark-submit --executor-memory 16G --executor-cores 8 
    --total-executor-cores 240 myscript.py

This is on a cluster where each machine has 16 cores and ~32 GB RAM.

What was nice about this was that I had tight control over the number of executors running and the resources allocated to each. In the example above, I knew I was getting 240/8 = 30 executors, each with 16 GB of memory and 8 cores. Given the memory on each machine in the cluster, this would amount to no more than two executors running per machine. If I wanted more executors, I could do something like

/opt/spark/bin/spark-submit --executor-memory 10G --executor-cores 5 
    --total-executor-cores 240 myscript.py

This would now give me 240/5 = 48 executors, each with 5 cores and 10 GB of memory, and would allow up to 3 executors per machine.
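The packing arithmetic above can be sketched as a quick back-of-the-envelope check (the variable names are mine, not Spark's; per-node capacity matches the cluster described in the question):

```shell
# Derive the executor layout implied by the submit flags above.
TOTAL_CORES=240       # --total-executor-cores
EXECUTOR_CORES=5      # --executor-cores
EXECUTOR_MEM_GB=10    # --executor-memory
NODE_CORES=16         # per machine
NODE_MEM_GB=32        # per machine

NUM_EXECUTORS=$(( TOTAL_CORES / EXECUTOR_CORES ))
BY_CORES=$(( NODE_CORES / EXECUTOR_CORES ))    # core-limited packing
BY_MEM=$(( NODE_MEM_GB / EXECUTOR_MEM_GB ))    # memory-limited packing
PER_NODE=$(( BY_CORES < BY_MEM ? BY_CORES : BY_MEM ))
echo "${NUM_EXECUTORS} executors, up to ${PER_NODE} per node"
# prints "48 executors, up to 3 per node"
```

Whichever of cores or memory runs out first caps how many executors fit on a node.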

But now that I'm on Mesos, I'm getting a bit confused. First off, I'm running in coarse-grained mode to ensure I can fix and control my resource allocation (this is in the service of a fairly complex model where we want to pre-allocate resources).

Now, I can specify --total-executor-cores and --executor-memory, but the documentation tells me that --executor-cores applies to Spark standalone and YARN only, which makes specifying the total number of executors and the resources allocated to each difficult. Say I run this:

/opt/spark/bin/spark-submit --total-executor-cores 240 --executor-memory 16G 
    --conf spark.mesos.coarse=true myscript.py

When I examine this job in the Mesos web UI, things start getting confusing. So, here are my questions:

  1. Terminology. The Web UI lists "frameworks", which I assume correspond to "jobs" in the standalone UI. But when I click on the detail for a given framework, it lists "tasks". But these can't be actual Spark tasks, right? As far as I can tell, "task" here must actually mean "executor" as far as Spark is concerned. This would be consistent with the UI saying my framework (job) has: 15 active tasks, 240 CPUs, and 264GB memory.

264/15 = 17.6, which seems consistent with the 16 GB of memory per executor I specified (plus some overhead, I guess). Am I interpreting all this correctly?
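The 264 GB figure is plausibly explained by Spark's Mesos memory overhead: the `spark.mesos.executor.memoryOverhead` setting (which defaults to the larger of 384 MB or 10% of --executor-memory) is added on top of each executor's heap when sizing the Mesos task. A rough check, assuming that default:

```shell
# Reconstruct the per-task and total memory the Mesos UI would report.
EXEC_MEM_MB=$(( 16 * 1024 ))            # --executor-memory 16G
OVERHEAD_MB=$(( EXEC_MEM_MB / 10 ))     # 10% of executor memory
if [ "$OVERHEAD_MB" -lt 384 ]; then     # floor of 384 MB
    OVERHEAD_MB=384
fi
PER_TASK_MB=$(( EXEC_MEM_MB + OVERHEAD_MB ))   # 18022 MB ~= 17.6 GB
TOTAL_MB=$(( PER_TASK_MB * 15 ))               # 270330 MB ~= 264 GB
echo "per Mesos task: ${PER_TASK_MB} MB, total: ${TOTAL_MB} MB"
```

So 16 GB + 10% overhead, times 15 tasks, lands almost exactly on the 264 GB shown in the UI.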

Assuming yes, when I examine any of these "tasks" (executors), I see that each has 16 cores assigned. Given we have 16 cores per machine, this would seem to indicate I'm basically running one executor on each of 16 machines, and that each executor gets the full 16 cores but only 16 GB of RAM. (Note that even if I drop --executor-memory way down, to something like 4 GB, Mesos still runs just one executor per node, with 16 cores and 4 GB of RAM.) But what I want to accomplish is something like my first two examples. That is, I want to run multiple executors per node, each sharing that node's RAM and cores (i.e. a moderate number of cores per executor, 5-8). Considering I can't specify --executor-cores on Mesos, how do I accomplish this? Or am I off base for some reason in even wanting this? Will Mesos just not permit multiple executors per node?

Answer

Question 1: In coarse-grained mode, Spark's executor (org.apache.spark.executor.CoarseGrainedExecutorBackend) is launched as a Mesos task. The Mesos framework is actually the Spark driver, and one Spark driver can submit multiple Spark jobs; it depends on your Spark application. Spark and Mesos both come out of UC Berkeley's AMPLab and were developed in parallel, so they use similar terminology (executor, task, ...), which may confuse you :-).

Question 2: In coarse-grained mode, Spark launches only one executor per host (see https://issues.apache.org/jira/browse/SPARK-5095 for details). So in your case, Spark will launch one executor per host (each consuming 16 GB of memory and all of the host's available cores, which is 16 cores if there is no other workload) until the total cores across executors reach 240. There will be 240/16 = 15 executors.
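It is worth noting that SPARK-5095 was eventually resolved: in Spark 2.x, the coarse-grained Mesos scheduler honors spark.executor.cores, which allows multiple executors per node. A sketch of such a submission, assuming you can upgrade to Spark 2.x (the flag values mirror the question's second standalone example):

```shell
/opt/spark/bin/spark-submit \
    --conf spark.mesos.coarse=true \
    --conf spark.cores.max=240 \
    --conf spark.executor.cores=5 \
    --executor-memory 10G \
    myscript.py
```

With 5 cores and 10 GB per executor, up to 3 executors would then fit on each 16-core/32 GB node, as in the standalone setup.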

As for spark.mesos.mesosExecutor.cores, it only applies in fine-grained mode. In fine-grained mode, Spark launches one executor (org.apache.spark.executor.MesosExecutorBackend) per host. The executor consumes spark.mesos.mesosExecutor.cores cores even when no task is running, and each task consumes an additional spark.task.cpus cores.
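A sketch of a fine-grained submission using those two settings (assuming fine-grained mode is acceptable at all, which conflicts with the question's goal of pre-allocating resources; the values shown are illustrative, not recommendations):

```shell
/opt/spark/bin/spark-submit \
    --conf spark.mesos.coarse=false \
    --conf spark.mesos.mesosExecutor.cores=1 \
    --conf spark.task.cpus=1 \
    --executor-memory 4G \
    myscript.py
```

Here each host's executor permanently holds 1 core, and each running task takes 1 more, so per-host core usage grows and shrinks with the task load instead of being fixed up front.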
