How to solve yarn container sizing issue on spark?

Question

I want to launch some pyspark jobs on YARN. I have 2 nodes, with 10 GB each. I am able to open up the pyspark shell like so: pyspark

Now when I have a very simple example that I try to launch:

import random
NUM_SAMPLES=1000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
             .filter(inside).count()
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)

I get as a result a very long spark log with the error output. The most important information is:

ERROR cluster.YarnScheduler: Lost executor 1 on (ip>: Container marked as failed: <containerID> on host: <ip>. Exit status 1.  Diagnostics: Exception from container-launch.  ......

later on in the logs I see...

ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times: aborting job
INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 1
INFO spark.ExecutorAllocationManager: Existing executor 1 has been removed (new total is 0)

From what I'm gathering from the logs above, this seems to be a container sizing issue in yarn.

My yarn-site.xml file has the following settings:

yarn.scheduler.maximum-allocation-mb = 10240
yarn.nodemanager.resource.memory-mb = 10240

and spark-defaults.conf contains:

spark.yarn.executor.memoryOverhead=2048
spark.driver.memory=3g

If there are any other settings you'd like to know about, please let me know.

How do I set the container size in yarn appropriately?
(bounty on the way for someone who can help me with this)

Answer

Let me first explain the basic set of properties required to tune your spark application on a YARN cluster.

Note: A container in YARN is equivalent to an executor in Spark. For simplicity, you can treat the two as the same.

In yarn-site.xml:

yarn.nodemanager.resource.memory-mb is the total memory available to the cluster from a given node.

yarn.nodemanager.resource.cpu-vcores is the total number of CPU vcores available to the cluster from a given node.

yarn.scheduler.maximum-allocation-mb is the maximum memory in mb that can be allocated per yarn container.

yarn.scheduler.maximum-allocation-vcores is the maximum number of vcores that can be allocated per yarn container.

Example: If a node has 16GB and 8 vcores and you would like to contribute 14GB and 6 vcores to the cluster (for containers), then set the properties as shown below:

yarn.nodemanager.resource.memory-mb : 14336 (14GB)

yarn.nodemanager.resource.cpu-vcores : 6

And, to create containers with 2GB and 1vcore each, set these properties:

yarn.scheduler.maximum-allocation-mb : 2049

yarn.scheduler.maximum-allocation-vcores : 1

Note: Even though there is enough memory (14GB) to create 7 containers of 2GB each, the above config will only create 6 containers of 2GB, and only 12GB out of the 14GB will be used by the cluster. This is because only 6 vcores are available to the cluster.
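
To make the arithmetic explicit, here is a small sketch in plain Python (illustrative only, not part of any YARN or Spark API; the numbers mirror the 14GB/6-vcore example above). A node can host only as many containers as both its memory and its vcores allow:

node_memory_mb = 14336        # yarn.nodemanager.resource.memory-mb
node_vcores = 6               # yarn.nodemanager.resource.cpu-vcores
container_memory_mb = 2048    # memory requested per container
container_vcores = 1          # vcores requested per container

containers_by_memory = node_memory_mb // container_memory_mb   # 7
containers_by_vcores = node_vcores // container_vcores         # 6
containers_per_node = min(containers_by_memory, containers_by_vcores)

print(containers_per_node)                         # 6 containers
print(containers_per_node * container_memory_mb)   # 12288 MB (12GB) actually used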

Now on the Spark side:

The properties below specify the memory to be requested per executor/container:

spark.driver.memory

spark.executor.memory

The properties below specify the vcores to be requested per executor/container:

spark.driver.cores

spark.executor.cores

Important: All of Spark's memory and vcore properties should be less than or equal to what the YARN configuration allows.

The property below specifies the total number of executors/containers that can be used for your Spark application from the YARN cluster:

spark.executor.instances

This property should be less than the total number of containers available in the YARN cluster.
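
For illustration, here is a minimal PySpark sketch of how the executor-side properties can be passed programmatically when an application is submitted (the app name and the specific values are assumptions taken from the sizing recommended later in this answer; in yarn-client mode, e.g. the pyspark shell, spark.driver.memory must instead be set in spark-defaults.conf or via --driver-memory, because the driver JVM is already running when this code executes):

from pyspark import SparkConf, SparkContext

# Executor-side resource requests; the master (yarn) is normally given
# on the command line with --master when the application is submitted.
conf = (SparkConf()
        .setAppName("container-sizing-demo")                 # hypothetical app name
        .set("spark.executor.memory", "1536m")
        .set("spark.yarn.executor.memoryOverhead", "512")    # in MB
        .set("spark.executor.cores", "1")
        .set("spark.executor.instances", "9"))

sc = SparkContext(conf=conf)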

Once the YARN configuration is complete, Spark should request containers that can be allocated within the YARN configuration. That means if YARN is configured to allocate a maximum of 2GB per container and Spark requests a container with 3GB of memory, the job will either hang or fail because YARN cannot satisfy Spark's request.

Now for your use case: usually, cluster tuning is based on the workload, but the config below should be more suitable.

Memory available : 10GB * 2 nodes
Vcores available : 5 * 2 vcores [Assumption]

In yarn-site.xml [on both the nodes]:

yarn.nodemanager.resource.memory-mb : 10240

yarn.nodemanager.resource.cpu-vcores : 5

yarn.scheduler.maximum-allocation-mb : 2049

yarn.scheduler.maximum-allocation-vcores : 1

Using the above config, each node can host a maximum of 5 containers of 2GB/1 vcore each, i.e. 10 containers across the cluster, one of which is taken by the YARN application master.

Spark config

spark.driver.memory 1536mb

spark.yarn.driver.memoryOverhead 512mb

spark.executor.memory 1536mb

spark.yarn.executor.memoryOverhead 512mb

spark.driver.cores 1

spark.executor.cores 1

spark.executor.instances 9
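
As a quick sanity check (plain Python arithmetic mirroring the numbers above, illustrative only), you can verify that each requested container fits under the YARN limits and that the number of executors fits in the cluster:

executor_memory_mb = 1536          # spark.executor.memory
executor_overhead_mb = 512         # spark.yarn.executor.memoryOverhead
yarn_max_allocation_mb = 2049      # yarn.scheduler.maximum-allocation-mb

container_mb = executor_memory_mb + executor_overhead_mb   # 2048
assert container_mb <= yarn_max_allocation_mb               # fits in one YARN container

node_memory_mb, node_vcores, nodes = 10240, 5, 2
containers_per_node = min(node_memory_mb // container_mb,
                          node_vcores // 1)                 # 5 per node
cluster_containers = containers_per_node * nodes            # 10 in total

# one container goes to the YARN application master; the rest can be executors
max_executors = cluster_containers - 1
print(max_executors)   # 9  -> spark.executor.instances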

Please feel free to play around with these configurations to suit your needs.
