WARN cluster.YarnScheduler: Initial job has not accepted any resources


Problem description

Any Spark jobs that I run fail with the following error message:

17/06/16 11:10:43 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

The Spark version is 1.6, running on YARN.

I am submitting jobs from pyspark.

You can see from the job timeline that it runs indefinitely and no resources are ever added or removed.

Recommended answer

The first point is that if there are enough resources (nodes, CPUs, and memory) available to YARN, Spark can use dynamic allocation to create workers with appropriate default cores and memory allocated.
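For reference, dynamic allocation on YARN also requires the external shuffle service to be enabled. A minimal spark-defaults.conf sketch of that configuration is below; the property names are real Spark settings, but the min/max executor bounds are illustrative values, not taken from this answer:

```
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   4
```

With these set, Spark requests and releases executors as the workload demands, instead of using a fixed executor count.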

In my case I needed to turn off dynamic allocation because my resource levels were very low.

So from pyspark I set the following values:

from pyspark import SparkConf

conf = (SparkConf().setAppName("simple")
        .set("spark.shuffle.service.enabled", "false")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.cores.max", "1")
        .set("spark.executor.instances", "2")
        .set("spark.executor.memory", "200m")
        .set("spark.executor.cores", "1"))

Note: the values set here should basically be less than the actual resources available. However, values that are too small can lead to out-of-memory errors or slow performance when your job runs.
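The "stay below the actual resources" rule amounts to simple arithmetic. The helper below is a hypothetical illustration (not part of the original answer): it assumes you know a node's memory and core counts and leaves headroom for YARN container overhead.

```python
def suggest_executor_size(node_mem_mb, node_cores, overhead_frac=0.10):
    """Suggest per-executor memory/cores that stay below one node's capacity.

    Hypothetical helper: reserves `overhead_frac` of node memory for YARN
    container overhead and one core for the OS and daemons, so the executor
    request can actually be satisfied by the cluster.
    """
    usable_mem = int(node_mem_mb * (1.0 - overhead_frac))
    usable_cores = max(1, node_cores - 1)
    # One executor per node keeps the sketch simple.
    return {"spark.executor.memory": "%dm" % usable_mem,
            "spark.executor.cores": str(usable_cores)}

# On a small 2 GB, 2-core node this leaves 10% memory headroom and one core:
print(suggest_executor_size(2048, 2))
```

Sizing this way avoids the symptom above: an executor request larger than any node's free capacity is never scheduled, and the job waits forever at "Initial job has not accepted any resources".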

A full code summary of the sample job is provided here.

Another important point to note for this pyspark case is that Spark on YARN can run in two modes:

  1. Cluster mode - the Spark driver runs on a node inside the cluster
  2. Client mode - the Spark driver runs in the client process that launched the interactive shell.

Cluster mode is not well suited to using Spark interactively. Spark applications that require user input, such as spark-shell and pyspark, require the Spark driver to run inside the client process that initiates the Spark application.

Client mode can be set in the environment as below:

export PYSPARK_SUBMIT_ARGS='--master yarn --deploy-mode client pyspark-shell'
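The same arguments can also be set from inside Python, as a sketch; this must happen before the first pyspark import or it has no effect (the import itself is only indicated in a comment, since it needs a Spark installation):

```python
import os

# Must be set before the first `import pyspark`, because the submit args
# are read when the JVM gateway is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client pyspark-shell"
)

# A subsequent `from pyspark import SparkContext` followed by
# `SparkContext(conf=conf)` would now start in YARN client mode.
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

This is handy in notebooks, where exporting a shell variable before the kernel starts is awkward.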

