Spark Job submitted - Waiting (TaskSchedulerImpl : Initial job not accepted)


Problem description

An API call was made to submit the job. The response states that it is running.

On the Cluster UI -

Worker (slave) - worker-20160712083825-172.31.17.189-59433 is Alive

1 of 2 cores used

1 GB of 6 GB memory used

Running application:

app-20160713130056-0020 - Waiting for 5 hours

Cores - Unlimited

Job description of the application -

Active Stages

reduceByKey at /root/wordcount.py:23

Pending Stages

takeOrdered at /root/wordcount.py:26

Running driver -

stderr log page for driver-20160713130051-0025 

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", the slaves have not been started - hence there are no resources.

However, in my case Slave 1 is working.

According to "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave, and the Submit API is being called via Postman (or from anywhere).
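
For context, submitting in cluster mode from Postman presumably targets the standalone master's REST submission endpoint on port 6066. A minimal sketch of such a request is shown below; the host, file path, mainClass and property values are illustrative assumptions rather than the asker's actual payload, and the exact form for a PySpark application may differ.

# Sketch of a CreateSubmissionRequest sent to the standalone master's REST
# endpoint (port 6066). Host, paths and values are illustrative assumptions.
import json
import requests

master_rest = "http://ec2-54-209-108-127.compute-1.amazonaws.com:6066"

payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/root/wordcount.py",        # application file (assumed path)
    "appArgs": [],
    "clientSparkVersion": "1.6.1",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "mainClass": "org.apache.spark.deploy.SparkSubmit",  # assumption for a Python app
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "false",
    },
}

resp = requests.post(
    master_rest + "/v1/submissions/create",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # response includes a submissionId and a success flag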

The cluster also has available cores, RAM and memory - yet the job throws the error conveyed by the UI.
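
As a cross-check of what the UI conveys, the standalone master also exposes the same cluster state as JSON on its web UI port, which makes it easy to confirm per-worker free cores and memory from a script. A small sketch, assuming the default web UI port 8080 and the field names the master's /json endpoint typically returns:

# Sketch: read the standalone master's /json endpoint and print free resources
# per worker. Host and field names are assumptions about the master's JSON output.
import requests

state = requests.get(
    "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json/"
).json()

for w in state.get("workers", []):
    print(w["id"], w["state"],
          "free cores:", w["coresfree"],
          "free memory (MB):", w["memoryfree"])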

According to "TaskSchedulerImpl: Initial job has not accepted any resources", I allocated the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:

SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2

and copied them across the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
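
One detail worth noting about these numbers: with deploy-mode = cluster the driver itself runs on a worker and takes a core plus its driver memory out of the 2 cores / 1000m configured above (which would be consistent with the "1 of 2 cores used" shown in the UI), so the executor request - 1 GB by default - has to fit into what is left. Below is a minimal sketch of capping the application's own request with standard Spark properties; the values are illustrative, not a confirmed fix for this cluster.

# Sketch: cap the application's resource request so it fits the worker's
# remaining cores/memory in standalone mode. Values are illustrative.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("MyApp")
        .set("spark.cores.max", "1")            # total cores the application may use
        .set("spark.executor.memory", "512m"))  # keep below the worker's free memory

sc = SparkContext(
    master="spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
    conf=conf,
)

Driver memory itself is normally fixed at submission time (spark.driver.memory), so in cluster mode it also has to be accounted for when sizing the worker.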

All the cases in the answers to the above questions applied, but still no solution was found. Since I was working with the API and Apache Spark, perhaps some other assistance is required.

Edited July 18, 2016

Wordcount.py - my PySpark application code:

from pyspark import SparkContext, SparkConf

logFile = "/user/root/In/a.txt"

conf = (SparkConf().set("num-executors", "1"))

sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")

Error

Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to "Spark UI showing 0 cores even when setting cores in App", the Spark WebUI states that zero cores are used and the application waits indefinitely with no tasks running. The application also uses no memory or cores while running and goes into the WAITING state immediately after launch.

Spark version 1.6.1, Ubuntu, Amazon EC2

Answer

I also have the same issue. Below are my remarks on when it occurs.

1:17:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I noticed that it only occurs during the first query from the Scala shell, where I run something that fetches data from HDFS.

When the problem occurs, the web UI states that there are no running applications:

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed 
Status: ALIVE

It seems that something fails to start, though I can't tell exactly what.

However, restarting the cluster a second time sets the Applications value to 1 and everything works well:

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

I'm still investigating; this quick workaround can save time until a final solution is found.
