Spark Job submitted - Waiting (TaskSchedulerImpl: Initial job not accepted)


Problem Description


An API call was made to submit the job. The response states that it is running.

On the cluster UI:

Worker (slave): worker-20160712083825-172.31.17.189-59433 is Alive

Cores: 1 out of 2 used

Memory: 1 GB out of 6 GB used

Running application:

app-20160713130056-0020 - Waiting for 5 hours

Cores: unlimited

Job Description of the Application

Active Stage

reduceByKey at /root/wordcount.py:23

Pending Stage

takeOrdered at /root/wordcount.py:26

Running Driver -

stderr log page for driver-20160713130051-0025 

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources, the cause would be that the slaves haven't been started and therefore have no resources.

However, in my case Slave 1 is working.

According to Unable to Execute More than a spark Job "Initial job has not accepted any resources", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the Submit API is called via Postman / anywhere.
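For reference, submitting through the standalone master's REST endpoint (port 6066, the one a Postman call would hit) amounts to POSTing a JSON body shaped roughly like the sketch below. The path and the main class here are placeholders for illustration, not values from my setup:

```python
import json

def build_submit_request(app_resource, main_class, app_args, spark_properties):
    """Sketch of the JSON body POSTed to
    http://<master>:6066/v1/submissions/create (standalone REST submission)."""
    return {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,      # must be reachable from the cluster
        "mainClass": main_class,          # placeholder; entry point to run
        "appArgs": app_args,
        "clientSparkVersion": "1.6.1",
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": spark_properties,
    }

req = build_submit_request(
    "file:/root/wordcount.py",            # placeholder path
    "org.apache.spark.deploy.SparkSubmit",
    [],
    {
        "spark.app.name": "MyApp",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
        "spark.submit.deployMode": "cluster",
    },
)
print(json.dumps(req, indent=2))
```

The master replies with a submissionId, which can then be polled at /v1/submissions/status/<id>.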

The cluster also has available cores, RAM, and memory, yet the job still throws the error conveyed by the UI.

According to TaskSchedulerImpl: Initial job has not accepted any resources, I assigned the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:

SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2

Then replicated the file across the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh

All the cases in the answers to the above questions were applicable, yet no solution was found. Hence, because I am working with APIs and Apache Spark, maybe some other assistance is required.
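As a side note, this warning usually means no worker can host a single executor of the requested size. A quick back-of-the-envelope check (my own helper, not a Spark API) against the spark-env.sh values above: with SPARK_WORKER_MEMORY=1000m, the default 1g (1024 MB) executor does not fit, which alone can leave a job waiting forever.

```python
def worker_can_run(worker_cores, worker_memory_mb, executor_cores, executor_memory_mb):
    """True if a single worker can host one executor of the requested size."""
    return (executor_cores <= worker_cores
            and executor_memory_mb <= worker_memory_mb)

# Worker advertises SPARK_WORKER_CORES=2, SPARK_WORKER_MEMORY=1000m
print(worker_can_run(2, 1000, 1, 1024))  # False: default 1g executor doesn't fit
print(worker_can_run(2, 1000, 1, 512))   # True: a 512 MB executor would fit
```

Either raising SPARK_WORKER_MEMORY or lowering spark.executor.memory would make the check pass.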

Edited July 18, 2016

Wordcount.py - My PySpark application code -

from pyspark import SparkContext, SparkConf

logFile = "/user/root/In/a.txt"

conf = SparkConf().set("num-executors", "1")

sc = SparkContext(master="spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
                  appName="MyApp", conf=conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")

Error

Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to Spark UI showing 0 cores even when setting cores in App, the Spark WebUI shows zero cores in use and an indefinite wait with no tasks running. The application also uses no memory or cores at run time, and it hits a Waiting status immediately upon starting.
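One detail worth flagging in the wordcount.py code above: "num-executors" is a spark-submit command-line flag name (and a YARN-only one), not a SparkConf property, so that .set() call sets an unused key. In standalone mode, resource limits are spelled with spark.-prefixed properties; the names below are from the Spark 1.6 configuration docs, while the values are illustrative only:

```python
# Properties a standalone-mode app would set instead of "num-executors".
# Values are illustrative; spark.executor.memory must fit within SPARK_WORKER_MEMORY.
props = {
    "spark.cores.max": "1",           # cap on total cores the app may take
    "spark.executor.memory": "512m",  # per-executor heap
}
# With pyspark available, this would be applied as (not executed here):
#   conf = SparkConf().setAll(props.items())
print(props)
```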

Spark version 1.6.1, Ubuntu, Amazon EC2

Solution

I also have the same issue. Below are my remarks when it occurs.

1:17:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I noticed that it only occurs during the first query from the Scala shell, where I run something that fetches data from HDFS.

When the problem occurs, the webui states that there's not any running applications.

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed 
Status: ALIVE

It seems that something fails to start, but I can't tell exactly what it is.

However, restarting the cluster a second time sets the Applications value to 1, and everything works well.

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

I'm still investigating; this quick workaround can save time until a final solution is found.
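The worker and application counts quoted above can also be checked programmatically before submitting: the standalone master serves the same state as JSON at http://&lt;master&gt;:8080/json. The helper below is my own sketch; the field names (workers, state, cores, coresused) follow that JSON output as I understand it:

```python
def cluster_ready(master_state):
    """True when at least one worker is ALIVE and some cores are still free."""
    alive = [w for w in master_state.get("workers", []) if w.get("state") == "ALIVE"]
    free_cores = sum(w["cores"] - w["coresused"] for w in alive)
    return bool(alive) and free_cores > 0

# Shaped like the figures above: workers alive, but all 26 cores already used
state = {"workers": [{"state": "ALIVE", "cores": 26, "coresused": 26}]}
print(cluster_ready(state))  # False: registered, but no cores left for a new app
```

A False here before submission would distinguish "workers never registered" from "workers registered but fully occupied", which are the two causes behind this warning.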
