What is the difference between FAILED and ERROR in spark application states

Problem Description

I am trying to create a state diagram of a submitted Spark application. I am kind of lost on when an application is considered FAILED.

States are from here: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/core/src/main/scala/org/apache/spark/deploy/master/DriverState.scala
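For reference, these are the driver states defined in the linked DriverState.scala (the comments are from the Spark source itself); note that FAILED means the driver exited non-zero without supervision, while ERROR means it could not be run or restarted at all:

```scala
private[deploy] object DriverState extends Enumeration {

  type DriverState = Value

  // SUBMITTED: Submitted but not yet scheduled on a worker
  // RUNNING: Has been allocated to a worker to run
  // FINISHED: Previously ran and exited cleanly
  // RELAUNCHING: Exited non-zero or due to worker failure, but has not yet started running again
  // UNKNOWN: The state of the driver is temporarily not known due to master failure recovery
  // KILLED: A user manually killed this driver
  // FAILED: The driver exited non-zero and was not supervised
  // ERROR: Unable to run or restart due to an unrecoverable error (e.g. missing jar file)
  val SUBMITTED, RUNNING, FINISHED, RELAUNCHING, UNKNOWN, KILLED, FAILED, ERROR = Value
}
```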

Solution

This stage is very important, since when it comes to Big Data, Spark is awesome, but let's face it, we haven't solved the problem yet!


When a task/job fails, Spark restarts it (recall that the RDD, the main abstraction Spark provides, is a Resilient Distributed Dataset; that resilience is not exactly what we are looking at here, but it gives the intuition).
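As a minimal sketch of that resilience (illustrative, not from the original answer; names and values are made up): all Spark needs to recover a lost partition is the RDD's lineage, so a failed task is simply recomputed from its parents.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of RDD lineage; app name and data are illustrative.
val conf = new SparkConf().setAppName("lineage-demo").setMaster("local[*]")
val sc = new SparkContext(conf)

val numbers = sc.parallelize(1 to 1000000, numSlices = 100)
val doubled = numbers.map(_ * 2)         // lineage: parallelize -> map
val kept    = doubled.filter(_ % 3 == 0) // lineage: ... -> filter

// If an executor dies mid-count, Spark recomputes only the lost partitions
// by replaying this lineage; a single task failure does not fail the job.
println(kept.count())
```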

I use Spark 1.6.2, and my cluster restarts a job/task up to 3 times when it is marked as FAILED.
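That retry count is configuration, not a constant. A minimal sketch of the relevant properties follows; the value "3" mirrors the behaviour described above and is an assumption about this particular cluster's setup:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("retry-config-demo")
  // Task failures tolerated before the stage (and eventually the job)
  // is marked FAILED; Spark's own default is 4.
  .set("spark.task.maxFailures", "3")
  // On YARN, the maximum number of application (driver) attempts; by
  // default it is inherited from yarn.resourcemanager.am.max-attempts.
  .set("spark.yarn.maxAppAttempts", "3")
```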

For example, one of my recent jobs had to restart a whole stage.

In the cluster/app, one can see the attempt IDs; here the application is on its 3rd and final attempt.

If that attempt is marked as FAILED (for whatever reason, e.g. out of memory, bad DNS, a GC allocation failure, a failed disk, a node that didn't respond to 4 heartbeats (it is probably down), etc.), then Spark relaunches the job.
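The heartbeat behaviour mentioned above is also tunable. A hedged sketch of the standard Spark properties involved is below; the exact "4 heartbeats" threshold is cluster-specific and is not derived from these defaults:

```scala
import org.apache.spark.SparkConf

// Sketch: executor-liveness settings (the values shown are Spark defaults).
val conf = new SparkConf()
  // How often each executor sends a heartbeat to the driver.
  .set("spark.executor.heartbeatInterval", "10s")
  // If no heartbeat arrives within this window, the executor is marked
  // lost and its tasks are rescheduled on other executors.
  .set("spark.network.timeout", "120s")
```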
