spark.task.maxFailures not working as expected

Problem Description

I am running a Spark job with spark.task.maxFailures set to 1, and according to the official documentation:

spark.task.maxFailures

Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.

So my job should fail as soon as a task fails... However, it is trying a second time before giving up. Am I missing something? I have checked the property value at runtime just in case, and it is correctly set to 1. In my case, it fails in the last step, so the first attempt creates the output directory and the second one always fails because the output directory already exists, which is not really helpful.
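
For reference, a minimal sketch of the setup being described (a hypothetical reproduction, not the asker's actual code; the object name, app name, and the deliberately failing map are illustrative, and the master is assumed to be supplied via spark-submit). With spark.task.maxFailures set to 1, the documented expectation is that the first task failure aborts the job with no task-level retry:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical minimal reproduction of the question's setup.
object MaxFailuresRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("maxFailuresRepro")
      .set("spark.task.maxFailures", "1") // allowed task retries = 1 - 1 = 0

    val sc = new SparkContext(conf)

    // One element always throws, so the single allowed task attempt fails the job.
    sc.parallelize(1 to 10, numSlices = 2)
      .map { i => if (i == 1) throw new RuntimeException("boom") else i }
      .count()

    sc.stop()
  }
}
```

Submitted to a cluster with spark-submit, a job like this should fail after the first task attempt if the setting behaves as documented.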

Is there some kind of bug in this property or is the documentation wrong?

Recommended Answer

That is the number of individual task failures that are allowed, but what you are describing sounds like the actual job failing and being retried.

If you're running this on YARN, the application itself could be resubmitted multiple times; see yarn.resourcemanager.am.max-attempts. If so, you could turn that setting down to 1.
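
To illustrate the suggestion, a hedged sketch (Spark on YARN assumed; the object and app names are illustrative): alongside spark.task.maxFailures, application-level retries can also be capped from the Spark side with spark.yarn.maxAppAttempts, which YARN bounds by yarn.resourcemanager.am.max-attempts.

```scala
import org.apache.spark.SparkConf

// Hedged sketch: disable both task-level retries and application-level
// re-submission when running on YARN. Names here are illustrative.
object NoRetriesConf {
  val conf: SparkConf = new SparkConf()
    .setAppName("no-retries")
    .set("spark.task.maxFailures", "1")    // allowed task retries = 0
    .set("spark.yarn.maxAppAttempts", "1") // no AM re-submission; capped by
                                           // yarn.resourcemanager.am.max-attempts
}
```

Equivalently, both values can be passed with --conf on spark-submit instead of being set in code.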
