spark.task.maxFailures not working as expected
Question
I am running a Spark job with spark.task.maxFailures
set to 1, and according to the official documentation:
spark.task.maxFailures
Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
So my job should fail as soon as a task fails... However, it tries a second time before giving up. Am I missing something? I have checked the property value at runtime just in case, and it is correctly set to 1. In my case, the job fails in the last step, so the first attempt creates the output directory and the second attempt always fails because the output directory already exists, which is not really helpful.
Is there some kind of bug in this property, or is the documentation wrong?
Answer
That is the number of individual task failures that are allowed, but what you are describing sounds like the actual job failing and being retried.
If you're running this on YARN, the application itself may be resubmitted multiple times; see yarn.resourcemanager.am.max-attempts. If so, you can turn that setting down to 1.
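As a sketch, both retry layers can be capped at submit time. The jar name below is hypothetical; spark.yarn.maxAppAttempts is the per-application knob (the effective limit is the minimum of this value and the cluster-wide yarn.resourcemanager.am.max-attempts, which usually only a cluster admin can change):

```shell
# Sketch of a spark-submit invocation that fails fast on both layers:
#   spark.task.maxFailures=1   -> give up after the first task failure
#   spark.yarn.maxAppAttempts=1 -> stop YARN from resubmitting the
#                                  whole application on failure
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.task.maxFailures=1 \
  --conf spark.yarn.maxAppAttempts=1 \
  my-job.jar
```

With both set to 1, the first task failure kills the job and YARN makes no second application attempt, so the output directory is only ever written once.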