How do I make sure my cronjob job does NOT retry on failure?
Question
I have a Kubernetes Cronjob that runs on GKE and runs Cucumber JVM tests. In case a Step fails due to an assertion failure, some resource being unavailable, etc., Cucumber rightly throws an exception, which causes the Cronjob job to fail and the Kubernetes pod's status to change to ERROR. This leads to the creation of a new pod that tries to run the same Cucumber tests again, which fails again and retries again.

I don't want any of these retries to happen. If a Cronjob job fails, I want it to remain in the failed status and not retry at all. Based on this, I have already tried setting backoffLimit: 0 in combination with restartPolicy: Never and concurrencyPolicy: Forbid, but it still retries by creating new pods and running the tests again.

What am I missing? Here's my kube manifest for the Cronjob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: quality-apatha
  namespace: default
  labels:
    app: quality-apatha
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: quality-apatha
            image: FOO-IMAGE-PATH
            imagePullPolicy: "Always"
            resources:
              limits:
                cpu: 500m
                memory: 512Mi
            env:
            - name: FOO
              value: BAR
            volumeMounts:
            - name: FOO
              mountPath: BAR
            args:
            - java
            - -cp
            - qe_java.job.jar:qe_java-1.0-SNAPSHOT-tests.jar
            - org.junit.runner.JUnitCore
            - com.liveramp.qe_java.RunCucumberTest
          restartPolicy: Never
          volumes:
          - name: FOO
            secret:
              secretName: BAR
Is there any other Kubernetes Kind I can use to stop the retrying?

Thank you!
Answer

To make things as simple as possible, I tested it using this example from the official kubernetes documentation, applying minor modifications to it to illustrate what really happens in different scenarios.

I can confirm that when backoffLimit is set to 0 and restartPolicy to Never, everything works exactly as expected and there are no retries. Note that every single run of your Job, which in your example is scheduled to run at intervals of 60 seconds (schedule: "*/1 * * * *"), IS NOT considered a retry.

Let's take a closer look at the following example (base yaml available here):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - non-existing-command
          restartPolicy: Never
It spawns a new job every 60 seconds according to the schedule, no matter whether it fails or runs successfully. In this particular example it is configured to fail, as we are trying to run non-existing-command.

You can check what's happening by running:

$ kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
hello-1587558720-pgqq9   0/1     Error               0          61s
hello-1587558780-gpzxl   0/1     ContainerCreating   0          1s

As you can see there are no retries. Although the first Pod failed, the new one is spawned exactly 60 seconds later, according to our specification. I'd like to emphasize it again: this is not a retry.

On the other hand, when we modify the above example and set backoffLimit: 3, we can observe the retries. As you can see, new Pods are now created much more often than every 60 seconds. These are retries.

$ kubectl get pods
NAME                     READY   STATUS   RESTARTS   AGE
hello-1587565260-7db6j   0/1     Error    0          106s
hello-1587565260-tcqhv   0/1     Error    0          104s
hello-1587565260-vnbcl   0/1     Error    0          94s
hello-1587565320-7nc6z   0/1     Error    0          44s
hello-1587565320-l4p8r   0/1     Error    0          14s
hello-1587565320-mjnb6   0/1     Error    0          46s
hello-1587565320-wqbm2   0/1     Error    0          34s
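For reference, the only change against the first manifest needed to produce this retry behaviour is the backoffLimit field in the Job spec (a fragment shown in isolation, not a full manifest):

```yaml
jobTemplate:
  spec:
    backoffLimit: 3   # up to 3 additional attempts after the original failed one
```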
What we can see above are 3 retries (Pod creation attempts) related to the hello-1587565260 job, and 4 retries (including the original 1st try, which is not counted in backoffLimit: 3) related to the hello-1587565320 job.

As you can see, the jobs themselves are still run according to the schedule, at 60-second intervals:

$ kubectl get jobs
NAME               COMPLETIONS   DURATION   AGE
hello-1587565260   0/1           2m12s      2m12s
hello-1587565320   0/1           72s        72s
hello-1587565380   0/1           11s        11s

However, due to our backoffLimit being set this time to 3, every time the Pod responsible for running the job fails, 3 additional retries occur.

I hope this helped to dispel any possible confusion about running CronJobs in kubernetes.

If you are rather interested in running something just once, not at regular intervals, take a look at a simple Job instead of a CronJob.

Also consider changing your Cron configuration if you still want to run this particular job on a regular basis but, let's say, once every 24 h, not every minute.
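If a one-off run is what you need, a plain Job with the same no-retry settings could look like the sketch below. It reuses the image and command from the example above; the name hello-once is hypothetical. Note that Job lives in the stable batch/v1 API group:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-once        # hypothetical name
spec:
  backoffLimit: 0          # no retries after the first failed attempt
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        args:
        - /bin/sh
        - -c
        - non-existing-command
      restartPolicy: Never   # do not restart the failed container in place
```

For the every-24-hours variant of the CronJob, only the schedule field needs to change, e.g. schedule: "0 0 * * *" to run once a day at midnight.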