How do I make sure my cronjob job does NOT retry on failure?


Problem description

I have a Kubernetes CronJob that runs on GKE and runs Cucumber JVM tests. If a Step fails due to an assertion failure, some resource being unavailable, etc., Cucumber rightly throws an exception, which causes the CronJob's job to fail and the Kubernetes pod's status to change to ERROR. This leads to the creation of a new pod that tries to run the same Cucumber tests again, which fails again and retries again.

I don't want any of these retries to happen. If a Cronjob job fails, I want it to remain in the failed status and not retry at all. Based on this, I have already tried setting backoffLimit: 0 in combination with restartPolicy: Never in combination with concurrencyPolicy: Forbid, but it still retries by creating new pods and running the tests again.

What am I missing? Here's my kube manifest for the Cronjob:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: quality-apatha
  namespace: default
  labels:
    app: quality-apatha
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
            - name: quality-apatha
              image: FOO-IMAGE-PATH
              imagePullPolicy: "Always"
              resources:
                limits:
                  cpu: 500m
                  memory: 512Mi
              env:
                - name: FOO
                  value: BAR
              volumeMounts:
                - name: FOO
                  mountPath: BAR
              args:
                - java
                - -cp
                - qe_java.job.jar:qe_java-1.0-SNAPSHOT-tests.jar
                - org.junit.runner.JUnitCore
                - com.liveramp.qe_java.RunCucumberTest
          restartPolicy: Never
          volumes:
            - name: FOO
              secret:
                secretName: BAR

Is there any other Kubernetes Kind I can use to stop the retrying?

Thank you!

Solution

To make things as simple as possible, I tested it using this example from the official Kubernetes documentation, with minor modifications to illustrate what really happens in different scenarios.

I can confirm that when backoffLimit is set to 0 and restartPolicy to Never, everything works exactly as expected and there are no retries. Note that each scheduled run of your Job, which in your example happens at 60-second intervals (schedule: "*/1 * * * *"), IS NOT considered a retry.

Let's take a closer look at the following example (base YAML available here):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - non-existing-command
          restartPolicy: Never

It spawns a new Job every 60 seconds according to the schedule, no matter whether the previous one failed or ran successfully. This particular example is configured to fail because it tries to run non-existing-command.

You can check what's happening by running:

$ kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
hello-1587558720-pgqq9   0/1     Error               0          61s
hello-1587558780-gpzxl   0/1     ContainerCreating   0          1s

As you can see, there are no retries. Although the first Pod failed, the new one is spawned exactly 60 seconds later according to our specification. I'd like to emphasize it again: this is not a retry.

On the other hand, when we modify the above example and set backoffLimit: 3, we can observe the retries. As you can see, new Pods are now created much more often than every 60 seconds. These are retries.

$ kubectl get pods
NAME                     READY   STATUS   RESTARTS   AGE
hello-1587565260-7db6j   0/1     Error    0          106s
hello-1587565260-tcqhv   0/1     Error    0          104s
hello-1587565260-vnbcl   0/1     Error    0          94s
hello-1587565320-7nc6z   0/1     Error    0          44s
hello-1587565320-l4p8r   0/1     Error    0          14s
hello-1587565320-mjnb6   0/1     Error    0          46s
hello-1587565320-wqbm2   0/1     Error    0          34s

What we can see above are 3 Pod creation attempts related to the hello-1587565260 job and 4 attempts (including the original first try, which is not counted in backoffLimit: 3) related to the hello-1587565320 job.

As you can see, the jobs themselves still run according to the schedule, at 60-second intervals:

$ kubectl get jobs
NAME               COMPLETIONS   DURATION   AGE
hello-1587565260   0/1           2m12s      2m12s
hello-1587565320   0/1           72s        72s
hello-1587565380   0/1           11s        11s

However, because backoffLimit is set to 3 this time, every time the Pod responsible for running the job fails, up to 3 additional retries occur.
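
For reference, the retry scenario above corresponds to the same base manifest with only backoffLimit changed. This is a sketch that assumes nothing else differs from the first example:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      # Allow up to 3 retries per scheduled Job before it is marked as failed
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - non-existing-command   # deliberately fails
          restartPolicy: Never
```

Setting backoffLimit back to 0 restores the behaviour from the first example: one Pod per scheduled Job and no retries.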

I hope this helps to dispel any possible confusion about running CronJobs in Kubernetes.

If you would rather run something just once, not at regular intervals, take a look at a simple Job instead of a CronJob.
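
A one-off Job along those lines might look roughly like the sketch below. The name hello-once is hypothetical, and the container is simply carried over from the CronJob example; substitute your own image and arguments:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-once   # hypothetical name, not from the original example
spec:
  # No retries: if the Pod fails, the Job is marked as failed and stays that way
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        args:
        - /bin/sh
        - -c
        - non-existing-command   # placeholder workload from the example above
      restartPolicy: Never
```

Because there is no schedule, nothing creates a new Job a minute later; the workload runs once and then reports success or failure.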

Also consider changing your Cron configuration if you still want to run this particular job on a regular basis but, let's say, once every 24 hours rather than every minute.
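
As an illustration, a daily schedule could be sketched like this. The schedule value "0 0 * * *" (once a day at midnight) is an assumption chosen for the example; everything else is taken from the hello example:

```yaml
# Matches the apiVersion used in the examples above;
# clusters on Kubernetes 1.21 or later also serve CronJob as batch/v1
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  # Standard cron syntax: minute 0, hour 0, every day
  schedule: "0 0 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0   # still no retries within a single run
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - non-existing-command   # placeholder workload
          restartPolicy: Never
```

With this schedule a failed run stays failed, and the next attempt only happens at the next scheduled time, 24 hours later.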
