How to fail a (cron) job after a certain number of retries?

Question

We have a Kubernetes cluster of web scraping cron jobs set up. All seems to go well until a cron job starts to fail (e.g., when a site structure changes and our scraper no longer works). It looks like every now and then a few failing cron jobs will continue to retry to the point it brings down our cluster. Running kubectl get cronjobs (prior to a cluster failure) will show too many jobs running for a failing job.

I've attempted following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.

Here is our config for reference:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3

Ideally we would prefer that a cron job would be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!

Answer

You can tell your Job to stop retrying using backoffLimit.

Specifies the number of retries before marking this job failed.

In your case:

spec:
  template:
    spec:
      containers:
        - name: scrape-al
          image: 'govhawk/openstates:1.3.1-beta'
          command:
            - /opt/openstates/openstates/pupa-scrape.sh
          args:
            - al bills --scrape
      restartPolicy: Never
  backoffLimit: 3

You set 3 as the backoffLimit of your Job. That means when a Job is created by the CronJob, it will be retried 3 times if it fails. This controls the Job, not the CronJob.

When a Job fails, another Job will still be created at the next scheduled time.
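
To make the distinction concrete, you can inspect the two kinds of objects separately (the names below are taken from the config above):

kubectl get cronjob scrape-al   # the schedule; suspending happens at this level
kubectl get jobs -l scrape=al   # the individual runs; each one retries up to backoffLimit times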

You want: If I am not wrong, you want to stop scheduling new Jobs once your scheduled Jobs have failed 5 times. Right?

Answer: In that case, this is not possible automatically.

Possible solution: You need to suspend the CronJob so that it stops scheduling new Jobs.

suspend: true
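
For example, one way to flip this field (a sketch only; the CronJob name scrape-al is taken from the config above) is with kubectl patch:

kubectl patch cronjob scrape-al -p '{"spec":{"suspend":true}}'

# and to resume scheduling later:
kubectl patch cronjob scrape-al -p '{"spec":{"suspend":false}}'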

You can do this manually. If you do not want to do it manually, you need to set up a watcher that watches your CronJob's status and suspends the CronJob when necessary.
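
A minimal sketch of such a watcher, assuming the names from the config above (CronJob scrape-al, Jobs labelled scrape=al) and the threshold of 5 failures mentioned in the question. Note that with failedJobsHistoryLimit: 0 failed Jobs are cleaned up immediately, so that limit would need to be raised for this kind of check to see them:

#!/bin/sh
CRONJOB=scrape-al
THRESHOLD=5

while true; do
  # Count Jobs spawned by the CronJob that have at least one failed pod.
  failed=$(kubectl get jobs -l scrape=al \
    -o jsonpath='{.items[?(@.status.failed>0)].metadata.name}' | wc -w)

  if [ "$failed" -ge "$THRESHOLD" ]; then
    # Stop the CronJob from scheduling any new Jobs.
    kubectl patch cronjob "$CRONJOB" -p '{"spec":{"suspend":true}}'
    break
  fi
  sleep 60
done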
