Batch Processing on Kubernetes

Question

Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea? If we use the Kubernetes autoscaling feature, how do we prevent batch jobs from processing the same data? Thank you.

Recommended answer

Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea?

For Spring Batch, we (the Spring Batch team) do have some experience on the matter, which we share in the following talks:

  • Cloud Native Batch Processing on Kubernetes, by Michael Minella
  • Spring Batch on Kubernetes, by me.

Running batch jobs on Kubernetes can be tricky:

  • Pods may be re-scheduled by k8s onto different nodes in the middle of processing
  • Cron jobs might be triggered twice

This requires additional non-trivial work on the developer's side to make sure the batch application is fault-tolerant (resilient to node failure, pod re-scheduling, etc.) and safe against duplicate job execution in a clustered environment.
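
To make the duplicate-execution part of that work concrete, here is a minimal sketch in plain Java of a launch guard that accepts a job only the first time it sees a given job name and identifying parameter. This is not the Spring Batch API: `LaunchGuard`, `tryLaunch` and the `importJob` name are hypothetical, and the in-memory set only stands in for shared storage (for example a database table with a unique constraint) that every pod in the cluster would have to see.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the Spring Batch API.
class LaunchGuard {
    // Keys of job instances that have already been launched.
    // Assumption: in a real cluster this would live in a shared database,
    // not in the memory of a single pod.
    private final Set<String> launched = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first launch of a given job name and parameter. */
    boolean tryLaunch(String jobName, String identifyingParameter) {
        // Set.add is atomic here and returns false if the key already exists.
        return launched.add(jobName + "|" + identifyingParameter);
    }

    public static void main(String[] args) {
        LaunchGuard guard = new LaunchGuard();
        System.out.println(guard.tryLaunch("importJob", "2024-01-01")); // true: first trigger
        System.out.println(guard.tryLaunch("importJob", "2024-01-01")); // false: duplicate trigger
        System.out.println(guard.tryLaunch("importJob", "2024-01-02")); // true: a different run
    }
}
```

If a cron trigger fires twice for the same run, the second call is rejected instead of reprocessing the data.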

Spring Batch takes care of this additional work for you and can be a good choice for running batch workloads on k8s, for several reasons:

  • Cost efficiency: Spring Batch jobs maintain their state in an external database, which makes it possible to restart them from the last save point in case of job/node failure or pod re-scheduling
  • Robustness: Safe against duplicate job executions thanks to a centralized job repository
  • Fault-tolerance: Retry/skip failed items in case of transient errors, such as a call to a web service that is temporarily down or is being re-scheduled in a cloud environment
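
The restart-from-save-point idea in the first bullet can be illustrated with a small plain-Java sketch. It is an illustration under stated assumptions, not how Spring Batch is implemented: `CheckpointedJob`, its `run` method and the `failAt` parameter are hypothetical, a `HashMap` stands in for the external database, and progress is saved after every item for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of restart-from-checkpoint, not the Spring Batch API.
class CheckpointedJob {
    // Stands in for the external database: job key -> index of the next item.
    // It is passed in because it must outlive any single pod.
    private final Map<String, Integer> checkpoints;
    final List<String> processed = new ArrayList<>();

    CheckpointedJob(Map<String, Integer> checkpoints) {
        this.checkpoints = checkpoints;
    }

    /** Processes items from the last checkpoint; failAt simulates a pod being killed (-1 = never). */
    void run(String jobKey, List<String> items, int failAt) {
        int start = checkpoints.getOrDefault(jobKey, 0);
        for (int i = start; i < items.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated pod eviction at item " + i);
            }
            processed.add(items.get(i));
            checkpoints.put(jobKey, i + 1); // save progress after each item
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        List<String> items = List.of("a", "b", "c", "d");

        CheckpointedJob firstAttempt = new CheckpointedJob(store);
        try {
            firstAttempt.run("importJob|2024-01-01", items, 2);
        } catch (IllegalStateException e) {
            System.out.println("first attempt stopped after " + firstAttempt.processed);
        }

        // A new pod starts with fresh memory but the same external store.
        CheckpointedJob secondAttempt = new CheckpointedJob(store);
        secondAttempt.run("importJob|2024-01-01", items, -1);
        System.out.println("second attempt processed " + secondAttempt.processed);
    }
}
```

The second attempt picks up at item "c" rather than starting over, which is what keeps a re-scheduled pod from redoing (and paying for) work that already succeeded.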

I wrote a blog post in which I explain all these aspects in detail with code examples. You can find it here: Spring Batch on Kubernetes: Efficient batch processing at scale

If we use the Kubernetes autoscaling feature, how do we prevent batch jobs from processing the same data?

Making each job process a different data set is the way to go (a job per file, for example). But there are different patterns that you might be interested in; see Job Patterns in the k8s docs.
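
As a rough illustration of "a job per file" with a variable number of workers, the sketch below uses a shared queue from which each worker pulls the next file, so the number of workers can change without two of them receiving the same file. Everything here is an assumption made for illustration: `FileWorkQueue` and `processAll` are hypothetical names, threads stand in for pods, and the in-memory queue stands in for an external queue or database table.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a work queue; threads stand in for pods.
class FileWorkQueue {
    /** Runs the given number of workers until the queue is empty; returns files in completion order. */
    static List<String> processAll(List<String> files, int workers) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(files);
        Queue<String> done = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                String file;
                // poll() removes the head atomically, so no two workers get the same file.
                while ((file = pending.poll()) != null) {
                    done.add(file); // placeholder for "run one job for this file"
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return List.copyOf(done);
    }

    public static void main(String[] args) {
        List<String> files = List.of("orders-1.csv", "orders-2.csv", "orders-3.csv", "orders-4.csv");
        List<String> processed = processAll(files, 3);
        System.out.println(processed.size() + " files processed");
    }
}
```

The order in which files complete is not deterministic; what the queue guarantees is that each file is handed to exactly one worker.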
