Is it possible to prevent some pods from being killed when scaling down?

Problem description

I need to scale a set of pods that run queue-based workers. Jobs for workers can run for a long time (hours) and should not get interrupted. The number of pods is based on the length of the worker queue. Scaling would be done either by the Horizontal Pod Autoscaler with custom metrics, or by a simple controller that changes the number of replicas.

The problem with either solution is that, when scaling down, there is no control over which pod(s) get terminated. At any given time, most workers are likely working on short-running jobs, idle, or (more rarely) processing a long-running job. I'd like to avoid killing the workers that are processing long-running jobs; idle workers and workers on short-running jobs can be terminated without issue.

What would be a way to do this with low complexity? One thing I can think of is to do this based on CPU usage of the pods. Not ideal, but it could be good enough. Another method could be that workers somehow expose a priority indicating whether they are the preferred pod to be deleted. This priority could change every time a worker picks up a new job though.

Eventually all jobs will be short running and this problem will go away, but that is a longer term goal for now.

Recommended answer

During pod termination, Kubernetes sends a SIGTERM signal to the containers of your pod. You can use that signal to shut down your app gracefully. The problem is that Kubernetes does not wait forever for your application to finish: once the pod's termination grace period (terminationGracePeriodSeconds, 30 seconds by default) runs out, it sends SIGKILL, and in your case your app may take a long time to exit.
In this case I recommend you use a preStop hook, which runs before Kubernetes sends SIGTERM to the container. The hook's run time counts against the same grace period, so the grace period has to be long enough to cover your longest job. Here is an example of how to use lifecycle handlers:

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
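
Applied to the queue-worker case from the question, the same mechanism might look like the sketch below. It is only an illustration under stated assumptions: the pod name, the image, the 4-hour grace period, and the two marker files (/tmp/draining and /tmp/job-in-progress) are hypothetical and not part of the original answer. The assumed convention is that the worker creates /tmp/job-in-progress while it processes a job, removes it when done, and stops picking up new jobs once /tmp/draining exists.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: queue-worker                          # hypothetical name
spec:
  # Assumed value: must exceed the longest expected job (here 4 hours).
  # The preStop hook below counts against this budget.
  terminationGracePeriodSeconds: 14400
  containers:
  - name: worker
    image: example.com/queue-worker:latest    # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Hypothetical convention: signal the worker to stop taking jobs,
          # then block until the current job's marker file disappears.
          command: ["/bin/sh", "-c", "touch /tmp/draining; while [ -e /tmp/job-in-progress ]; do sleep 5; done"]
```

In a Deployment, these fields would go under spec.template.spec. Note that this does not control which pod gets selected on scale-down; it only gives the selected pod time to finish its current job before it is stopped.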
