Kubernetes - Liveness and Readiness probe implementation


Problem description

I'm developing a service using Spring and deploying it on OpenShift. Currently I'm using the Spring Actuator health endpoint as both the liveness and readiness probe for Kubernetes.

However, I will be adding a call to another service in the Actuator health endpoint, and it looks to me that in that case I need to implement a new liveness probe for my service. If I don't, a failure in the second service will cause the liveness probe to fail, and Kubernetes will restart my service without any real need.

Is it OK, for a liveness probe, to implement a simple REST controller that always returns HTTP status 200? If it works, can the service always be considered alive? Or is there a better way to do it?

Recommended answer

Liveness Probe

Include only those checks which, if they fail, will be cured by a pod restart. There is nothing wrong with having a new endpoint that always returns HTTP 200 to serve as the liveness probe endpoint, provided you have independent monitoring and alerting in place for the other services on which your first service depends.
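
As a minimal sketch, such an endpoint could look like the following, assuming a standard Spring Boot web application; the /livez path and the class name are arbitrary choices for illustration, not anything from the original question:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical always-200 liveness endpoint: it answers as long as the
// servlet container can still serve requests, and deliberately checks
// no downstream dependencies.
@RestController
public class LivenessController {

    @GetMapping("/livez")
    public ResponseEntity<String> live() {
        return ResponseEntity.ok("OK");
    }
}

If the JVM is deadlocked or the request threads are exhausted, this endpoint stops answering and the probe fails, which is exactly the kind of failure a restart cures.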

How does a simple HTTP 200 liveness probe help?

Well, let's consider these examples.

1. If your application handles one thread per HTTP request (a servlet-based application, such as one running on Tomcat, which is Spring Boot 1.x's default choice), it may become unresponsive under heavy load. A pod restart will help here.

2. If you don't configure memory when you start your application, then under heavy load the application may exceed the pod's allocated memory and become unresponsive. A pod restart will help here too.

Readiness Probe

This has two aspects.

1) Let's consider a scenario. Say authentication is enabled on your second service. Your first service (where your health check lives) has to be configured properly to authenticate with the second service.

Suppose that in a subsequent deployment of your first service, you got the auth-header variable name wrong, the one you were supposed to read from the ConfigMap or Secret, and you are doing a rolling update.

If the second service's HTTP 200 is also included in the first service's health check, that will prevent the broken version of the deployment from going live; your old version will keep running because the new version will never pass the health check. It doesn't even need to be as involved as authentication: say the second service's URL is hard-coded in the first service and you got that URL wrong in a subsequent release. This additional check in your health check will prevent the buggy version from going live.
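
For illustration, here is a hedged sketch of such a check as a Spring Boot Actuator HealthIndicator; the class name, the use of RestTemplate, and the second service's URL are assumptions for the example:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class SecondServiceHealthIndicator implements HealthIndicator {

    // Hypothetical URL; in practice it would come from a ConfigMap or Secret.
    private static final String SECOND_SERVICE_URL = "http://second-service:8080/health";

    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public Health health() {
        try {
            // Any connection error or non-2xx response throws an exception.
            restTemplate.getForEntity(SECOND_SERVICE_URL, String.class);
            return Health.up().build();
        } catch (Exception e) {
            // DOWN here fails the Actuator health endpoint, so a rolling
            // update with a wrong URL or auth header never passes readiness.
            return Health.down(e).build();
        }
    }
}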

2) On the other hand, suppose your first service has numerous other functionalities, and the second service being down for a few hours will not affect any significant functionality the first service offers. Then by all means you can leave the second service's liveness out of the first service's health check.
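
One way to sketch that opt-out, again with hypothetical names and URL, is an indicator that still probes the second service for visibility but always reports UP, so an outage shows up in the health details without ever failing the check:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class OptionalSecondServiceHealthIndicator implements HealthIndicator {

    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public Health health() {
        try {
            restTemplate.getForEntity("http://second-service:8080/health", String.class);
            return Health.up().withDetail("secondService", "UP").build();
        } catch (Exception e) {
            // Still report UP: the second service being down must not fail
            // the first service's probes; the detail is only for monitoring.
            return Health.up().withDetail("secondService", "DOWN").build();
        }
    }
}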

Either way, you need to set up proper alerting and monitoring for both services. This will help decide when humans should intervene.

What I would do is (ignoring other irrelevant details):

readinessProbe:
  httpGet:
    path: </Actuator-healthcheck-endpoint>   # the Actuator health endpoint, including downstream checks
    port: 8080
  initialDelaySeconds: 120
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: </my-custom-endpoint-which-always-returns200>   # the simple always-200 endpoint
    port: 8080
  initialDelaySeconds: 130   # slightly later than readiness, so liveness never fires first
  timeoutSeconds: 10
  failureThreshold: 10       # restart only after ten consecutive failures

