Kubernetes - Liveness and Readiness probe implementation


Problem Description

I'm developing a service using Spring and deploying it on OpenShift. Currently I'm using the Spring Actuator health endpoint to serve as both the liveness and readiness probe for Kubernetes.

However, I will be adding a call to another service in the Actuator health endpoint, and it looks to me like in that case I need to implement a new liveness probe for my service. If I don't, a failure in the second service will cause my liveness probe to fail, and Kubernetes will restart my service without any real need.

Is it OK, for a liveness probe, to implement a simple REST controller which always returns HTTP status 200? If that works, can the service always be considered alive? Or is there a better way to do it?

Answer

Liveness Probe

Include only those checks which, if they fail, you think will be cured by a pod restart. There is nothing wrong with having a new endpoint that always returns HTTP 200 to serve as the liveness probe endpoint, provided you have independent monitoring and alerting in place for the other services that your first service depends on.
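Such an always-200 endpoint can be sketched with the JDK's built-in `com.sun.net.httpserver` server (in a Spring app you would expose the same thing as a `@RestController` method instead); the `/livez` path and port are hypothetical names, not something from the question:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class LivenessEndpoint {

    // Starts a minimal HTTP server whose /livez handler always answers 200.
    // It deliberately checks nothing beyond the process being able to serve
    // a request -- exactly the condition a pod restart would cure.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/livez", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // the server thread is non-daemon, so it keeps the JVM alive
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080); // port must match the livenessProbe's httpGet.port
    }
}
```

Because the handler touches no dependencies, the probe fails only when the process itself can no longer serve requests.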

Where does a simple HTTP 200 liveness probe help?

Well, let's consider these examples.

  1. If your application is a one-thread-per-HTTP-request application (a servlet-based application, such as one running on Tomcat, which is Spring Boot 1.x's default choice), it may become unresponsive under heavy load. A pod restart will help here.

  2. If you don't have memory limits configured when you start your application, under heavy load it may exceed the pod's allocated memory and become unresponsive. A pod restart will help here too.
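To make the memory scenario above less likely in the first place, the pod's memory limit and the JVM heap can be aligned; the values below are illustrative, not a recommendation:

```yaml
# Container spec fragment (illustrative values, not from the question).
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
env:
  - name: JAVA_TOOL_OPTIONS
    # Cap the heap at a fraction of the container limit (JDK 10+), so the
    # JVM throws OutOfMemoryError predictably instead of the container
    # being OOM-killed by the kernel.
    value: "-XX:MaxRAMPercentage=75.0"
```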

Readiness Probe

There are 2 aspects to it.

1) Let's consider a scenario. Let's say authentication is enabled on your second service. Your first service (where your health check is) has to be configured properly to authenticate with the second service.

Let's just say that, in a subsequent deployment of your first service, you screwed up the auth-header variable name which you were supposed to read from the ConfigMap or Secret. And you are doing a rolling update.

If you have the second service's HTTP 200 also included in the health check (of the first service), then that will prevent the broken version of the deployment from going live; your old version will keep running because your newer version will never make it past the health check. It doesn't even need to be as complicated as authentication: let's just say the URL of the second service is hard-coded in the first service, and you screwed up that URL in a subsequent release. This additional check in your health check will prevent the buggy version from going live.
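A readiness check that folds in the second service can be sketched as a plain function (in Spring Actuator this role is played by a custom `HealthIndicator`); the dependency URL and the 2-second timeout here are assumptions for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ReadinessCheck {

    // Returns 200 only when the downstream dependency answers 2xx within the
    // timeout; otherwise 503. A 503 takes the pod out of the Service's
    // endpoints (and blocks a broken rollout) without triggering a restart.
    public static int readinessStatus(HttpClient client, String dependencyUrl) {
        try {
            HttpResponse<Void> resp = client.send(
                    HttpRequest.newBuilder(URI.create(dependencyUrl))
                            .timeout(Duration.ofSeconds(2))
                            .build(),
                    HttpResponse.BodyHandlers.discarding());
            return resp.statusCode() / 100 == 2 ? 200 : 503;
        } catch (Exception e) {
            return 503; // unreachable, timed out, or misconfigured URL
        }
    }
}
```

Wiring this into the readiness endpoint, but not the liveness endpoint, gives exactly the split described above: a bad rollout never goes live, yet a flaky dependency never restarts healthy pods.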

2) On the other hand, let's assume that your first service has numerous other functionalities, and the second service being down for a few hours will not affect any significant functionality that the first service offers. Then, by all means, you can leave the second service's liveness out of the first service's health check.

Either way, you need to set up proper alerting and monitoring for both the services. This will help to decide when humans should intervene.

What I would do is (ignore other irrelevant details),

readinessProbe:
  httpGet:
    path: </Actuator-healthcheck-endpoint>
    port: 8080
  initialDelaySeconds: 120   # give the app time to boot before the first check
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: </my-custom-endpoint-which-always-returns200>
    port: 8080
  initialDelaySeconds: 130   # slightly after readiness, so a slow start isn't killed
  timeoutSeconds: 10
  failureThreshold: 10       # restart only after 10 consecutive failures
