Setting up a readiness, liveness or startup probe

Question

I'm having difficulty understanding which would be best for my situation and how to actually implement it.

In a nutshell, the problem is this:

  • I'm deploying my DB (Postgres), BE (Django), and FE (React) deployments with Skaffold
  • About 50% of the time the BE spins up before the DB
  • One of the first things Django tries to do is connect to the DB
  • It only tries once (by design, and this cannot be changed); if it can't connect, it fails and the application is broken
  • Thus, I need to make sure that every single time I spin up my deployments, the DB deployment is running before the BE deployment starts

I came across readiness, liveness, and startup probes. I've read about them a couple of times, and readiness probes sound like what I need: I don't want the BE deployment to start until the DB deployment is ready to accept connections.

I guess I'm not understanding how to set it up. This is what I've tried, but I still run into instances where one is being loaded before another.

postgres.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: postgres
  template:
    metadata:
      labels:
        component: postgres
    spec:
      containers:
        - name: postgres
          image: testappcontainers.azurecr.io/postgres
          ports:
            - containerPort: 5432
          env: 
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGDATABASE
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGUSER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGPASSWORD
            - name: POSTGRES_INITDB_ARGS
              value: "-A md5"
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
              subPath: postgres
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: postgres-storage
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: postgres
  ports:
    - port: 1423
      targetPort: 5432

api.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: api
  template:
    metadata:
      labels:
        component: api
    spec:
      containers:
        - name: api
          image: testappcontainers.azurecr.io/testapp-api
          ports:
            - containerPort: 5000
          env:
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGUSER
            - name: PGHOST
              value: postgres-cluster-ip-service
            - name: PGPORT
              value: "1423"
            - name: PGDATABASE
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGDATABASE
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGPASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: SECRET_KEY
            - name: DEBUG
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: DEBUG
          readinessProbe:
            httpGet:
              host: postgres-cluster-ip-service
              port: 1423
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 2
---
apiVersion: v1
kind: Service
metadata:
  name: api-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: api
  ports:
    - port: 5000
      targetPort: 5000

client.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: client-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: client
  template:
    metadata:
      labels:
        component: client
    spec:
      containers:
        - name: client
          image: testappcontainers.azurecr.io/testapp-client
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: api-cluster-ip-service
              port: 5000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 2
---
apiVersion: v1
kind: Service
metadata:
  name: client-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: client
  ports:
    - port: 3000
      targetPort: 3000

I don't think the ingress.yaml and the skaffold.yaml will be helpful, but let me know if I should add those.

So what am I doing wrong?

So I've tried out a few things based on David Maze's response. This helped me understand what is going on better, but I am still running into issues I'm not quite understanding how to resolve.

The first problem is that even with a default restartPolicy: Always, and even though Django fails, the Pods themselves don't fail. The Pods think they are perfectly healthy even though Django has failed.

The second problem is that apparently the Pods need to be made aware of Django's status. That is the part I'm not quite wrapping my brain around. In particular, should probes be checking the status of other deployments or of themselves?

Yesterday my thinking was the former, but today I'm thinking it is the latter: the Pod needs to know the program contained in it has failed. However, everything I've tried just results in a failed probe, connection refused, etc.:

# referring to itself
host: /health
port: 5000

host: /healthz
port: 5000

host: /api
port: 5000

host: /
port: 5000

host: /api-cluster-ip-service
port: 5000

host: /api-deployment
port: 5000

# referring to the DB deployment
host: /health
port: 1423 #or 5432

host: /healthz
port: 1423 #or 5432

host: /api
port: 1423 #or 5432

host: /
port: 1423 #or 5432

host: /postgres-cluster-ip-service
port: 1423 #or 5432

host: /postgres-deployment
port: 1423 #or 5432

So apparently I'm setting up the probe wrong, despite it being a "super-easy" implementation (as a few blogs have described it). For example, the /health and /healthz routes: are these built into Kubernetes, or do they need to be set up? I'm rereading the docs to hopefully clarify this.

Recommended answer

Actually, I think I might have sorted it out.

Part of the problem is that even though restartPolicy: Always is the default, the Pods are not aware that Django has failed, so they are still considered healthy.

My thinking was wrong in that I originally assumed I needed to refer to the DB deployment to see if it had started before starting the API deployment. Instead, I needed to check whether Django had failed and redeploy it if it had.

Doing the following accomplished this for me:

livenessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
readinessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
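
A note on the /health and /healthz routes asked about in the question: Kubernetes does not provide them for application containers. An httpGet probe only succeeds if the application itself serves the path, and any status code from 200 up to (but not including) 400 counts as success. In an httpGet probe the route goes in path, while host defaults to the Pod's own IP and is usually omitted. A minimal sketch, assuming the Django app exposes a hypothetical /healthz/ route on port 5000 (that route is not in the original manifests and would have to be added to the app):

```yaml
# Sketch only: assumes Django serves a hypothetical /healthz/ route
# that returns a 2xx or 3xx status while the app is working.
livenessProbe:
  httpGet:
    path: /healthz/        # hypothetical route; not provided by Kubernetes
    port: 5000             # containerPort of the api container
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3      # restart after 3 consecutive failed checks
```

Compared with the tcpSocket probes, which only verify that something accepts connections on the port, an HTTP check can also catch a process that is listening but no longer answering requests.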

I'm learning Kubernetes so please correct me if there is a better way to do this or if this is just plain wrong. I just know it accomplishes what I want.
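
For the original ordering problem (the DB must accept connections before Django starts), one commonly used alternative is an init container in the api Pod that blocks until Postgres answers. This is a sketch only, under these assumptions: the image tag postgres:13 is a stand-in for any image that ships the pg_isready client, and the host and port are taken from the Service in postgres.yaml. The fragment would go under spec.template.spec of api-deployment, next to containers:

```yaml
# Sketch only: runs before the api container and exits once Postgres
# accepts connections through the ClusterIP Service.
initContainers:
  - name: wait-for-postgres
    image: postgres:13     # assumption: any image that includes pg_isready
    command:
      - "sh"
      - "-c"
      - |
        until pg_isready -h postgres-cluster-ip-service -p 1423; do
          echo "waiting for postgres"
          sleep 2
        done
```

Init containers run to completion before the app containers start, so Django would not launch until pg_isready exits successfully. This only covers startup; the liveness probe shown earlier is still what restarts the container if Django dies later.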
