Kubernetes Nginx: How to have zero-downtime deployments?


Question

I am trying to set up a Kubernetes nginx deployment with zero downtime. Part of that process has been to use a rollingUpdate strategy, which ensures that at least one pod is running nginx at all times. This works perfectly well.

I am running into errors when the old nginx pod is terminating. According to the Kubernetes docs on termination, Kubernetes will:

  1. remove the pod from the endpoints list for the service, so it is not receiving any new traffic when termination begins
  2. invoke a pre-stop hook if it is defined, and wait for it to complete
  3. send SIGTERM to all remaining processes
  4. send SIGKILL to any remaining processes after the grace period expires.

I understand that the command nginx -s quit is supposed to terminate nginx gracefully by waiting for all workers to finish their requests before the master exits. nginx shuts down gracefully on SIGQUIT, while SIGTERM terminates it abruptly. Other forums say that it is as easy as adding the following preStop hook to your deployment:

lifecycle:
  preStop:
    exec:
      command: ["/usr/sbin/nginx", "-s", "quit"]

However, from testing this command I have found that nginx -s quit returns immediately instead of waiting for the workers to complete. It also does not return the PID of the master process, which is what I was hoping for D:

What happens is that Kubernetes invokes nginx -s quit, which sends a proper SIGQUIT to the worker children but does not wait for them to complete. Because the hook has already returned, Kubernetes jumps right to step 3 and sends SIGTERM to those processes, resulting in abrupt termination and, thus, lost connections.
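The behaviour described here, where the signalling command returns while the signalled process is still draining, can be reproduced without nginx. The following is a minimal sketch under stated assumptions: the background subshell is a hypothetical stand-in for the nginx master, and it uses SIGUSR1 rather than SIGQUIT because background jobs in a non-interactive shell may ignore SIGQUIT.

```shell
#!/bin/bash
# Hypothetical stand-in for the nginx master: on SIGUSR1 it "drains" for
# one second before exiting. This is not nginx, only an illustration.
(
  trap 'sleep 1; exit 0' USR1
  while true; do sleep 0.1; done
) &
WORKER_PID=$!

sleep 0.3                     # give the subshell time to install its trap
kill -USR1 "$WORKER_PID"      # returns as soon as the signal has been sent

# Right after kill returns, the process is still alive and draining.
if kill -0 "$WORKER_PID" 2>/dev/null; then
  STILL_RUNNING=yes
else
  STILL_RUNNING=no
fi
echo "still running right after signal: $STILL_RUNNING"

wait "$WORKER_PID"            # block until the drain has finished
echo "worker has exited"
```

A preStop hook that only sends the signal completes at the kill line, which is why the hook has to do its own waiting if the drain is supposed to finish before SIGTERM arrives.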

QUESTION: Has anyone figured out a good way to gracefully shut down their nginx controller during a rolling deployment and have zero downtime? A sleep workaround isn't good enough; I'm looking for something more robust.

Below is the full deployment YAML:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: nginx-ingress-lb
    spec:
      terminationGracePeriodSeconds: 60
      serviceAccount: nginx
      containers:
        - name: nginx-ingress-controller
          image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.8
          imagePullPolicy: Always
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 5
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
            - --v=2
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - containerPort: 80
          lifecycle:
            preStop:
              exec:
                command: ["/usr/sbin/nginx", "-s", "quit"]

Recommended answer

I hate answering my own questions, but after noodling a bit this is what I have so far.

I created a bash script that is semi-blocking, called killer:

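#!/bin/bash

# Give Kubernetes time to remove the pod from the service endpoints.
sleep 3
# Record the master PID before asking nginx to shut down.
PID=$(cat /run/nginx.pid)
nginx -s quit

# Block until the master process has exited.
while [ -d "/proc/$PID" ]; do
  sleep 0.1
done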

I found that inside the nginx pod there is a file, /run/nginx.pid, which holds the PID of the master process. If you call nginx -s quit and then wait until that process disappears, you have essentially made the quit command "blocking".
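The wait-until-gone idea can also be written as a small reusable function with an upper bound, so that the hook cannot poll forever if the master never exits. This is a sketch only: the name wait_for_exit and the timeout argument are illustrative and not part of the original script, and it checks for the process with kill -0 instead of looking at /proc.

```shell
#!/bin/bash
# wait_for_exit PID TIMEOUT_SECONDS  (hypothetical helper)
# Polls every 0.1 s until the process is gone or the timeout elapses.
# Returns 0 if the process exited, 1 if it was still present at the deadline.
wait_for_exit() {
  local pid=$1
  local remaining=$(( $2 * 10 ))   # number of 0.1 s polls left
  while kill -0 "$pid" 2>/dev/null; do
    [ "$remaining" -le 0 ] && return 1
    sleep 0.1
    remaining=$(( remaining - 1 ))
  done
  return 0
}
```

In a script like killer, it would be called after nginx -s quit with the PID read from /run/nginx.pid. The timeout should probably stay below the pod's terminationGracePeriodSeconds (60 in the deployment above), since the preStop hook runs inside that grace period.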

Note that there is a sleep 3 before anything happens. This is due to a race condition: Kubernetes marks a pod as terminating, but takes a little time (< 1s) to remove the pod from the service that points traffic toward it.

I have mounted this script into my pod and call it via the preStop directive. It mostly works, but during testing there are still occasional blips where I get a curl error that the connection was "reset by peer." But this is a step in the right direction.
