Testing graceful shutdown on an HTTP server during a Kubernetes rollout


Problem description

I followed some tutorials on how to set up an HTTP server, and test it in a local Kubernetes cluster (using minikube).

I also implemented graceful shutdown from some examples I found, and expected that there would be no downtime from a Kubernetes rolling restart.

To verify that, I ran a load test (using Apache Benchmark: ab -n 100000 -c 20 <addr>) and triggered kubectl rollout restart during the benchmark, but ab stops running as soon as the rolling restart is performed.

Here is my current project setup:

Dockerfile

FROM golang:1.13.4-alpine3.10

RUN mkdir /app
ADD . /app
WORKDIR /app

RUN go build -o main src/main.go
CMD ["/app/main"]

src/main.go

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"

    "github.com/gorilla/mux"
)

func main() {
    srv := &http.Server{
        Addr:    ":8080",
        Handler: NewHTTPServer(),
    }

    idleConnsClosed := make(chan struct{})
    go func() {
        sigint := make(chan os.Signal, 1)
        signal.Notify(sigint, os.Interrupt, syscall.SIGTERM, syscall.SIGINT)
        <-sigint

        // We received an interrupt signal, shut down.
        if err := srv.Shutdown(context.Background()); err != nil {
            // Error from closing listeners, or context timeout:
            log.Printf("HTTP server Shutdown: %v", err)
        }

        close(idleConnsClosed)
    }()

    log.Printf("Starting HTTP server")
    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
        // Error starting or closing listener:
        log.Fatalf("HTTP server ListenAndServe: %v", err)
    }

    <-idleConnsClosed
}

func NewHTTPServer() http.Handler {
    r := mux.NewRouter()

    // Ping
    r.HandleFunc("/", handler)

    return r
}

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
}

kubernetes/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 5
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: dickster/graceful-shutdown-test:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8080

kubernetes/service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: myapp
  name: myapp
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: myapp
  sessionAffinity: None
  type: NodePort

Is there something missing in this setup? According to the rollingUpdate strategy, at least five pods should remain running and able to serve incoming requests, but ab exits with an apr_socket_recv: Connection reset by peer (54) error. I also tried adding readiness/liveness probes, but no luck. I suspect they're not needed here, either.

Recommended answer

For this to work without downtime, the pod needs to stop receiving new connections while it is still allowed to finish handling its current connections gracefully. This means the pod needs to be running, but not ready, so that no new requests are sent to it.

Your service will match all pods using the label selector you configured (I assume app: myapp) and will use any pod in the ready state as a possible backend. The pod is marked as ready as long as it is passing the readinessProbe. Since you have no probe configured, the pod status will default to ready as long as it is running.
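
For example, a readinessProbe could be added to the container in kubernetes/deployment.yaml. This is only a sketch: the /readyz path is an assumption and would have to be implemented by the application (the current handler only serves /).

      containers:
      - name: myapp
        image: dickster/graceful-shutdown-test:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /readyz      # assumed endpoint, not yet present in the app
            port: 8080
          periodSeconds: 2
          failureThreshold: 1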

Just having a readinessProbe configured will help immensely, but it will not provide 100% uptime on its own; that requires some tweaks in your code so that the readinessProbe starts failing (and new requests are no longer sent) while the container gracefully finishes the current connections.
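
A minimal sketch of that tweak, building on the main.go above and assuming the /readyz endpoint from the probe sketch: an atomic flag backs the readiness handler, and on SIGTERM the flag is flipped and the server waits a few seconds (the 5-second delay is a guess to tune against the probe settings) so the endpoints controller stops routing new requests before Shutdown drains the in-flight ones.

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "sync/atomic"
    "syscall"
    "time"

    "github.com/gorilla/mux"
)

// ready is 1 while the pod should receive traffic and 0 once shutdown has started.
var ready int32 = 1

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/", handler)
    r.HandleFunc("/readyz", readyz) // assumed readiness endpoint, matched by the probe above

    srv := &http.Server{
        Addr:    ":8080",
        Handler: r,
    }

    idleConnsClosed := make(chan struct{})
    go func() {
        sigint := make(chan os.Signal, 1)
        signal.Notify(sigint, syscall.SIGTERM, os.Interrupt)
        <-sigint

        // Fail the readiness probe first so the Service stops sending new requests.
        atomic.StoreInt32(&ready, 0)

        // Give the endpoints controller / kube-proxy time to notice before
        // the listener goes away. The 5s value is an assumption to tune.
        time.Sleep(5 * time.Second)

        // Drain in-flight requests.
        if err := srv.Shutdown(context.Background()); err != nil {
            log.Printf("HTTP server Shutdown: %v", err)
        }
        close(idleConnsClosed)
    }()

    log.Printf("Starting HTTP server")
    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
        log.Fatalf("HTTP server ListenAndServe: %v", err)
    }

    <-idleConnsClosed
}

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
}

func readyz(w http.ResponseWriter, r *http.Request) {
    // Report 503 once shutdown has begun so the pod is removed from the Service endpoints.
    if atomic.LoadInt32(&ready) == 0 {
        http.Error(w, "shutting down", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}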

Edit: As @Thomas Jungblut mentioned, a big part of eliminating errors from your webserver is how the application handles SIGTERM. While the pod is in the Terminating state, it will no longer receive requests through the Service. During this phase, your webserver needs to finish and close its current connections gracefully rather than stopping abruptly and cutting requests off.

Note that the draining itself is configured in the application and is not a k8s setting. As long as the webserver gracefully drains the connections and your pod spec includes a terminationGracePeriodSeconds long enough to allow the webserver to drain, you should see basically no errors. Even then, this still won't guarantee 100% uptime, especially when bombarding the service with ab.
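
For reference, a sketch of the pod template with that field (it defaults to 30 seconds); the 60-second value is an assumption sized to the in-app delay plus drain time from the sketch above:

    spec:
      # must comfortably exceed the readiness delay plus the Shutdown drain time
      terminationGracePeriodSeconds: 60
      containers:
      - name: myapp
        image: dickster/graceful-shutdown-test:latest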
