Why does a Kubernetes Pod get into Terminated state with reason Completed and exit code 0?


Question

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:

  • Kubernetes version 1.4 over AWS
  • 8 pods running a NodeJS API (Express) deployed as a Kubernetes Deployment
  • One of the pods gets restarted for no apparent reason late at night (no traffic, no CPU spikes, no memory pressure, no alerts...). The restart count is incremented as a result.
  • Logs don't show anything abnormal (ran kubectl -p to see previous logs, no errors at all in there)
  • Resource consumption is normal, cannot see any events about Kubernetes rescheduling the pod into another node or similar
  • Describing the pod shows a TERMINATED state with reason COMPLETED and exit code 0. I don't have the exact output from kubectl, as this pod has been replaced multiple times by now.

The pods are NodeJS server instances: they cannot complete, because they are always running and waiting for requests.

Would this be Kubernetes internally rearranging pods? Is there any way to know when this happens? Shouldn't there be an event somewhere saying why it happened?

Update

This just happened in our prod environment. The result of describing the offending pod is:

api:
    Container ID:	docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
    Image:		1.0.5-RC1
    Image ID:		docker://sha256:XXXX
    Ports:		9080/TCP, 9443/TCP
    State:		Running
      Started:		Mon, 27 Mar 2017 12:30:05 +0100
    Last State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Fri, 24 Mar 2017 13:32:14 +0000
      Finished:		Mon, 27 Mar 2017 12:29:58 +0100
    Ready:		True
    Restart Count:	1

Update 2

This is the deployment.yaml file used:

apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  namespace: "${ENV}"
  name: "${APP}${CANARY}"
  labels:
    component: "${APP}${CANARY}"
spec:
  replicas: ${PODS}
  minReadySeconds: 30
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        component: "${APP}${CANARY}"
    spec:
      serviceAccount: "${APP}"

${IMAGE_PULL_SECRETS}

      containers:
      - name: "${APP}${CANARY}"
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
        env:
        - name: "KUBERNETES_CA_CERTIFICATE_FILE"
          value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: "metadata.namespace"
        - name: "ENV"
          value: "${ENV}"
        - name: "PORT"
          value: "${INTERNAL_PORT}"
        - name: "CACHE_POLICY"
          value: "all"
        - name: "SERVICE_ORIGIN"
          value: "${SERVICE_ORIGIN}"
        - name: "DEBUG"
          value: "http,controllers:recommend"
        - name: "APPDYNAMICS"
          value: "true"
        - name: "VERSION"
          value: "${TAG}"
        ports:
        - name: "http"
          containerPort: ${HTTP_INTERNAL_PORT}
          protocol: "TCP"
        - name: "https"
          containerPort: ${HTTPS_INTERNAL_PORT}
          protocol: "TCP"

The Dockerfile of the image referenced in the above Deployment manifest:

FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH      $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app

RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app

# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
    && /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'

ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH      $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH

# Create app directory
WORKDIR /home/app
COPY . /home/app

# Install app dependencies
RUN npm install

EXPOSE 9080 9443
CMD [ "npm", "start" ]

npm start is an alias for the regular node app.js command, which starts a NodeJS server on port 9080.

Answer

Check the version of docker you run, and whether the docker daemon was restarted during that time.

If the docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some docker versions, docker may incorrectly report exit code 0 for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
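This is also why exit code 0 is suspicious here: Kubernetes reports the exit status of the container's main process, and by the usual shell convention a process killed by a signal exits with 128 plus the signal number, never 0. A quick local illustration of those conventions (not of docker itself):

```shell
# A process that finishes on its own exits 0 -> Reason: Completed.
sh -c 'exit 0'        ; echo "clean exit: $?"

# Killed by SIGTERM (signal 15) -> 128 + 15 = 143.
sh -c 'kill -TERM $$' ; echo "SIGTERM:    $?"

# Killed by SIGKILL (signal 9) -> 128 + 9 = 137, e.g. an OOM kill.
sh -c 'kill -KILL $$' ; echo "SIGKILL:    $?"
```

A long-running server that "completed" with 0 therefore suggests the exit status was lost or misreported, consistent with the linked docker issue.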
