Kubernetes has a ton of pods in error state that can't seem to be cleared

Problem description

I was originally trying to run a Job that seemed to be stuck in a CrashLoopBackOff. Here is the Job manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: es-setup-indexes
  namespace: elk-test
spec:
  template:
    metadata:
      name: es-setup-indexes
    spec:
      containers:
      - name: es-setup-indexes
        image: appropriate/curl
        command: ['curl -H  "Content-Type: application/json" -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat -d@/etc/filebeat/filebeat.template.json']
        volumeMounts:
        - name: configmap-volume
          mountPath: /etc/filebeat/filebeat.template.json
          subPath: filebeat.template.json
      restartPolicy: Never

      volumes:
        - name: configmap-volume
          configMap:
            name: elasticsearch-configmap-indexes

I tried deleting the job but it would only work if I ran the following command:

kubectl delete job es-setup-indexes --cascade=false

After that, I noticed that when running:

kubectl get pods -w

I would get a ton of pods in an Error state, and I saw no way to clean them up. Here is just a small sample of the output when I ran get pods:

es-setup-indexes-zvx9c   0/1       Error     0         20h
es-setup-indexes-zw23w   0/1       Error     0         15h
es-setup-indexes-zw57h   0/1       Error     0         21h
es-setup-indexes-zw6l9   0/1       Error     0         16h
es-setup-indexes-zw7fc   0/1       Error     0         22h
es-setup-indexes-zw9bw   0/1       Error     0         12h
es-setup-indexes-zw9ck   0/1       Error     0         1d
es-setup-indexes-zwf54   0/1       Error     0         18h
es-setup-indexes-zwlmg   0/1       Error     0         16h
es-setup-indexes-zwmsm   0/1       Error     0         21h
es-setup-indexes-zwp37   0/1       Error     0         22h
es-setup-indexes-zwzln   0/1       Error     0         22h
es-setup-indexes-zx4g3   0/1       Error     0         11h
es-setup-indexes-zx4hd   0/1       Error     0         21h
es-setup-indexes-zx512   0/1       Error     0         1d
es-setup-indexes-zx638   0/1       Error     0         17h
es-setup-indexes-zx64c   0/1       Error     0         21h
es-setup-indexes-zxczt   0/1       Error     0         15h
es-setup-indexes-zxdzf   0/1       Error     0         14h
es-setup-indexes-zxf56   0/1       Error     0         1d
es-setup-indexes-zxf9r   0/1       Error     0         16h
es-setup-indexes-zxg0m   0/1       Error     0         14h
es-setup-indexes-zxg71   0/1       Error     0         1d
es-setup-indexes-zxgwz   0/1       Error     0         19h
es-setup-indexes-zxkpm   0/1       Error     0         23h
es-setup-indexes-zxkvb   0/1       Error     0         15h
es-setup-indexes-zxpgg   0/1       Error     0         20h
es-setup-indexes-zxqh3   0/1       Error     0         1d
es-setup-indexes-zxr7f   0/1       Error     0         22h
es-setup-indexes-zxxbs   0/1       Error     0         13h
es-setup-indexes-zz7xr   0/1       Error     0         12h
es-setup-indexes-zzbjq   0/1       Error     0         13h
es-setup-indexes-zzc0z   0/1       Error     0         16h
es-setup-indexes-zzdb6   0/1       Error     0         1d
es-setup-indexes-zzjh2   0/1       Error     0         21h
es-setup-indexes-zzm77   0/1       Error     0         1d
es-setup-indexes-zzqt5   0/1       Error     0         12h
es-setup-indexes-zzr79   0/1       Error     0         16h
es-setup-indexes-zzsfx   0/1       Error     0         1d
es-setup-indexes-zzx1r   0/1       Error     0         21h
es-setup-indexes-zzx6j   0/1       Error     0         1d
kibana-kq51v   1/1       Running   0         10h

But if I look at the jobs I get nothing related to that anymore:

$ kubectl get jobs --all-namespaces                                                                              
NAMESPACE     NAME               DESIRED   SUCCESSFUL   AGE
kube-system   configure-calico   1         1            46d

I've also noticed that kubectl seems much slower to respond. I don't know if the pods are continuously trying to restart or are just in some broken state, but it would be great if someone could let me know how to troubleshoot this, as I have not come across an issue like this in Kubernetes before.

Kube info:

$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Recommended answer

Here is a quick way to fix it :)

kubectl get pods | grep Error | cut -d' ' -f 1 | xargs kubectl delete pod
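
If the Job ran in a namespace other than your current kubectl context (the manifest above uses elk-test), scope both the listing and the delete to it. A minimal variant of the same pipeline, assuming the leftover pods live in elk-test:

kubectl get pods -n elk-test | grep Error | cut -d' ' -f 1 | xargs kubectl delete pod -n elk-test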

Add the -a flag to kubectl get pods if you are using an old version of k8s, so that terminated pods are included in the listing.
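
The pods linger in the first place because the Job was deleted with --cascade=false, which orphans the pods it created instead of garbage-collecting them, so they have to be deleted directly. On newer clusters and kubectl clients (newer than the 1.6 shown in the question), a field selector can pick out the failed pods without grepping the table output. This is only a sketch, assuming the pods shown as Error are in the Failed phase, which is what happens when a container exits non-zero under restartPolicy: Never:

kubectl delete pods -n elk-test --field-selector=status.phase=Failed

Clearing them out should also make kubectl feel responsive again, since it no longer has to list thousands of dead pod objects on every call.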
