"Kubelet停止发布节点状态"并且无法访问节点 [英] 'Kubelet stopped posting node status' and node inaccessible


Problem description

I am having some issues with a fairly new cluster where a couple of nodes (it always seems to happen in pairs, though that may just be a coincidence) become NotReady, and kubectl describe reports that the kubelet stopped posting node status for the memory, disk, PID and ready conditions.
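To spot affected nodes without reading each kubectl describe by hand, something like the following might help. It is a minimal sketch, assuming the node list comes from `kubectl get nodes -o json` and that the node controller marks silent nodes with status Unknown and reason NodeStatusUnknown. The function name and the sample node name are made up for illustration.

```python
# Sketch: find nodes whose conditions went Unknown because the kubelet
# stopped reporting. SAMPLE mimics the shape of `kubectl get nodes -o json`;
# the node name in it is hypothetical.

def nodes_with_silent_kubelet(node_list):
    """Return {node name: [condition types marked Unknown/NodeStatusUnknown]}."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        stale = [
            cond["type"]
            for cond in node.get("status", {}).get("conditions", [])
            if cond.get("status") == "Unknown"
            and cond.get("reason") == "NodeStatusUnknown"
        ]
        if stale:
            result[name] = stale
    return result


def _cond(cond_type, status, reason):
    # Helper that builds one condition entry for the sample data.
    return {"type": cond_type, "status": status, "reason": reason}


SAMPLE = {
    "items": [
        {
            "metadata": {"name": "ip-10-0-1-23.ec2.internal"},
            "status": {
                "conditions": [
                    _cond("MemoryPressure", "Unknown", "NodeStatusUnknown"),
                    _cond("DiskPressure", "Unknown", "NodeStatusUnknown"),
                    _cond("PIDPressure", "Unknown", "NodeStatusUnknown"),
                    _cond("Ready", "Unknown", "NodeStatusUnknown"),
                ]
            },
        },
        {
            "metadata": {"name": "ip-10-0-2-45.ec2.internal"},
            "status": {"conditions": [_cond("Ready", "True", "KubeletReady")]},
        },
    ]
}
```

On a real cluster the input would be the parsed output of `kubectl get nodes -o json` (for example via `json.load`) rather than `SAMPLE`.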

All of the running pods are stuck in Terminating (I can use k9s to connect to the cluster and see this), and the only solution I have found is to cordon and drain the nodes. After a few hours they seem to be deleted and new ones created. Alternatively, I can delete them using kubectl.
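For reference, the cordon, drain and delete workaround could be scripted roughly like this. It is only a sketch: the helper name is hypothetical, the exact flags are my assumption about what was run, and `--delete-local-data` is the v1.18-era spelling (later releases renamed it `--delete-emptydir-data`). Force-deleting pods skips waiting for confirmation from the kubelet, so it is best kept for nodes that are definitely gone.

```python
# Sketch: build the kubectl invocations for the manual recovery described
# above. Nothing is executed here; the lists can be passed to
# subprocess.run once they have been reviewed.

def recovery_commands(node, stuck_pods=()):
    """Return kubectl argv lists to cordon, drain and remove `node`.

    stuck_pods is an iterable of (namespace, pod_name) pairs that are
    stuck in Terminating and should be force-deleted.
    """
    commands = [
        ["kubectl", "cordon", node],
        [
            "kubectl", "drain", node,
            "--ignore-daemonsets",
            "--delete-local-data",   # v1.18 flag name
            "--force",
            "--timeout=120s",        # the kubelet is silent, so do not wait forever
        ],
    ]
    for namespace, pod in stuck_pods:
        commands.append([
            "kubectl", "delete", "pod", pod,
            "-n", namespace,
            "--grace-period=0", "--force",
        ])
    commands.append(["kubectl", "delete", "node", node])
    return commands
```

Calling `recovery_commands("my-node", [("default", "web-0")])` gives four commands in the order cordon, drain, delete pod, delete node; both names here are placeholders.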

They are completely inaccessible via ssh (timeout), but AWS reports the EC2 instances as having no issues.

This has now happened three times in the past week. Everything recovers fine, but there is clearly some issue and I would like to get to the bottom of it.

How would I go about finding out what has gone on if I cannot get onto the boxes at all? (It actually just occurred to me that I could take a snapshot of the volume and mount it, so I will try that if it happens again, but any other suggestions are welcome.)
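As a rough sketch of what can be collected without ssh, the AWS CLI can fetch the instance's console output and snapshot its volume for offline inspection. The helper name and the description text are hypothetical, and the instance and volume IDs are placeholders to be supplied by the caller.

```python
# Sketch: AWS CLI invocations for looking at an unreachable instance from
# the outside. Nothing is executed here; the helper only builds argv lists.

def offline_diagnosis_commands(instance_id, volume_id):
    """Return aws argv lists: console output first, then a volume snapshot."""
    return [
        # Serial console output often shows kernel messages such as
        # hung-task or I/O errors.
        ["aws", "ec2", "get-console-output",
         "--instance-id", instance_id,
         "--output", "text"],
        # A snapshot can be restored to a new volume and mounted on a
        # healthy instance to read the logs under /var/log.
        ["aws", "ec2", "create-snapshot",
         "--volume-id", volume_id,
         "--description", f"post-mortem of {instance_id}"],
    ]
```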

Running Kubernetes v1.18.8

Recommended answer

The answer turned out to be an issue with IOPS as a result of du commands coming from (I think) cadvisor. I have moved to io1 boxes and have had stability since then, so I am going to mark this as closed, with the move of EC2 instance types as the resolution.
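To illustrate why IOPS exhaustion can take a node down for a while and then clear up, here is a back-of-the-envelope sketch. It assumes the original root volumes were gp2, which the post does not state. The figures are the ones AWS documents for gp2: a baseline of 3 IOPS per GiB (minimum 100, maximum 16,000), bursts up to 3,000 IOPS, and a bucket of 5.4 million I/O credits. io1 is a provisioned-IOPS volume type with no burst bucket, which would fit the stability seen after the move. The function names are made up for illustration.

```python
# Sketch: how long a gp2 volume can sustain a burst before its credit
# bucket is empty, starting from a full bucket.

GP2_IOPS_PER_GIB = 3
GP2_MIN_BASELINE = 100
GP2_MAX_BASELINE = 16_000
GP2_BURST_IOPS = 3_000
GP2_CREDIT_BUCKET = 5_400_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(max(GP2_MIN_BASELINE, GP2_IOPS_PER_GIB * size_gib), GP2_MAX_BASELINE)


def gp2_burst_seconds(size_gib, demand_iops=GP2_BURST_IOPS):
    """Seconds until a full credit bucket drains at the given demand.

    Returns None when demand does not exceed the baseline, because the
    bucket then never drains.
    """
    baseline = gp2_baseline_iops(size_gib)
    rate = min(demand_iops, GP2_BURST_IOPS)  # gp2 bursts are capped at 3,000 IOPS
    if rate <= baseline:
        return None
    # Credits refill at the baseline rate while being spent at `rate`.
    return GP2_CREDIT_BUCKET / (rate - baseline)
```

Under these assumptions a small root volume that is hammered at the burst ceiling runs out of credits in roughly half an hour, after which it is throttled to its baseline until the bucket refills.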

Thanks for the help!
