Kubernetes not scheduling failed pod on other node


Problem description

I have a 4-node Kubernetes cluster. My application runs with 2 replica instances, managed by a Deployment resource (which uses a ReplicaSet). According to the documentation, a ReplicaSet always ensures that the specified number of application instances is running: if I delete one pod instance, it is restarted on the same or a different node. But when I simulated the failure of a pod instance by stopping the Docker engine on one node, kubectl showed the pod instance's status as Error but did not restart the pod on another node. Is this the expected behaviour, or am I missing something?
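For reference, a minimal Deployment matching the setup described (2 replicas managed by a ReplicaSet) could look like the following sketch; the names and image are illustrative, not taken from the question:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # illustrative name
spec:
  replicas: 2             # the two replica instances from the question
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image
```

The Deployment creates a ReplicaSet, which is the controller responsible for keeping the replica count at 2.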

Recommended answer

AFAIS, Kubernetes changed this behaviour in version 1.5. If I interpret the docs correctly, the Pods of the failed node are still registered in the apiserver, because the node died abruptly and was unable to unregister them. Since the Pods are still registered, the ReplicaSet does not replace them.

The reason for this is that Kubernetes cannot tell whether it is a network error (e.g. a split-brain) or a node failure. With the introduction of StatefulSets, Kubernetes needs to make sure that no Pod is started more than once.
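In more recent Kubernetes versions, this wait is exposed through taint-based eviction: an unreachable or not-ready node is tainted, and Pods are evicted after a default grace period (around 300 seconds) unless they declare their own tolerations. As a sketch, a Pod spec can shorten that wait like this (the 30-second values are illustrative assumptions):

```yaml
spec:
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30    # evict 30s after the node becomes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30    # evict 30s after the node becomes unreachable
```

This lets an operator trade off faster rescheduling against the risk of double-starting a Pod during a transient network partition.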

This may sound like a bug, but if you have a properly configured cloud provider (e.g. for GCE or AWS), Kubernetes can see whether that Node is still running. When you shut down the node, the controller should unregister the Node and its Pods and then create a new Pod on another Node. Combined with a Node health check and Node replacement, the cluster is able to heal itself.

How the cloud provider is configured depends highly on your Kubernetes setup.
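As a rough sketch of what this meant on a self-managed cluster of that era: the in-tree cloud provider was typically enabled via flags on the kubelet and the kube-controller-manager (these flags are now deprecated in favour of external cloud controller managers; the paths and provider name below are assumptions for illustration):

```
# Legacy in-tree cloud provider integration (deprecated in recent releases):
kubelet --cloud-provider=aws ...
kube-controller-manager --cloud-provider=aws \
  --cloud-config=/etc/kubernetes/cloud.conf ...
```

With this in place, the node controller can query the cloud API to confirm a stopped instance is really gone and remove the Node object, which frees the ReplicaSet to recreate the Pods elsewhere.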

