关于AKS集群中已删除或发生故障的虚拟机/节点,AKS kubernetes上的自我修复如何工作? [英] How does self-healing work on AKS kubernetes with respect to deleted or failed virtual machines/nodes in the AKS cluster?

查看:194
本文介绍了关于AKS集群中已删除或发生故障的虚拟机/节点,AKS kubernetes上的自我修复如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嗨.我正在探索AKS,尤其是服务的自我修复方面.如果要手动删除属于我的AKS群集资源的虚拟机实例(扩展到两个虚拟机),它会自动恢复到群集的总群集容量设置吗? 如果是这样,这需要多长时间?同样,如果运行虚拟机的主机发生故障,Azure是否会自我修复并重新注册k8s节点?最后,是否可以模拟硬件故障,以便可以在游戏日手册中对其进行测试?

解决方案

JoshKe,

在没有自动缩放器的集群中,

如果我们手动删除一个节点,则该节点kubelet不会将ping发送给kubernetes主节点.然后,这些节点上的容器将重新安排在其他运行状况良好的节点上.

仅当发生其他规模事件时,集群才会创建替换项.也就是说,当我们尝试扩大规模或缩小规模时.

万一发生硬件故障(类似于虚拟机),节点将放置在其他虚拟机管理程序上,并且一个进入运行状态,它将在集群中注册.此过程类似于重新启动节点.其也类似于 修补并重新启动计算机.

如果使用部署了群集 集群自动缩放器(Preview)  ;,然后自动缩放器根据等待的Pod的数量自动管理节点数量的放大和缩小.检查等待的Pod的时间为10秒,可配置.


Hi. I'm exploring AKS, especially the self-healing aspect of the service. If I were to manually delete a virtual machine instance belonging to my AKS cluster resource (scales to two vms), will it self-heal back to the cluster's total cluster capacity setting? If so, how long does this take? Similarly, does Azure self-heal and re-register k8s nodes in the event of failure of the host running a virtual machine? And finally, can hardware failure be simulated so I can test it in a game day playbook?

解决方案

Hi JoshKe,

In a cluster without autoscaler,

if we delete a node manually, Then that nodes kubelet wont send pings to the kubernetes master. Then the containers on those nodes will be rescheduled on other healthy nodes. 

Cluster will create a replacement only when another scale event happens. ie when we try to scale up or scale down the cluster. 

In case of hardware failures (Its similar to virtual machines ), Nodes will be placed on other hypervisors and one it comes to the running state, It will register with the cluster.  This process is similar to rebooting the node. Its also similar to patching and rebooting the machine.

If the cluster is deployed with Cluster autoscaler (Preview)  , Then the autoscaler manages scaling up and down of number of nodes automatically based on the number of waiting pods. Time to check for waiting pods is 10 seconds and its configurable. 


这篇关于关于AKS集群中已删除或发生故障的虚拟机/节点,AKS kubernetes上的自我修复如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆