How to simulate a power failure in Kubernetes
Problem description
I have a rook-ceph cluster running on AWS. It is loaded with data.
Is there any way to simulate a power failure so that I can test the behaviour of my cluster?
Recommended answer
Cluster Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
Developers call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- a cluster administrator deletes a VM (instance) by mistake
- a cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to a cluster network partition
- eviction of a pod due to the node being out of resources

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
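On AWS, one way to approximate the involuntary disruptions listed above is to force-stop the EC2 instance that backs a node, since a forced stop does not give the instance a chance to flush file system caches. The following is a minimal sketch, not a definitive procedure. It assumes `kubectl` and the AWS CLI are installed and configured, and that the node's `spec.providerID` has the usual AWS form ending in the instance ID. The function names are hypothetical helpers introduced for illustration.

```python
import subprocess
from typing import List


def instance_id_from_provider_id(provider_id: str) -> str:
    """Extract the EC2 instance ID from a node's spec.providerID.

    Assumption: the value looks like 'aws:///us-east-1a/i-0123456789abcdef0'.
    """
    instance_id = provider_id.rsplit("/", 1)[-1]
    if not instance_id.startswith("i-"):
        raise ValueError(f"not an EC2 provider ID: {provider_id!r}")
    return instance_id


def force_stop_command(instance_id: str, dry_run: bool = True) -> List[str]:
    """Build the AWS CLI command that force-stops one instance.

    --force stops the instance without a clean OS shutdown.
    --dry-run only checks permissions and does not stop anything.
    """
    cmd = ["aws", "ec2", "stop-instances",
           "--instance-ids", instance_id, "--force"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def node_provider_id(node_name: str) -> str:
    """Read spec.providerID for a node through kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "node", node_name,
         "-o", "jsonpath={.spec.providerID}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def simulate_power_failure(node_name: str, dry_run: bool = True) -> int:
    """Force-stop the instance behind a node and return the CLI exit code.

    With dry_run=True the CLI reports the permission check as an error
    message, so a non-zero exit code is expected in that mode.
    """
    instance_id = instance_id_from_provider_id(node_provider_id(node_name))
    return subprocess.run(force_stop_command(instance_id, dry_run)).returncode
```

Calling `simulate_power_failure("<node-name>")` performs only the dry run; passing `dry_run=False` actually stops the instance, so it is safer to try this on a test cluster first. Stopping one storage node at a time and then several at once lets you see how the cluster handles each case.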
Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a cluster administrator.
Typical application owner actions include:
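- deleting the Deployment or other controller that manages the pod
- updating a Deployment's pod template, causing a restart
- directly deleting a pod (e.g. by accident)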
You can find more information here: kubernetes-disruption, application-disruption.
You can set up Prometheus on your cluster and measure metrics during the failure.
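As a rough illustration of measuring metrics during the failure, the sketch below polls the Prometheus HTTP API for an instant query. It makes several assumptions: Prometheus is reachable at the base URL you pass in, the Ceph manager Prometheus module is enabled and scraped, and it exposes the `ceph_health_status` gauge (commonly 0 for OK, 1 for warning, 2 for error). The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload: dict) -> List[float]:
    """Return the sample values from an instant-vector query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def query(base_url: str, promql: str, timeout: float = 5.0) -> List[float]:
    """Run one instant query and return the sample values."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, calling `query("http://localhost:9090", "ceph_health_status")` in a loop while a node is down would show when the cluster leaves and returns to a healthy state. The address here assumes a local port-forward to Prometheus.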