How to simulate a power failure in Kubernetes

Question

I have a rook-ceph cluster running on AWS, and it is loaded with data. Is there any way to simulate a power failure so that I can test the behaviour of my cluster?

Recommended answer

Pods in a cluster do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

Developers call these unavoidable cases involuntary disruptions to an application. Examples are:

  • a hardware failure of the physical machine backing the node
  • the cluster administrator deletes a VM (instance) by mistake
  • a cloud provider or hypervisor failure makes the VM disappear
  • a kernel panic
  • the node disappears from the cluster due to a cluster network partition
  • eviction of a pod due to the node being out-of-resources

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
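On AWS, the closest thing to pulling the plug on a node is probably a forced stop of the underlying EC2 instance, which is one way to trigger an involuntary disruption on purpose. The following is a minimal sketch, not a definitive procedure. It assumes `ec2_client` behaves like `boto3.client("ec2")`; the function name `force_stop_instances` and the instance ID in the usage comment are hypothetical. AWS documents that `Force=True` gives the instance no opportunity to flush file system caches or metadata, so only try this on a cluster you can afford to lose.

```python
def force_stop_instances(ec2_client, instance_ids):
    """Hard-stop EC2 instances without waiting for a clean OS shutdown.

    ec2_client is assumed to expose the boto3 EC2 client method
    stop_instances(InstanceIds=..., Force=...).
    """
    response = ec2_client.stop_instances(
        InstanceIds=list(instance_ids),
        Force=True,  # do not give the OS a chance to flush caches
    )
    # Map each instance ID to the state AWS reports right after the call.
    return {
        item["InstanceId"]: item["CurrentState"]["Name"]
        for item in response.get("StoppingInstances", [])
    }


# Usage (assumption: boto3 is installed and AWS credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2")
#   force_stop_instances(ec2, ["i-0123456789abcdef0"])  # hypothetical ID
```

Passing the client in as an argument keeps the function easy to exercise against a stub before pointing it at real instances.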

Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.

Typical application owner actions include:

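  • deleting the deployment or other controller that manages the pod
  • updating a deployment's pod template, causing a restart
  • directly deleting a pod (e.g. by accident)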
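Directly deleting a pod is the easiest of these disruptions to reproduce in a test. Here is a small sketch, assuming `core_v1` behaves like `kubernetes.client.CoreV1Api()` from the official Python client. The helper name `delete_pod_immediately` and the pod name in the usage comment are hypothetical. A grace period of zero only approximates an abrupt stop of one pod; it is not equivalent to a whole node losing power.

```python
def delete_pod_immediately(core_v1, name, namespace):
    """Delete a single pod with no grace period.

    core_v1 is assumed to expose delete_namespaced_pod(), as the
    kubernetes Python client's CoreV1Api does.
    """
    return core_v1.delete_namespaced_pod(
        name=name,
        namespace=namespace,
        grace_period_seconds=0,  # request deletion without a shutdown window
    )


# Usage (assumption: the `kubernetes` package is installed and a
# kubeconfig is available):
#   from kubernetes import client, config
#   config.load_kube_config()
#   delete_pod_immediately(client.CoreV1Api(), "example-osd-pod", "rook-ceph")
```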

You can find more information here: kubernetes-disruption, application-disruption.

You can set up Prometheus on your cluster and measure metrics during the failure.
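To watch metrics while a failure is in progress, one option is to poll the Prometheus HTTP API. The sketch below uses only the Python standard library and the instant-query endpoint `/api/v1/query`. The base URL is an assumption about where your Prometheus is reachable, and the helper names are hypothetical. `up` is the standard per-target health metric; any Ceph-specific metric names would depend on your exporter setup.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload):
    """Turn an instant-vector response into {instance label: value}.

    Series that share an instance label overwrite each other, which is
    fine for a quick check but too coarse for multi-job setups.
    """
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed: %r" % payload.get("error"))
    samples = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        samples[series["metric"].get("instance", "")] = float(value)
    return samples


def query_prometheus(base_url, promql):
    """Run one instant query and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


# Usage (assumption: Prometheus is port-forwarded to localhost:9090):
#   query_prometheus("http://localhost:9090", "up")
```

Running the query in a loop before, during, and after the induced failure gives a rough timeline of which targets went down and when they recovered.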
