What would happen if I restart a node with some pods running?


Question

Assume that some pods from Deployments, StatefulSets, DaemonSets, etc. are running on a Kubernetes node.

Then I reboot the node directly, start docker, and start the kubelet with the same parameters.

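What would happen to those pods?

  1. Are they recreated from metadata saved locally by the kubelet? Or from information retrieved from the API server? Or are they recovered from the OCI runtime and behave as if nothing happened?
  2. Is it that only stateless pods (no --local-data) can be recovered normally? If any of them has a local PV/dir, would it be reconnected normally?
  3. What if I do not restart the node for a long time? Will the API server assign other nodes to create those pods? What is the default timeout value, and how can I configure it?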

As far as I know:

 apiserver
    ^
    |(sync)
    V
  kubelet
    ^
    |(sync)
    V
-------------
| CRI plugin |(like api)
| containerd |(like api-server)
|    runc    |(low-level binary which manages container)
| c' runtime |(container runtime where containers run)
-------------

When the kubelet receives a PodSpec from kube-apiserver, it calls the CRI like a remote service. The steps are:

  1. Create the PodSandbox (a.k.a. the "pause" image, always "stopped")
  2. Create the container(s)
  3. Run the container(s)
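The three steps above can be sketched with a toy in-memory runtime. This is only an illustration of the call order: `ToyRuntime` and `sync_pod` are hypothetical names, and the method names merely echo the CRI calls `RunPodSandbox`, `CreateContainer` and `StartContainer`. It is not how the kubelet is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRuntime:
    """Hypothetical in-memory stand-in for a CRI runtime (illustration only)."""
    sandboxes: dict = field(default_factory=dict)   # sandbox id -> pod name
    containers: dict = field(default_factory=dict)  # container id -> state

    def run_pod_sandbox(self, pod_name: str) -> str:
        # Step 1: create the sandbox that the pod's containers will share.
        sandbox_id = f"sb-{len(self.sandboxes)}"
        self.sandboxes[sandbox_id] = pod_name
        return sandbox_id

    def create_container(self, sandbox_id: str, name: str) -> str:
        # Step 2: create a container inside an existing sandbox.
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        container_id = f"{sandbox_id}/{name}"
        self.containers[container_id] = "created"
        return container_id

    def start_container(self, container_id: str) -> None:
        # Step 3: only a container in the "created" state can be started.
        if self.containers[container_id] != "created":
            raise RuntimeError(f"{container_id} is not in 'created' state")
        self.containers[container_id] = "running"


def sync_pod(runtime: ToyRuntime, pod_name: str, container_names: list) -> list:
    """Run the three steps in order for one pod and return the container ids."""
    sandbox_id = runtime.run_pod_sandbox(pod_name)
    container_ids = []
    for name in container_names:
        container_id = runtime.create_container(sandbox_id, name)
        runtime.start_container(container_id)
        container_ids.append(container_id)
    return container_ids
```

Calling `sync_pod(ToyRuntime(), "web", ["app", "sidecar"])` walks through sandbox creation, container creation and container start for a two-container pod.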

So my guess is that once the node and docker have been restarted, steps 1 and 2 are already done and the containers are in the "stopped" state. Then, when the kubelet is restarted, it pulls the latest info from kube-apiserver, finds that the container(s) are not in the "running" state, and calls the CRI to run them, after which everything is back to normal.

Please help me confirm.

Thank you~

Recommended answer

Good questions. A few things first: a Pod is not pinned to a certain node. Nodes are mostly seen as a "server farm" that Kubernetes can use to run its workload. For example, you give Kubernetes a set of nodes and a set of Deployments, which describe the desired state of the applications that should run on your servers. Kubernetes is responsible for scheduling these Pods and for keeping them running when something in the cluster changes.

Standalone Pods are not managed by anything, so if a standalone Pod crashes it is not recovered. You typically want to deploy your stateless apps as Deployments, which in turn create ReplicaSets that manage a set of Pods, e.g. 4 Pods, each an instance of your app.

Your desired state, e.g. a Deployment with replicas: 4, is saved in the etcd database within the Kubernetes control plane.

A set of controllers for Deployment and ReplicaSet is then responsible for keeping 4 replicas of your app alive. For example, if a node becomes unresponsive (or dies), new Pods are created on other nodes, provided they are managed by the ReplicaSet controller.
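The idea of a controller driving the actual state towards the desired state can be illustrated with a small simulation. This is a sketch under loud assumptions: `reconcile`, the pod names and the least-loaded placement rule are all hypothetical, scale-down is ignored, and in a real cluster placement is done by the scheduler rather than by the ReplicaSet controller.

```python
def reconcile(desired_replicas, pods, ready_nodes):
    """Toy reconcile step (hypothetical, not the real controller logic).

    pods maps pod name -> node name; ready_nodes is the set of healthy nodes.
    Returns a new mapping with at least `desired_replicas` pods.
    """
    nodes = sorted(ready_nodes)
    # Pods on nodes that are no longer ready are treated as lost.
    alive = {pod: node for pod, node in pods.items() if node in ready_nodes}

    suffix = 0
    while len(alive) < desired_replicas:
        name = f"app-replacement-{suffix}"  # made-up naming scheme
        suffix += 1
        if name in alive:
            continue
        if nodes:
            # Count pods per ready node and pick the least loaded one.
            load = {n: 0 for n in nodes}
            for n in alive.values():
                if n in load:
                    load[n] += 1
            target = min(nodes, key=lambda n: (load[n], n))
        else:
            target = None  # no ready node: the pod stays unscheduled
        alive[name] = target
    return alive


if __name__ == "__main__":
    current = {"app-0": "node-a", "app-1": "node-a",
               "app-2": "node-b", "app-3": "node-c"}
    # node-a has died; only node-b and node-c are still ready.
    print(reconcile(4, current, {"node-b", "node-c"}))
```

In the example run, the two Pods that lived on the dead node are replaced by two new Pods spread over the remaining nodes, so the replica count returns to 4.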

A kubelet receives the PodSpecs that are scheduled to its node and then keeps these Pods alive with regular health checks.
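As a rough illustration of "keeping Pods alive with regular health checks", the sketch below counts consecutive failed checks and signals a restart once a threshold is reached. The function `probe_round` is hypothetical; the default of 3 mirrors the default `failureThreshold` of Kubernetes probes, but the rest is a simplification rather than the kubelet's real probing logic.

```python
def probe_round(failures, healthy, failure_threshold=3):
    """One health-check round for a single container (toy model).

    failures: consecutive failed checks so far.
    healthy:  result of the current check.
    Returns (new_failure_count, restart_needed).
    """
    if healthy:
        # A successful check resets the failure counter.
        return 0, False
    failures += 1
    if failures >= failure_threshold:
        # Too many consecutive failures: restart and start counting afresh.
        return 0, True
    return failures, False


if __name__ == "__main__":
    failures, restarts = 0, 0
    # Simulated sequence of check results for one container.
    for healthy in [True, False, False, False, True]:
        failures, restart = probe_round(failures, healthy)
        if restart:
            restarts += 1
    print(f"restarts={restarts}, pending failures={failures}")
```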

Is it that only stateless pods (no --local-data) can be recovered normally?

Pods should be seen as ephemeral: they can disappear, but are recovered by the controller that manages them, unless they were deployed as standalone Pods. So don't store local data within the Pod.

There are also StatefulSet Pods, which are meant for stateful workloads, but distributed stateful workloads: typically e.g. 3 Pods that use Raft to replicate data. The etcd database is an example of a distributed database that uses Raft.
