Kubernetes recreates the pod if a node goes offline (timeout)
Problem description
I've started working with Docker images and set up Kubernetes. I have fixed everything else, but I am having problems with the timing of pod recreation.
If a pod is running on a particular node and I shut that node down, it takes ~5 minutes for the pod to be recreated on another online node.
I've checked all the possible config files and set the pod-eviction-timeout, horizontal-pod-autoscaler-downscale, and horizontal-pod-autoscaler-downscale-delay flags, but it is still not working.
Current kube-controller-manager config:
spec:
containers:
- command:
- kube-controller-manager
- --address=192.168.5.135
- --allocate-node-cidrs=false
- --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
- --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --cluster-cidr=192.168.5.0/24
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --controllers=*,bootstrapsigner,tokencleaner
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --leader-elect=true
- --node-cidr-mask-size=24
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --use-service-account-credentials=true
- --horizontal-pod-autoscaler-downscale-delay=20s
- --horizontal-pod-autoscaler-sync-period=20s
- --node-monitor-grace-period=40s
- --node-monitor-period=5s
- --pod-eviction-timeout=20s
- --use-service-account-credentials=true
- --horizontal-pod-autoscaler-downscale-stabilization=20s
image: k8s.gcr.io/kube-controller-manager:v1.13.0
Thanks.

Answer
If Taint-Based Evictions apply to the pod definition, the controller manager will not be able to evict a pod that tolerates the taint. Even if you don't define an eviction policy in your configuration, the pod gets a default one, because the DefaultTolerationSeconds admission controller plugin is enabled by default.
The DefaultTolerationSeconds admission controller plugin configures your pod like below:
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
You can verify this by inspecting your pod's definition:
kubectl get pods -o yaml -n <namespace> <pod-name>
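Or, to print only the tolerations instead of the whole manifest (a shorter variant of the same check; assumes you have kubectl access to the cluster):

```
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}'
```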
According to the above toleration, it takes more than 5 minutes to recreate the pod on another ready node, since the pod can tolerate the not-ready taint for up to 5 minutes. In this case, even if you set --pod-eviction-timeout to 20s, there is nothing the controller manager can do because of the tolerations.
But why does it take more than 5 minutes? Because the node is only considered down after --node-monitor-grace-period, which defaults to 40s. Only after that does the pod's toleration timer start.
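As a back-of-the-envelope sketch (plain arithmetic, not part of the original answer), the worst-case delay with the defaults works out as:

```python
# Rough worst-case delay before eviction starts after a node goes offline,
# using the default values discussed above. This ignores scheduling and
# image-pull time, so the real failover is somewhat longer.
node_monitor_grace_period = 40    # seconds, --node-monitor-grace-period default
default_toleration_seconds = 300  # seconds, set by DefaultTolerationSeconds

worst_case = node_monitor_grace_period + default_toleration_seconds
print(f"~{worst_case}s until eviction starts")  # ~340s, i.e. more than 5 minutes
```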
If you want your cluster to react faster to node outages, you should use taints and tolerations instead of modifying those options. For example, you can define your pod like below:
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 0
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 0
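For instance, in a Deployment the tolerations go under the pod template's spec (a hypothetical Deployment named web with a placeholder nginx image, shown only to illustrate placement):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      tolerations:         # evict immediately once the node is not-ready
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 0
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 0
      containers:
      - name: web
        image: nginx       # placeholder image
```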
With the above tolerations, your pod will be recreated on a ready node right after the current node is marked not-ready. This should take less than a minute, since --node-monitor-grace-period defaults to 40s.
If you want to be in control of these timings, you will find plenty of options below. However, modifying these options should be avoided: timings that are too tight may create extra overhead on etcd, because every node will try to update its status very frequently.
In addition, it is currently not clear how to propagate changes to the controller manager, API server, and kubelet configuration to all nodes in a live cluster. See the tracking issue for changing the cluster; as of this writing, Dynamic Kubelet Configuration (reconfiguring a node's kubelet in a live cluster) is in beta.
You can configure the control plane and the kubelet during the kubeadm init or join phase. Refer to Customizing control plane configuration with kubeadm and Configuring each kubelet in your cluster using kubeadm for more details.
Assuming you have a single-node cluster:
- controller manager includes:
  - --node-monitor-grace-period (default 40s)
  - --node-monitor-period (default 5s)
  - --pod-eviction-timeout (default 5m0s)
- api server includes:
  - --default-not-ready-toleration-seconds (default 300)
  - --default-unreachable-toleration-seconds (default 300)
- kubelet includes:
  - --node-status-update-frequency (default 10s)
If you set up the cluster with kubeadm, you can modify:
- /etc/kubernetes/manifests/kube-controller-manager.yaml for controller manager options.
- /etc/kubernetes/manifests/kube-apiserver.yaml for api server options.
Note: modifying these files will reconfigure and restart the respective pod on the node.
In order to modify the kubelet config, you can add the line below:

KUBELET_EXTRA_ARGS="--node-status-update-frequency=10s"

to /etc/default/kubelet (for DEBs) or /etc/sysconfig/kubelet (for RPMs), and then restart the kubelet service:

sudo systemctl daemon-reload && sudo systemctl restart kubelet