Auto-provisioned node pool is not getting cleaned up
Question
I have a Kubernetes cluster with auto-provisioning enabled on GKE.
gcloud beta container clusters create "some-name" --zone "us-central1-a" \
--no-enable-basic-auth --cluster-version "1.13.11-gke.14" \
--machine-type "n1-standard-1" --image-type "COS" \
--disk-type "pd-standard" --disk-size "100" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias \
--network "projects/default-project/global/networks/default" \
--subnetwork "projects/default-project/regions/us-central1/subnetworks/default" \
--default-max-pods-per-node "110" \
--enable-autoscaling --min-nodes "0" --max-nodes "8" \
--addons HorizontalPodAutoscaling,KubernetesDashboard \
--enable-autoupgrade --enable-autorepair \
--enable-autoprovisioning --min-cpu 1 --max-cpu 40 --min-memory 1 --max-memory 64
I ran a deployment which wouldn't fit on the existing node (which has 1 CPU).
kubectl run say-lol --image ubuntu:18.04 --requests cpu=4 -- bash -c 'echo lolol && sleep 30'
The auto-provisioner correctly detected that a new node pool was needed; it created one and the deployment started running there. However, the node pool was not deleted after it was no longer needed.
kubectl delete deployment say-lol
After all pods were gone, the new node pool sat idle for more than 20 hours.
$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
gke-some-name-default-pool-5003d6ff-pd1p        Ready    <none>   21h   v1.13.11-gke.14
gke-some-name-nap-n1-highcpu-8--585d94be-vbxw   Ready    <none>   21h   v1.13.11-gke.14
$ kubectl get deployments
No resources found in default namespace.
$ kubectl get events
No resources found in default namespace.
Why isn't it cleaning up the expensive node pool?
Answer
I reproduced this on two clusters and found that the culprit was the kube-dns pod. On cluster 1 there was no kube-dns pod on the scaled-up node, and scale-down occurred after deleting say-lol. On cluster 2, the secondary node did not scale down because a kube-dns pod was running on it.
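To see which kube-system pods are keeping a node from being scaled down, you can list the pods scheduled on it. A sketch, using the auto-provisioned node name from the `kubectl get nodes` output above:

```shell
# List kube-system pods running on the auto-provisioned node
# (node name taken from the earlier `kubectl get nodes` output).
kubectl get pods -n kube-system -o wide \
  --field-selector spec.nodeName=gke-some-name-nap-n1-highcpu-8--585d94be-vbxw
```

Any kube-system pod listed here without a PodDisruptionBudget will block the cluster autoscaler from removing the node.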
Following the documentation on how to set up a PDB so the cluster autoscaler (CA) can move kube-system pods:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
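Assuming the manifest above is saved as kube-dns-pdb.yaml (the filename is an assumption), it can be applied with:

```shell
# Apply the PodDisruptionBudget to the kube-system namespace
# (filename is hypothetical; the namespace is set in the manifest itself).
kubectl apply -f kube-dns-pdb.yaml
```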
I created a PDB to allow disruption of the kube-dns pod, which allowed the scale-down. You can check whether disruptions are allowed by running:
kubectl get pdb -n kube-system
ALLOWED DISRUPTIONS must have a non-zero value for the scale-down to work.
NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
kube-dns-pdb   N/A             1                 1                     28m
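Per the cluster autoscaler FAQ, an alternative to a PDB is to annotate the blocking pods as safe to evict. A sketch that patches the kube-dns Deployment's pod template (assuming the deployment is named kube-dns in kube-system, as on this GKE version):

```shell
# Alternative to the PDB above: mark kube-dns pods as evictable
# by the cluster autoscaler via the safe-to-evict annotation.
kubectl -n kube-system patch deployment kube-dns --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'
```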