Auto-provisioned node pool is not getting cleaned up


Problem Description

I have a Kubernetes cluster with auto-provisioning enabled on GKE.

gcloud beta container clusters create "some-name" --zone "us-central1-a" \
  --no-enable-basic-auth --cluster-version "1.13.11-gke.14" \
  --machine-type "n1-standard-1" --image-type "COS" \
  --disk-type "pd-standard" --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias \
  --network "projects/default-project/global/networks/default" \
  --subnetwork "projects/default-project/regions/us-central1/subnetworks/default" \
  --default-max-pods-per-node "110" \
  --enable-autoscaling --min-nodes "0" --max-nodes "8" \
  --addons HorizontalPodAutoscaling,KubernetesDashboard \
  --enable-autoupgrade --enable-autorepair \
  --enable-autoprovisioning --min-cpu 1 --max-cpu 40 --min-memory 1 --max-memory 64
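
To confirm that node auto-provisioning and its resource limits were actually applied, the cluster's autoscaling block can be inspected afterwards (a minimal sketch; the --format projection assumes gcloud exposes these settings under the autoscaling key):

# Show enableNodeAutoprovisioning and the CPU/memory resource limits.
gcloud container clusters describe "some-name" --zone "us-central1-a" \
  --format "yaml(autoscaling)"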

I ran a deployment which wouldn't fit on the existing node (which has 1 CPU).

kubectl run say-lol --image ubuntu:18.04 --requests cpu=4 -- bash -c 'echo lolol && sleep 30'
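
To check whether this pending pod actually triggered auto-provisioning, one option is to look at where the pod landed and which node pools exist (a sketch; it assumes the default run=say-lol label that this version of kubectl run applies):

# Show which node (if any) the say-lol pod was scheduled on.
kubectl get pods -l run=say-lol -o wide

# Auto-provisioned node pools are created with a "nap-" prefix.
gcloud container node-pools list --cluster "some-name" --zone "us-central1-a"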

The auto-provisioner correctly detected that a new node pool was needed, created it, and the new deployment started running there. However, the node pool was not removed after it was no longer needed.

kubectl delete deployment say-lol

After all the pods are gone, the new node has been sitting idle for more than 20 hours.

$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
gke-some-name-default-pool-5003d6ff-pd1p        Ready    <none>   21h   v1.13.11-gke.14
gke-some-name-nap-n1-highcpu-8--585d94be-vbxw   Ready    <none>   21h   v1.13.11-gke.14

$ kubectl get deployments
No resources found in default namespace.

$ kubectl get events
No resources found in default namespace.

Why isn't it cleaning up the expensive node pool?

Recommended Answer

I reproduced this on two clusters and found that the culprit was closely related to the kube-dns pod. On cluster 1, there was no kube-dns pod on the scaled-up node, and scale-down occurred after deleting say-lol. On cluster 2, the secondary node did not scale down because of the kube-dns pod running on it.
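
A quick way to see what is keeping the auto-provisioned node busy is to list every pod scheduled on it, across all namespaces (a sketch, using the node name from the output above):

# List all pods, including kube-system ones, running on the NAP node.
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=gke-some-name-nap-n1-highcpu-8--585d94be-vbxw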

Following this documentation: How to set PDBs to enable CA to move kube-system pods?

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
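
Assuming the manifest above is saved as kube-dns-pdb.yaml (the file name is arbitrary), it can be applied with:

kubectl apply -f kube-dns-pdb.yaml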

I created a PDB to allow disruption of the kube-dns pod, thus allowing downscaling. You can check whether disruptions are allowed by running

kubectl get pdb -n kube-system

ALLOWED DISRUPTIONS should have a non-zero value for the process to work.

NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
kube-dns-pdb   N/A             1                 1                     28m
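
Once a disruption is allowed, the autoscaler should be able to evict the kube-dns pod and remove the idle node; by default it waits roughly 10 minutes after a node becomes unneeded. Progress can be checked with the commands below (a sketch; the cluster-autoscaler-status ConfigMap is assumed to be published in kube-system on this GKE version):

# The auto-provisioned node should eventually disappear from this list.
kubectl get nodes

# The autoscaler's own view of scale-down candidates and recent activity.
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml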
