Google Kubernetes: worker pool not scaling down to zero


Question

I'm setting up a GKE cluster on Google Kubernetes Engine to run some heavy jobs. I have a render-pool of big machines that I want to autoscale from 0 to N (using the cluster autoscaler). My default-pool is a cheap g1-small to run the system pods (those never go away so the default pool can't autoscale to 0, too bad).

My problem is that the render-pool doesn't want to scale down to 0. It has some system pods running on it; are those the problem? The default pool has plenty of resources to run all of them as far as I can tell. I've read the autoscaler FAQ, and it looks like it should delete my node after 10 min of inactivity. I've waited an hour though.

I created the render pool like this:

gcloud container node-pools create render-pool-1 --cluster=test-zero-cluster-2 \
 --disk-size=60 --machine-type=n2-standard-8 --image-type=COS \
 --disk-type=pd-standard --preemptible --num-nodes=1 --max-nodes=3 --min-nodes=0 \
 --enable-autoscaling

The cluster-autoscaler-status configmap says ScaleDown: NoCandidates and it is probing the pool frequently, as it should.

What am I doing wrong, and how do I debug it? Can I see why the autoscaler doesn't think it can delete the node?

Answer

As pointed out in the comments, in GKE you have logging pods (fluentd), kube-dns, monitoring, and so on, all of which are considered system pods. This means that any node where they're scheduled will not be a candidate for scale-down.

Considering this, it all boils down to creating a scenario where all of the autoscaler's conditions for scale-down are met.

Since you only want to scale down a specific node pool, I'd use taints and tolerations to keep system pods in the default pool.

For GKE specifically, you can pick out each app by its k8s-app label, for instance:

$ kubectl taint nodes GPU-NODE k8s-app=heapster:NoSchedule

This will prevent Heapster (or any other pod without a matching toleration) from being scheduled on the tainted nodes.
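For your own render jobs to still land on the tainted pool, they need a matching toleration. A rough sketch of such a pod spec — the name and image are placeholders, and the toleration mirrors the example taint above (a nodeSelector on GKE's built-in cloud.google.com/gke-nodepool label additionally pins the pod to the render pool):

```yaml
# Hypothetical render pod; name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: render-job
spec:
  # Tolerate the example taint so this pod can schedule on the tainted
  # render-pool nodes, while system pods without it stay in the default pool.
  tolerations:
  - key: "k8s-app"
    operator: "Equal"
    value: "heapster"
    effect: "NoSchedule"
  # Optionally pin the job to the render pool by its GKE node-pool label.
  nodeSelector:
    cloud.google.com/gke-nodepool: render-pool-1
  containers:
  - name: render
    image: render-image:latest  # placeholder
```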

Not recommended, but you can go broader and try to catch all the GKE system pods using the kubernetes.io/cluster-service label instead:

$ kubectl taint nodes GPU-NODE kubernetes.io/cluster-service=true:NoSchedule

Just be careful: the scope of this label is broader, and you'll have to keep track of upcoming changes, as this label may be deprecated someday.

Another thing you might want to consider is using Pod Disruption Budgets. This tends to be more effective with stateless workloads, but setting it very tight is likely to cause instability.

The idea of a PDB is to tell GKE the minimum number of pods that must remain running at any given time, allowing the CA to evict the rest. It can be applied to system pods like below:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dns-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns

This tells GKE that, although there are usually 3 replicas of kube-dns, the application may be able to take 2 disruptions and sustain itself temporarily with only 1 replica, allowing the CA to evict these pods and reschedule them on other nodes.

As you probably noticed, this will put stress on DNS resolution in the cluster (in this particular example), so be careful.

Finally, regarding how to debug the CA: for now, keep in mind that GKE is a managed version of Kubernetes where you don't really have direct access to tweak some features (for better or worse). You cannot set flags on the CA, and access to its logs goes through GCP support. The idea is to protect the workloads running in the cluster rather than to optimize for cost.

Scaling down in GKE is more about combining different Kubernetes features until the CA's conditions for scale-down are met.
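That said, the cluster-autoscaler-status configmap mentioned in the question is the main window you do get. A couple of read-only commands for inspecting it (assuming kubectl is pointed at the affected cluster):

```shell
# Dump the autoscaler's self-reported status, including per-pool
# ScaleUp/ScaleDown state and candidate counts.
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml

# Autoscaler decisions (e.g. why a node wasn't removed, evictions)
# also surface as events; filter them out of the kube-system stream.
kubectl -n kube-system get events | grep -i autoscaler
```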
