Cluster autoscaler not downscaling
Question
I have a regional cluster set up in Google Kubernetes Engine (GKE). The node group is a single VM in each zone (3 total). I have a deployment with a minimum of 3 replicas controlled by an HPA. The node group is configured to autoscale (cluster autoscaler, aka CA). The problem scenario:
Update the deployment image. Kubernetes automatically creates new pods and the CA identifies that a new node is needed; I now have 4. The old pods are removed once all the new pods have started, which means I have exactly the same CPU requests as the minute before. But after the 10 min maximum downscale time I still have 4 nodes.
The CPU requests for the nodes are now:
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
358m (38%) 138m (14%) 516896Ki (19%) 609056Ki (22%)
--
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
800m (85%) 0 (0%) 200Mi (7%) 300Mi (11%)
--
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
510m (54%) 100m (10%) 410Mi (15%) 770Mi (29%)
--
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
823m (87%) 158m (16%) 484Mi (18%) 894Mi (33%)
The 38% node is running:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system event-exporter-v0.1.9-5c8fb98cdb-8v48h 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-v2.0.17-q29t2 100m (10%) 0 (0%) 200Mi (7%) 300Mi (11%)
kube-system heapster-v1.5.2-585f569d7f-886xx 138m (14%) 138m (14%) 301856Ki (11%) 301856Ki (11%)
kube-system kube-dns-autoscaler-69c5cbdcdd-rk7sd 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-production-cluster-default-pool-0fd62aac-7kls 100m (10%) 0 (0%) 0 (0%) 0 (0%)
I suspect it won't downscale because of heapster or kube-dns-autoscaler. But the 85% node contains:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system fluentd-gcp-v2.0.17-s25bk 100m (10%) 0 (0%) 200Mi (7%) 300Mi (11%)
kube-system kube-proxy-gke-production-cluster-default-pool-7ffeacff-mh6p 100m (10%) 0 (0%) 0 (0%) 0 (0%)
my-deploy my-deploy-54fc6b67cf-7nklb 300m (31%) 0 (0%) 0 (0%) 0 (0%)
my-deploy my-deploy-54fc6b67cf-zl7mr 300m (31%) 0 (0%) 0 (0%) 0 (0%)
The fluentd and kube-proxy pods are present on every node, so I assume they are not needed once the node is gone. That means my deployment's pods could be relocated to the other nodes, since each only requests 300m (31%, because only 94% of the node's CPU is allocatable).
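That relocation argument can be sanity-checked with the numbers from the tables above. This is a rough sketch, assuming ~940m allocatable CPU per node (94% of a 1-vCPU machine, as stated in the question); it just computes the free capacity on the three other nodes:

```shell
# Approximate allocatable CPU per node, in millicores (assumption: 94% of 1000m)
alloc=940

# CPU requests currently on the three other nodes, taken from the tables above
for used in 358 510 823; do
  echo "free: $(( alloc - used ))m"
done
```

The first two nodes have 582m and 430m free, so each could absorb one of the 300m my-deploy pods, which supports the idea that the 85% node should be removable.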
So I figured I'll check the logs. But if I run kubectl get pods --all-namespaces there are no pods visible on GKE for the CA. And if I use the command kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml it only tells me whether it is about to scale, not why or why not.
Another option is to look at /var/log/cluster-autoscaler.log on the master node. I SSHed into all 4 nodes and only found a gcp-cluster-autoscaler.log.pos file that says: /var/log/cluster-autoscaler.log 0000000000000000 0000000000000000
meaning the file should be right there but is empty.
The last option, according to the FAQ, is to check the events for the pods, but as far as I can tell they are empty.
Does anyone know why it won't downscale, or at least where to find the logs?
Answer
Answering my own question for visibility.
The problem is that the CA never considers moving anything unless all the requirements mentioned in the FAQ are met at the same time. So let's say I have 100 nodes, each with 51% CPU requests. It still won't consider downscaling.
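The utilization condition behind that 51% example can be sketched as a simplified check. This is only the first gate the CA applies (the default scale-down utilization threshold is 50%); the real autoscaler additionally requires that every pod on the node can be rescheduled elsewhere and respects things like PodDisruptionBudgets and non-DaemonSet kube-system pods. Using the four node utilizations from the question:

```shell
# Simplified sketch of the CA's per-node utilization gate:
# sum(pod CPU requests) / allocatable must be BELOW the threshold (default 50%)
threshold=50
for util in 38 85 54 87; do
  if [ "$util" -lt "$threshold" ]; then
    echo "${util}%: scale-down candidate"
  else
    echo "${util}%: kept, utilization >= ${threshold}%"
  fi
done
```

Only the 38% node even passes the first check, and that one hosts heapster and kube-dns-autoscaler, so nothing gets removed.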
One solution would be to raise the utilization threshold at which the CA considers a node for scale-down, currently 50%. But unfortunately that is not supported by GKE; see this answer from Google support (@GalloCedrone):
Moreover I know that this value might sound too low and someone could be interested to keep as well a 85% or 90% to avoid your scenario. Currently there is a feature request open to give the user the possibility to modify the flag "--scale-down-utilization-threshold", but it is not implemented yet.
The workaround I found is to decrease the pods' CPU request (100m instead of 300m) and let the Horizontal Pod Autoscaler (HPA) create more replicas on demand. This works fine for me, but if your application is not suitable for many small instances you are out of luck. Perhaps a cron job that cordons a node when total utilization is low?
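The effect of that workaround on the utilization gate can be sketched with the numbers from the question. Assuming ~940m allocatable per node and the 200m of DaemonSet overhead visible in the tables (fluentd 100m + kube-proxy 100m), a node carrying two deployment pods looks like this before and after shrinking the request:

```shell
daemon=200   # per-node DaemonSet requests from the tables: fluentd 100m + kube-proxy 100m
alloc=940    # assumption: ~94% of 1000m is allocatable

# Before: two 300m my-deploy pods on the node (this is the 85% node above)
before=$(( (daemon + 2*300) * 100 / alloc ))
# After the workaround: two 100m pods; the HPA adds replicas when load rises
after=$(( (daemon + 2*100) * 100 / alloc ))
echo "before: ${before}%  after: ${after}%"
```

With 100m requests the node sits at roughly 42%, below the 50% threshold, so the CA will at least consider draining it.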