Pod CPU Throttling


Question

I'm experiencing a strange issue when using CPU Requests/Limits in Kubernetes. Prior to setting any CPU Requests/Limits at all, all my services performed very well. I recently started placing some Resource Quotas to avoid future resource starvation. These values were set based on the actual usage of those services, but to my surprise, after they were added, some services started to increase their response time drastically. My first guess was that I might have placed wrong Requests/Limits, but looking at the metrics revealed that none of the services facing this issue were actually near those values. In fact, some of them were closer to the Requests than to the Limits.
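For context, the kind of resources block I added looks roughly like the following sketch. The Deployment name, labels and image are placeholders; only the 50m request and 250m limit come from the numbers above.

# Hypothetical manifest showing where the CPU Request/Limit were set.
# "my-service" and "nginx" are placeholders, not the real workload.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: app
        image: nginx               # placeholder image
        resources:
          requests:
            cpu: 50m               # what the scheduler reserves (becomes cpu.shares)
          limits:
            cpu: 250m              # hard cap enforced by the CFS quota
EOF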

Then I started looking at CPU throttling metrics and found that all my pods are being throttled. I then increased the limit for one of the services to 1000m (from 250m) and saw less throttling in that pod, but I don't understand why I should set a higher limit if the pod wasn't even reaching its old limit (250m).
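The limit bump itself was done with something along these lines (a sketch; the Deployment name is a placeholder):

# Raise only the CPU limit of the affected service from 250m to 1000m.
kubectl set resources deployment my-service --limits=cpu=1000m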

So my questions are: if I'm not reaching the CPU limit, why are my pods being throttled? And why is my response time increasing if the pods are not using their full capacity?

Here are some screenshots of my metrics (CPU Request: 50m, CPU Limit: 250m):

CPU usage (here we can see that this pod's CPU never reached its 250m limit):

CPU throttling:

After setting the limit for this pod to 1000m, we can observe less throttling:

kubectl top

P.S.: Before setting these Requests/Limits there was no throttling at all (as expected).

P.S. 2: None of my nodes are facing high usage. In fact, none of them are using more than 50% of their CPU at any time.

Thanks in advance!

Answer

If you look at the documentation, you'll see that when you issue a CPU Request it actually uses the --cpu-shares option in Docker, which in turn sets the cpu.shares attribute of the cpu,cpuacct cgroup on Linux. So a value of 50m is about --cpu-shares=51, based on the maximum being 1024. 1024 represents 100% of the shares, so 51 would be about 4-5% of the shares. That's pretty low to begin with. But the important factor here is that this is relative to how many pods/containers you have on your node and what cpu-shares those have (are they using the default?).
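To make the shares arithmetic concrete, here is a small sketch. It assumes a cgroup v1 node and a placeholder pod name; the conversion itself (millicores * 1024 / 1000) is the one described above.

# 50m request -> cpu.shares
echo $(( 50 * 1024 / 1000 ))                                        # prints 51

# Check what was actually set, from inside the container (cgroup v1 path):
kubectl exec my-service-pod -- cat /sys/fs/cgroup/cpu/cpu.shares    # expect 51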

So let's say that on your node you have another pod/container with 1024 shares, which is the default, and you have this pod/container with its 51 shares. Under CPU contention, this container would then get roughly 51 / (51 + 1024) ≈ 5% of the CPU, while the other pod/container would get the remaining ~95% (if it has no limit). So again, it all depends on how many pods/containers you have on the node and what their shares are.
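The same back-of-the-envelope number, computed in shell for the two-container case assumed above:

# Share of CPU this pod is entitled to under full contention:
echo "scale=3; 51 / (51 + 1024) * 100" | bc                         # ~4.7 (percent)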

Also, it's not very well documented in the Kubernetes docs, but if you use a Limit on a pod it basically uses two flags in Docker: --cpu-period and --cpu-quota, which in turn set the cpu.cfs_period_us and cpu.cfs_quota_us attributes of the cpu,cpuacct cgroup on Linux. This was introduced because cpu.shares doesn't provide a limit, so you'd spill over into cases where containers would grab most of the CPU.
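Assuming the same cgroup v1 layout and placeholder pod name, a 250m limit should show up as a 25000µs quota per 100000µs period:

# limit (millicores) -> CFS quota: quota_us = millicores * period_us / 1000
kubectl exec my-service-pod -- cat /sys/fs/cgroup/cpu/cpu.cfs_period_us   # typically 100000
kubectl exec my-service-pod -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us    # 25000 for a 250m limit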

So, as far as this Limit is concerned, you will never hit it if you have other containers on the same node that don't have limits (or have higher limits) but have higher cpu.shares, because they will end up being prioritized and picking up the idle CPU. This could be what you are seeing, but again it depends on your specific case.
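One way to see the throttling directly, independent of any dashboard, is the cgroup's cpu.stat file (again a cgroup v1 sketch with a placeholder pod name); these are the same counters cAdvisor exposes as container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total:

# nr_periods     = elapsed CFS enforcement periods
# nr_throttled   = periods in which the container hit its quota and was throttled
# throttled_time = total time (ns) spent throttled
kubectl exec my-service-pod -- cat /sys/fs/cgroup/cpu/cpu.stat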

A longer explanation of all of the above can be found here.

