Docker stats with CPU percentage more than 100


Problem description

I have a question about docker stats command if anyone can help me. I am new in Docker area and I want to monitor the cpu usage of a docker container.

The physical machine has 8 cores (CPU0...CPU7). I already created a container and limited its CPU resources to 1 core (CPU0) using the following command:

docker run -itd --cpuset-cpus=0 -p 8081:8080 binfalse/bives-webapp

I stress the container by sending requests from JMeter and then monitor the CPU usage of the container via the docker stats command, which gives me values greater than 100%.

I don't understand why it gives more than 100% even though only one core is allocated to the container. Do you have any idea about the reason? Does this CPU value represent the CPU usage of some system processes in addition to the container?

Thanks in advance for your help.

docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:23:31 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:04 2017
 OS/Arch:      linux/amd64
 Experimental: true

docker info result:

Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 10
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 141
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-98-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.56GiB
Name: logti048131
ID: RHOG:IR6N:FVC4:YDI5:A6T4:QA4Y:DDYF:7HZN:AI3L:WVLE:BNHY:6YNV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Solution

On Linux, cgroups and Docker CPU stats deal in "time slices" of CPU, the number of nanoseconds the CPU has been in use for. To get the percentage, the container cgroup value of "time used" is compared to the overall system value for "time available" from /proc/stat.
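As a rough sketch of that comparison (names are hypothetical, not Docker's actual source), the percentage is the ratio of the container's time-used delta to the system's time-available delta, scaled by the number of CPUs:

```python
def cpu_percent(container_delta_ns, system_delta_ns, online_cpus):
    """Approximate docker stats CPU%: (container time used / system time
    available) between two samples, scaled by the number of online CPUs."""
    if system_delta_ns <= 0 or container_delta_ns < 0:
        return 0.0
    return container_delta_ns / system_delta_ns * online_cpus * 100.0

# A container that used 1 s of CPU while the whole 8-CPU system
# accumulated 8 s of wall time is running one core flat out:
print(cpu_percent(1_000_000_000, 8_000_000_000, 8))  # 100.0
```

With --cpuset-cpus=0 the container can use at most one core, so a correctly sampled value should never exceed 100% here.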

Due to the stored "time slice" values being cumulative, the current values are compared to the previously collected values to get a more instantaneous percentage. I think this comparison is the basis of the issue.

The docker stats command actually does a lot of the leg work for this information in the client. The client queries for all containers, watches events for container start/stops and opens an individual stats stream for each running container. These streams of container stats are used to calculate the percentages on each dump of stats data from a stream.

For the container stats stream, the Docker daemon collects the system's used CPU time first. It then uses libcontainer to read in a container's cgroup files and parse the text into values, filling in the stats data structures. That is all then sent to the client as a JSON response for processing.

I believe at least part of the problem stems from reading and parsing the /proc/stat system information and container cgroup stats at different times. Every time the goroutine that reads the container info is delayed a bit, more nanoseconds are included in that sample compared to the system. As the collection process is scheduled to run every X seconds, the next read then includes less total nanoseconds so the values can bounce up on a busy system, then back down the same amount as there is not a full 'tick' included in the second sample.
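To make the bounce concrete, here is a small worked example with hypothetical numbers: the container cgroup read lands 50 ms late in one sample and catches up in the next.

```python
# Hypothetical numbers illustrating the sampling-skew effect described above.
ONLINE_CPUS = 1
system_delta_ns = 1_000_000_000          # the system saw exactly 1 s pass

# Sample 1: the container cgroup is read 50 ms late, so its cumulative
# counter delta covers 1.05 s of a fully busy container pinned to one CPU.
pct1 = 1_050_000_000 * ONLINE_CPUS * 100 / system_delta_ns

# Sample 2: the next read catches up, so only 0.95 s of usage is included.
pct2 = 950_000_000 * ONLINE_CPUS * 100 / system_delta_ns

print(pct1, pct2)  # 105.0 95.0 -- above 100%, then below by the same amount
```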

The issue compounds the more containers you run and the busier the system gets. The stats gathering and forwarding to the client seems to be a relatively heavyweight process; just running docker stats with a large number of containers is enough to cause more inaccuracy. My best guess is contention in the goroutines that are all trying to read the stats. I'm not sure that would account for quite the level of inaccuracy Docker shows; either I'm completely wrong or there's something else adding to the problem.

Each Docker container's usage is tracked in a cgroup. The CPU accounting information can be viewed via the cgroup file system:

→ find /sys/fs/cgroup/cpuacct/docker -type d
/sys/fs/cgroup/cpuacct/docker
/sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
/sys/fs/cgroup/cpuacct/docker/5ac4753f955acbdf38beccbcc273f954489b2a00049617fdb0f9da6865707717
/sys/fs/cgroup/cpuacct/docker/a4e00d69819a15602cbfb4f86028a4175e16415ab9e2e9a9989fafa35bdb2edf
/sys/fs/cgroup/cpuacct/docker/af00983b1432d9ffa6de248cf154a1f1b88e6b9bbebb7da2485d94a38f9e7e15

→ cd /sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
→ ls -l
total 0
-rw-r--r--    1 root     root             0 Nov 20 22:31 cgroup.clone_children
-rw-r--r--    1 root     root             0 Nov 20 04:35 cgroup.procs
-r--r--r--    1 root     root             0 Nov 20 21:51 cpuacct.stat
-rw-r--r--    1 root     root             0 Nov 20 21:51 cpuacct.usage
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_all
-r--r--r--    1 root     root             0 Nov 20 21:51 cpuacct.usage_percpu
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_percpu_sys
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_percpu_user
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_sys
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_user
-rw-r--r--    1 root     root             0 Nov 20 22:31 notify_on_release
-rw-r--r--    1 root     root             0 Nov 20 22:31 tasks

→ cat cpuacct.usage_percpu
3625488147 6265485043 6504277830 

Each value is the cumulative usage in nanoseconds on that CPU.
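Summing those per-CPU counters (values copied from the output above) gives the container's total cumulative CPU time:

```python
# Per-CPU counters as printed by cpuacct.usage_percpu above (nanoseconds)
percpu = "3625488147 6265485043 6504277830"
total_ns = sum(int(v) for v in percpu.split())

print(total_ns)        # 16395251020
print(total_ns / 1e9)  # ~16.4 seconds of CPU time across all CPUs
```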

→ grep -w ^cpu /proc/stat
cpu  475761 0 10945 582794 2772 0 159 0 0 0

Values here are in units of USER_HZ (1/100 of a second on most systems), so Docker performs some conversion on them.
