"Memory used" metric: Go tool pprof vs docker stats


Problem description

I wrote a Go application that runs in each of my Docker containers. The containers communicate with each other using protobufs over TCP and UDP, and I use Hashicorp's memberlist library to discover the containers in my network. In docker stats I see that memory usage increases linearly, so I am trying to find any leaks in my application.

Since it is a long-running application, I am using http pprof to check the live application in any one of the containers. I see that runtime.MemStats.Sys is constant even though docker stats shows a linear increase. My --inuse_space is around 1 MB, and --alloc_space of course keeps increasing over time. Here is a sample of alloc_space:
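For context, the profiling endpoint queried below can be exposed with the standard net/http/pprof package; a minimal sketch of that wiring (the original question does not show it, so this is an assumption):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve the profiling endpoints that `go tool pprof http://localhost:8080/debug/pprof/heap`
    // fetches from; the application's own work (Listener, memberlist, ...) runs alongside this.
    go func() {
        log.Println(http.ListenAndServe(":8080", nil))
    }()

    select {} // placeholder for the real application loop
}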

root@n3:/app# go tool pprof --alloc_space main http://localhost:8080/debug/pprof/heap                                                                                                                       
Fetching profile from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.main.localhost:8080.alloc_objects.alloc_space.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top --cum
1024.11kB of 10298.19kB total ( 9.94%)
Dropped 8 nodes (cum <= 51.49kB)
Showing top 10 nodes out of 34 (cum >= 1536.07kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 10298.19kB   100%  runtime.goexit
         0     0%     0%  6144.48kB 59.67%  main.Listener
         0     0%     0%  3072.20kB 29.83%  github.com/golang/protobuf/proto.Unmarshal
  512.10kB  4.97%  4.97%  3072.20kB 29.83%  github.com/golang/protobuf/proto.UnmarshalMerge
         0     0%  4.97%  2560.17kB 24.86%  github.com/hashicorp/memberlist.(*Memberlist).triggerFunc
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).Unmarshal
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).unmarshalType
  512.01kB  4.97%  9.94%  2048.23kB 19.89%  main.SaveAsFile
         0     0%  9.94%  1536.07kB 14.92%  reflect.New
(pprof) list main.Listener
Total: 10.06MB
ROUTINE ======================== main.Listener in /app/listener.go
         0        6MB (flat, cum) 59.67% of Total
         .          .     24:   l.SetReadBuffer(MaxDatagramSize)
         .          .     25:   defer l.Close()
         .          .     26:   m := new(NewMsg)
         .          .     27:   b := make([]byte, MaxDatagramSize)
         .          .     28:   for {
         .   512.02kB     29:       n, src, err := l.ReadFromUDP(b)
         .          .     30:       if err != nil {
         .          .     31:           log.Fatal("ReadFromUDP failed:", err)
         .          .     32:       }
         .   512.02kB     33:       log.Println(n, "bytes read from", src)
         .          .     34:       //TODO remove later. For testing Fetcher only
         .          .     35:       if rand.Intn(100) < MCastDropPercent {
         .          .     36:           continue
         .          .     37:       }
         .        3MB     38:       err = proto.Unmarshal(b[:n], m)
         .          .     39:       if err != nil {
         .          .     40:           log.Fatal("protobuf Unmarshal failed", err)
         .          .     41:       }
         .          .     42:       id := m.GetHead().GetMsgId()
         .          .     43:       log.Println("CONFIG-UPDATE-RECEIVED { \"update_id\" =", id, "}")
         .          .     44:       //TODO check whether value already exists in store?
         .          .     45:       store.Add(id)
         .        2MB     46:       SaveAsFile(id, b[:n], StoreDir)
         .          .     47:       m.Reset()
         .          .     48:   }
         .          .     49:}
(pprof) 

I have been able to verify that no goroutine leak is happening using http://:8080/debug/pprof/goroutine?debug=1.

Please comment on why docker stats shows a different picture (linearly increasing memory):

CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O           PIDS
n3                  0.13%               19.73 MiB / 31.36 GiB   0.06%               595 kB / 806 B        0 B / 73.73 kB      14

If I run it overnight, this memory bloats to around 250 MB. I have not run it longer than that, but I feel it should have reached a plateau instead of increasing linearly.
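One way to cross-check the two views is to log the Go runtime's own numbers next to what docker stats reports; a minimal sketch (the logMemStats helper is illustrative, not code from the application above):

package main

import (
    "log"
    "runtime"
    "time"
)

// logMemStats periodically prints the Go runtime's view of memory so it can be
// compared with the container-level numbers that docker stats reports.
func logMemStats(interval time.Duration) {
    var m runtime.MemStats
    for range time.Tick(interval) {
        runtime.ReadMemStats(&m)
        log.Printf("Sys=%d KiB HeapInuse=%d KiB HeapReleased=%d KiB NumGC=%d",
            m.Sys/1024, m.HeapInuse/1024, m.HeapReleased/1024, m.NumGC)
    }
}

func main() {
    go logMemStats(30 * time.Second)
    select {} // placeholder for the real application
}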

Answer

docker stats shows the memory usage stats from cgroups. (Refer: https://docs.docker.com/engine/admin/runmetrics/)

If you read the "outdated but useful" documentation (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt), it says:

5.5 usage_in_bytes

For efficiency, as other kernel components, memory cgroup uses some optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2).

Page Cache and RSS are included in the memory usage_in_bytes number, so if the container does file I/O, the memory usage stat will increase. However, if usage hits the container's memory limit, the kernel reclaims some of the unused memory. Hence, when I added a memory limit to my container, I could observe that memory was reclaimed and reused once the limit was hit. Container processes are not killed unless there is no memory left to reclaim and an OOM error happens. For anyone concerned about the numbers shown in docker stats, the easy way is to check the detailed stats that cgroups expose under /sys/fs/cgroup/memory/docker/<container-id>/ on the host: memory.stat and the other memory.* files there show all the memory metrics in detail.
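To read those cgroup numbers from inside the container itself, something like the following works; a minimal sketch assuming cgroup v1 and the in-container path /sys/fs/cgroup/memory/memory.stat (adjust the path for your cgroup driver or cgroup v2):

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"
)

func main() {
    // Inside the container, the cgroup v1 memory controller is usually mounted
    // at /sys/fs/cgroup/memory; memory.stat holds the per-cgroup breakdown.
    f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Print the rss and cache lines, which the kernel documentation quoted
    // above recommends over the fuzzy usage_in_bytes value.
    s := bufio.NewScanner(f)
    for s.Scan() {
        line := s.Text()
        if strings.HasPrefix(line, "rss ") || strings.HasPrefix(line, "cache ") {
            fmt.Println(line)
        }
    }
    if err := s.Err(); err != nil {
        log.Fatal(err)
    }
}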

If you want to limit the resources used by the Docker container in the "docker run" command, you can do so by following this reference: https://docs.docker.com/engine/admin/resource_constraints/

Since I am using docker-compose, I did it by adding a line in my docker-compose.yml file under the service I wanted to limit:

mem_limit: 32m

where m stands for megabytes.

