"Memory used" metric: Go tool pprof vs docker stats


Problem Description

I wrote a golang application running in each of my docker containers. The containers communicate with each other using protobufs via TCP and UDP, and I use Hashicorp's memberlist library to discover each of the containers in my network. In docker stats I see that the memory usage is increasing linearly, so I am trying to find any leaks in my application.

Since it is an application which keeps running, I am using http pprof to check the live application in any one of the containers. I see that runtime.MemStats.Sys is constant even though docker stats shows a linear increase. My --inuse_space is around 1MB and --alloc_space of course keeps increasing over time. Here is a sample of alloc_space:

root@n3:/app# go tool pprof --alloc_space main http://localhost:8080/debug/pprof/heap                                                                                                                       
Fetching profile from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.main.localhost:8080.alloc_objects.alloc_space.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top --cum
1024.11kB of 10298.19kB total ( 9.94%)
Dropped 8 nodes (cum <= 51.49kB)
Showing top 10 nodes out of 34 (cum >= 1536.07kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 10298.19kB   100%  runtime.goexit
         0     0%     0%  6144.48kB 59.67%  main.Listener
         0     0%     0%  3072.20kB 29.83%  github.com/golang/protobuf/proto.Unmarshal
  512.10kB  4.97%  4.97%  3072.20kB 29.83%  github.com/golang/protobuf/proto.UnmarshalMerge
         0     0%  4.97%  2560.17kB 24.86%  github.com/hashicorp/memberlist.(*Memberlist).triggerFunc
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).Unmarshal
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).unmarshalType
  512.01kB  4.97%  9.94%  2048.23kB 19.89%  main.SaveAsFile
         0     0%  9.94%  1536.07kB 14.92%  reflect.New
(pprof) list main.Listener
Total: 10.06MB
ROUTINE ======================== main.Listener in /app/listener.go
         0        6MB (flat, cum) 59.67% of Total
         .          .     24:   l.SetReadBuffer(MaxDatagramSize)
         .          .     25:   defer l.Close()
         .          .     26:   m := new(NewMsg)
         .          .     27:   b := make([]byte, MaxDatagramSize)
         .          .     28:   for {
         .   512.02kB     29:       n, src, err := l.ReadFromUDP(b)
         .          .     30:       if err != nil {
         .          .     31:           log.Fatal("ReadFromUDP failed:", err)
         .          .     32:       }
         .   512.02kB     33:       log.Println(n, "bytes read from", src)
         .          .     34:       //TODO remove later. For testing Fetcher only
         .          .     35:       if rand.Intn(100) < MCastDropPercent {
         .          .     36:           continue
         .          .     37:       }
         .        3MB     38:       err = proto.Unmarshal(b[:n], m)
         .          .     39:       if err != nil {
         .          .     40:           log.Fatal("protobuf Unmarshal failed", err)
         .          .     41:       }
         .          .     42:       id := m.GetHead().GetMsgId()
         .          .     43:       log.Println("CONFIG-UPDATE-RECEIVED { \"update_id\" =", id, "}")
         .          .     44:       //TODO check whether value already exists in store?
         .          .     45:       store.Add(id)
         .        2MB     46:       SaveAsFile(id, b[:n], StoreDir)
         .          .     47:       m.Reset()
         .          .     48:   }
         .          .     49:}
(pprof) 

Using http://:8080/debug/pprof/goroutine?debug=1, I have been able to verify that no goroutine leak is happening.
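For context, the /debug/pprof endpoints queried above are the ones registered by the standard net/http/pprof package. The question does not show how they are wired up, so the following is only a minimal sketch of the assumed setup:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve the profiling endpoints on :8080, the port used in the pprof
    // commands above. Listener(), memberlist setup, etc. are omitted here.
    go func() {
        log.Println(http.ListenAndServe(":8080", nil))
    }()

    select {} // block forever; the real application runs its own loops
}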

Please comment on why docker stats shows a different picture (linearly increasing memory):

CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O           PIDS
n3                  0.13%               19.73 MiB / 31.36 GiB   0.06%               595 kB / 806 B        0 B / 73.73 kB      14

If I run it overnight, this memory bloats to around 250MB. I have not run it longer than that, but I feel it should have reached a plateau instead of increasing linearly.

Solution

docker stats shows the memory usage stats from cgroups. (Refer: https://docs.docker.com/engine/admin/runmetrics/)

If you read the "outdated but useful" documentation (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt), it says:

5.5 usage_in_bytes

For efficiency, as other kernel components, memory cgroup uses some optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2).

Page cache and RSS are included in the memory usage_in_bytes number, so if the container does file I/O, the memory usage stat will increase. However, for a container, if the usage hits the maximum limit, some of the unused memory is reclaimed. Hence, when I added a memory limit to my container, I could observe that memory is reclaimed and reused when the limit is hit. The container processes are not killed unless there is no memory to reclaim and an OOM error happens. For anyone concerned about the numbers shown in docker stats, the easy way is to check the detailed stats exposed by cgroups at the path /sys/fs/cgroup/memory/docker//. This shows all the memory metrics in detail, in memory.stat and the other memory.* files.
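As a rough illustration of the RSS+CACHE(+SWAP) suggestion from the kernel documentation, memory.stat can be parsed directly. This is only a sketch assuming the cgroup v1 layout described above; the <container-id> segment is a placeholder for the full container ID (the path above truncates it):

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    // Placeholder path: substitute the full container ID for <container-id>.
    path := "/sys/fs/cgroup/memory/docker/<container-id>/memory.stat"

    data, err := os.ReadFile(path)
    if err != nil {
        fmt.Fprintln(os.Stderr, "reading memory.stat:", err)
        os.Exit(1)
    }

    // memory.stat is a list of "<name> <value>" lines, e.g. "rss 1048576".
    stats := make(map[string]uint64)
    for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
        fields := strings.Fields(line)
        if len(fields) != 2 {
            continue
        }
        if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
            stats[fields[0]] = v
        }
    }

    // usage_in_bytes is a fuzzed value; rss+cache(+swap) is the more exact figure.
    fmt.Printf("rss            = %d bytes\n", stats["rss"])
    fmt.Printf("cache          = %d bytes\n", stats["cache"])
    fmt.Printf("swap           = %d bytes\n", stats["swap"])
    fmt.Printf("rss+cache+swap = %d bytes\n", stats["rss"]+stats["cache"]+stats["swap"])
}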

If you want to limit the resources used by the Docker container with the "docker run" command, you can do so by following this reference: https://docs.docker.com/engine/admin/resource_constraints/

Since I am using docker-compose, I did it by adding a line in my docker-compose.yml file under the service I wanted to limit:

mem_limit: 32m

where m stands for megabytes.
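For example, a minimal docker-compose.yml fragment using the Compose file format version 2 (the service and image names below are illustrative, not taken from the question):

version: "2"
services:
  n3:
    image: myapp:latest   # illustrative image name
    mem_limit: 32m        # hard memory limit for this container

With this in place, the LIMIT column in docker stats shows the 32 MB cap instead of the host's total memory, and, as described above, unused memory such as page cache is reclaimed when the limit is hit.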
