Making sense from GHC profiler


Problem description


I'm trying to make sense of the GHC profiler's output. There is a rather simple app, which uses the wreq and lens-aeson libraries, and while learning about GHC profiling, I decided to play with it a bit.

Using different options (the time tool, +RTS -p -RTS, and +RTS -p -h) I acquired entirely different numbers for my memory usage. Having all those numbers, I'm now completely lost trying to understand what is going on and how much memory the app actually uses.

This situation reminds me of the phrase by Arthur Bloch: "A man with a watch knows what time it is. A man with two watches is never sure."

Can you please tell me how I should read all those numbers, and what each of them means?

Here are the numbers:

time -l reports around 19M

#/usr/bin/time -l ./simple-wreq
...
        3.02 real         0.39 user         0.17 sys
  19070976  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     21040  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
        71  messages sent
        71  messages received
      2991  signals received
        43  voluntary context switches
      6490  involuntary context switches

Using the +RTS -p -RTS flags reports around 92M. Although it says "total alloc", it seems strange to me that a simple app like this one could allocate and release 91M.

# ./simple-wreq +RTS -p -RTS      
# cat simple-wreq.prof
        Fri Oct 14 15:08 2016 Time and Allocation Profiling Report  (Final)

           simple-wreq +RTS -N -p -RTS

        total time  =        0.07 secs   (69 ticks @ 1000 us, 1 processor)
        total alloc =  91,905,888 bytes  (excludes profiling overheads)

COST CENTRE                             MODULE                          %time %alloc

main.g                                  Main                             60.9   88.8
MAIN                                    MAIN                             24.6    2.5
decodeLenient/look                      Data.ByteString.Base64.Internal   5.8    2.6
decodeLenientWithTable/fill             Data.ByteString.Base64.Internal   2.9    0.1
decodeLenientWithTable.\.\.fill         Data.ByteString.Base64.Internal   1.4    0.0
decodeLenientWithTable.\.\.fill.\       Data.ByteString.Base64.Internal   1.4    0.1
decodeLenientWithTable.\.\.fill.\.\.\.\ Data.ByteString.Base64.Internal   1.4    3.3
decodeLenient                           Data.ByteString.Base64.Lazy       1.4    1.4


                                                                                                             individual     inherited
COST CENTRE                                              MODULE                            no.     entries  %time %alloc   %time %alloc

MAIN                                                     MAIN                              443           0   24.6    2.5   100.0  100.0
 main                                                    Main                              887           0    0.0    0.0    75.4   97.4
  main.g                                                 Main                              889           0   60.9   88.8    75.4   97.4
   object_                                               Data.Aeson.Parser.Internal        925           0    0.0    0.0     0.0    0.2
    jstring_                                             Data.Aeson.Parser.Internal        927          50    0.0    0.2     0.0    0.2
   unstream/resize                                       Data.Text.Internal.Fusion         923         600    0.0    0.3     0.0    0.3
   decodeLenient                                         Data.ByteString.Base64.Lazy       891           0    1.4    1.4    14.5    8.1
    decodeLenient                                        Data.ByteString.Base64            897         500    0.0    0.0    13.0    6.7
....

+RTS -p -h and hp2ps show me the following picture and two numbers: 114K in the header and something around 1.8Mb on the graph.

And, just in case, here is the app:

module Main where

import Network.Wreq
import Control.Lens
import Data.Aeson.Lens
import Control.Monad

main :: IO ()
main = replicateM_ 10 g
  where 
    g = do
        r <- get "http://httpbin.org/get"
        print $ r ^. responseBody
                   . key "headers"
                   . key "User-Agent"
                   . _String

UPDATE 1: Thanks everyone for the incredibly good responses. As suggested, I'm adding the +RTS -s output, so the entire picture builds up for everyone reading it.

#./simple-wreq +RTS -s
...
     128,875,432 bytes allocated in the heap
      32,414,616 bytes copied during GC
       2,394,888 bytes maximum residency (16 sample(s))
         355,192 bytes maximum slop
               7 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       194 colls,     0 par    0.018s   0.022s     0.0001s    0.0022s
  Gen  1        16 colls,     0 par    0.027s   0.031s     0.0019s    0.0042s

UPDATE 2: The size of the executable:

#du -h simple-wreq
63M     simple-wreq

Solution

"A man with a watch knows what time it is. A man with two watches is never sure."

Ah, but what do the two watches show? Are both meant to show the current time in UTC? Or is one of them supposed to show the time in UTC, and the other one the time at a certain point on Mars? As long as they are in sync, the second scenario wouldn't be a problem, right?

And that is exactly what is happening here. You compare different memory measurements:

  • the maximum residency
  • the total amount of allocated memory

The maximum residency is the highest amount of memory your program ever uses at a given time. That's 19MB. However, the total amount of allocated memory is a lot more, since that's how GHC works: it "allocates" memory for objects that are garbage collected, which is almost everything that's not unpacked.

Let us inspect a C example for this:

#include <stdlib.h>

int main() {
   int i;
   char * mem;

   for(i = 0; i < 5; ++i) {
      /* allocate 19 MB, then release it right away */
      mem = malloc(19 * 1000 * 1000);
      free(mem);
   }
   return 0;
}

Whenever we use malloc, we will allocate 19 megabytes of memory. However, we free the memory immediately after. The highest amount of memory we ever have at one point is therefore 19 megabytes (and a little bit more for the stack and the program itself).

However, in total we allocate 5 * 19M = 95M. Still, we could run our little program fine with just 20 megs of RAM. That's the difference between total allocated memory and maximum residency. Note that the residency reported by time is always at least du <executable>, since the executable has to reside in memory too.
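
For comparison, here is a minimal Haskell sketch of the same pattern (my illustration, not part of the original answer): each iteration builds a large, short-lived structure that the garbage collector reclaims almost immediately, so total allocation grows while the maximum residency stays small.

import Control.Monad (forM_)

main :: IO ()
main = forM_ [1 .. 5 :: Int] $ \_ ->
    -- show produces a large character list lazily; length consumes it as it
    -- is produced, so the cells become garbage almost immediately.
    print (length (show [1 .. 1000000 :: Int]))

Compiled with -rtsopts and run with +RTS -s, this should report a large "bytes allocated in the heap" figure next to a comparatively tiny maximum residency, much like the malloc/free loop above.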

That being said, the easiest way to generate statistics is +RTS -s, which will show what the maximum residency was from the Haskell program's point of view. In your case, it will be the 1.9M, the number in your heap profile (or double that amount due to profiling). And yeah, Haskell executables tend to get extremely large, since libraries are statically linked.
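
As a side note (my addition, not part of the original answer): you can also read these two numbers from inside the program via GHC.Stats (base, GHC 8.2 or later). A minimal sketch, assuming the binary is run with +RTS -T so the RTS actually collects the statistics:

import Control.Monad (when)
import GHC.Stats (RTSStats(..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
    enabled <- getRTSStatsEnabled          -- True when run with +RTS -T/-t/-s
    when enabled $ do
        s <- getRTSStats
        -- allocated_bytes matches "total alloc" / "bytes allocated in the heap";
        -- max_live_bytes matches "maximum residency".
        putStrLn ("total allocated: " ++ show (allocated_bytes s) ++ " bytes")
        putStrLn ("max residency:   " ++ show (max_live_bytes s) ++ " bytes")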
