Making sense from GHC profiler
I'm trying to make sense of the GHC profiler. There is a rather simple app, which uses the wreq and lens-aeson libraries, and while learning about GHC profiling, I decided to play with it a bit.
Using different options (the time tool, +RTS -p -RTS, and +RTS -p -h) I got entirely different numbers for my memory usage. Having all those numbers, I'm now completely lost trying to understand what is going on and how much memory the app actually uses.
This situation reminds me of the phrase by Arthur Bloch: "A man with a watch knows what time it is. A man with two watches is never sure."
Could you please suggest how I should read all those numbers, and what the meaning of each of them is?
Here are the numbers:
time -l reports around 19M
#/usr/bin/time -l ./simple-wreq
...
3.02 real 0.39 user 0.17 sys
19070976 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
21040 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
71 messages sent
71 messages received
2991 signals received
43 voluntary context switches
6490 involuntary context switches
Using the +RTS -p -RTS flag reports around 92M. Although it says "total alloc", it seems strange to me that a simple app like this one can allocate and release 91M
# ./simple-wreq +RTS -p -RTS
# cat simple-wreq.prof
Fri Oct 14 15:08 2016 Time and Allocation Profiling Report (Final)
simple-wreq +RTS -N -p -RTS
total time = 0.07 secs (69 ticks @ 1000 us, 1 processor)
total alloc = 91,905,888 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
main.g Main 60.9 88.8
MAIN MAIN 24.6 2.5
decodeLenient/look Data.ByteString.Base64.Internal 5.8 2.6
decodeLenientWithTable/fill Data.ByteString.Base64.Internal 2.9 0.1
decodeLenientWithTable.\.\.fill Data.ByteString.Base64.Internal 1.4 0.0
decodeLenientWithTable.\.\.fill.\ Data.ByteString.Base64.Internal 1.4 0.1
decodeLenientWithTable.\.\.fill.\.\.\.\ Data.ByteString.Base64.Internal 1.4 3.3
decodeLenient Data.ByteString.Base64.Lazy 1.4 1.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 443 0 24.6 2.5 100.0 100.0
main Main 887 0 0.0 0.0 75.4 97.4
main.g Main 889 0 60.9 88.8 75.4 97.4
object_ Data.Aeson.Parser.Internal 925 0 0.0 0.0 0.0 0.2
jstring_ Data.Aeson.Parser.Internal 927 50 0.0 0.2 0.0 0.2
unstream/resize Data.Text.Internal.Fusion 923 600 0.0 0.3 0.0 0.3
decodeLenient Data.ByteString.Base64.Lazy 891 0 1.4 1.4 14.5 8.1
decodeLenient Data.ByteString.Base64 897 500 0.0 0.0 13.0 6.7
....
+RTS -p -h and hp2ps show me the following picture and two numbers: 114K in the header and something around 1.8Mb on the graph.
And, just in case, here is the app:
module Main where
import Network.Wreq
import Control.Lens
import Data.Aeson.Lens
import Control.Monad
main :: IO ()
main = replicateM_ 10 g
where
g = do
r <- get "http://httpbin.org/get"
print $ r ^. responseBody
. key "headers"
. key "User-Agent"
. _String
UPDATE 1: Thanks, everyone, for the incredibly good responses. As was suggested, I am adding the +RTS -s output, so the entire picture builds up for everyone who reads it.
#./simple-wreq +RTS -s
...
128,875,432 bytes allocated in the heap
32,414,616 bytes copied during GC
2,394,888 bytes maximum residency (16 sample(s))
355,192 bytes maximum slop
7 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 194 colls, 0 par 0.018s 0.022s 0.0001s 0.0022s
Gen 1 16 colls, 0 par 0.027s 0.031s 0.0019s 0.0042s
UPDATE 2: The size of the executable:
#du -h simple-wreq
63M simple-wreq
A man with a watch knows what time it is. A man with two watches is never sure.
Ah, but what do the two watches show? Are both meant to show the current time in UTC? Or is one of them supposed to show the time in UTC, and the other the time at a certain point on Mars? As long as they are in sync, the second scenario wouldn't be a problem, right?
And that is exactly what is happening here. You are comparing different memory measurements:
- the maximum residency
- the total amount of allocated memory
The maximum residency is the highest amount of memory your program ever uses at a given time. That's 19MB. However, the total amount of allocated memory is a lot more, since that's how GHC works: it "allocates" memory for objects that are garbage collected, which is almost everything that's not unpacked.
Let us inspect a C example for this:
#include <stdlib.h>

int main() {
    int i;
    char *mem;
    for (i = 0; i < 5; ++i) {
        mem = malloc(19 * 1000 * 1000);  /* allocate 19 MB... */
        free(mem);                       /* ...and release it immediately */
    }
    return 0;
}
Whenever we call malloc, we allocate 19 megabytes of memory. However, we free it again immediately afterwards. The highest amount of memory we ever hold at one point is therefore 19 megabytes (plus a little more for the stack and the program itself).
However, in total we allocate 5 * 19M = 95M. Still, we could run our little program fine with just 20 megs of RAM. That's the difference between total allocated memory and maximum residency. Note that the residency reported by time is always at least du <executable>, since the executable has to reside in memory too.
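The same effect is easy to reproduce on the Haskell side. The following sketch (a standalone toy program, not the wreq app from the question) builds a fresh million-cell list on every iteration, so the total allocation is large, while residency stays small because each list is garbage by the time the next one is built:

```haskell
import Control.Monad (forM_)
import Data.List (foldl')

-- Strict left fold: list cells become garbage as soon as they are consumed.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = forM_ [1 .. 5 :: Int] $ \_ ->
  -- Each iteration allocates roughly a million list cells (lots of
  -- *total* allocation), but none of them survive past the print,
  -- so *maximum residency* stays tiny.
  print (sumTo 1000000)
```

Running this with +RTS -s (when compiled without -O; with optimisation GHC's list fusion may eliminate the intermediate list entirely) should show total heap allocation far larger than the reported maximum residency.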
That being said, the easiest way to generate statistics is -s, which will show what the maximum residency was from the Haskell program's point of view. In your case, it will be the 1.9M, the number in your heap profile (or double that amount due to profiling). And yeah, Haskell executables tend to get extremely large, since libraries are statically linked.