linux perf: how to interpret and find hotspots


Question

I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from the sampling-based method of perf.

What I did:

perf record -g -p $(pidof someapp)
perf report -g -n

Now I see something like this:

+     16.92%  kdevelop  libsqlite3.so.0.8.6               [.] 0x3fe57
+     10.61%  kdevelop  libQtGui.so.4.7.3                 [.] 0x81e344
+      7.09%  kdevelop  libc-2.14.so                      [.] 0x85804
+      4.96%  kdevelop  libQtGui.so.4.7.3                 [.] 0x265b69
+      3.50%  kdevelop  libQtCore.so.4.7.3                [.] 0x18608d
+      2.68%  kdevelop  libc-2.14.so                      [.] memcpy
+      1.15%  kdevelop  [kernel.kallsyms]                 [k] copy_user_generic_string
+      0.90%  kdevelop  libQtGui.so.4.7.3                 [.] QTransform::translate(double, double)
+      0.88%  kdevelop  libc-2.14.so                      [.] __libc_malloc
+      0.85%  kdevelop  libc-2.14.so                      [.] memcpy
...

OK, these functions might be slow, but how do I find out where they are being called from? As all these hotspots lie in external libraries, I see no way to optimize my code.

Basically I am looking for some kind of callgraph annotated with accumulated cost, where my functions have a higher inclusive sampling cost than the library functions I call.

Is this possible with perf? If so, how?

Note: I found out that "E" unwraps the callgraph and gives somewhat more information. But the callgraph is often not deep enough and/or terminates randomly without giving information about how much time was spent where. Example:

-     10.26%  kate  libkatepartinterfaces.so.4.6.0  [.] Kate::TextLoader::readLine(int&...
     Kate::TextLoader::readLine(int&, int&)                                            
     Kate::TextBuffer::load(QString const&, bool&, bool&)                              
     KateBuffer::openFile(QString const&)                                              
     KateDocument::openFile()                                                          
     0x7fe37a81121c

Could the problem be that I'm running on 64-bit? See also: http://lists.fedoraproject.org/pipermail/devel/2010-November/144952.html (I'm not using Fedora, but it seems to apply to all 64-bit systems).

Solution

With Linux 3.7, perf is finally able to use DWARF information to generate the callgraph:

perf record --call-graph dwarf -- yourapp
perf report -g graph --no-children

Neat, but the curses GUI is horrible compared to VTune, KCacheGrind or similar. I recommend trying out FlameGraphs instead, which is a pretty neat visualization: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
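For reference, the usual way to turn a perf recording into a flame graph is a three-stage pipeline. The sketch below wraps it in a helper; the function name make_flamegraph and its three-argument interface are made up for illustration, and it assumes a local checkout of the FlameGraph scripts (stackcollapse-perf.pl and flamegraph.pl) plus an existing perf.data file.

```shell
#!/bin/sh
# make_flamegraph FLAMEGRAPH_DIR PERF_DATA OUT_SVG
# Hypothetical helper: FLAMEGRAPH_DIR is assumed to be a local checkout of
# the FlameGraph scripts; adjust the paths to your setup.
make_flamegraph() {
    fg_dir=$1
    perf_data=$2
    out_svg=$3

    # Fail early if the FlameGraph scripts are not where we expect them.
    for script in stackcollapse-perf.pl flamegraph.pl; do
        if [ ! -x "$fg_dir/$script" ]; then
            echo "make_flamegraph: $fg_dir/$script not found or not executable" >&2
            return 1
        fi
    done

    # perf script dumps the recorded samples with their call stacks as text,
    # stackcollapse-perf.pl folds each stack onto a single line, and
    # flamegraph.pl renders the folded stacks as an SVG on stdout.
    perf script -i "$perf_data" \
        | "$fg_dir/stackcollapse-perf.pl" \
        | "$fg_dir/flamegraph.pl" > "$out_svg"
}
```

Called as make_flamegraph ./FlameGraph perf.data yourapp.svg, this should leave an SVG that can be opened in a browser; wide frames near the top of a tower are the self-cost hotspots, and the frames below them show where they were called from.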

Note: In the report step, -g graph shows the callgraph percentages "relative to total" rather than "relative to parent", which makes the output easier to understand. --no-children will show only self cost, rather than inclusive cost, a feature that I also find invaluable.
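To make the self versus inclusive distinction concrete, here is a small sketch that computes both from "folded" stacks, the one-line-per-stack format ("frame1;frame2;...;frameN count") that stackcollapse-perf.pl emits. The function name folded_costs is made up for illustration, and this is not how perf itself computes its numbers, only the same idea in a few lines of awk.

```shell
#!/bin/sh
# folded_costs: read folded stacks ("frame1;frame2;...;frameN count") on
# stdin and print "function<TAB>inclusive<TAB>self", highest inclusive first.
# Hypothetical helper, written for illustration only.
folded_costs() {
    awk '
    NF == 0 { next }
    {
        count = $NF + 0
        stack = $0
        # Strip the trailing sample count; frames may contain spaces.
        sub(/[ \t]+[0-9]+[ \t]*$/, "", stack)
        n = split(stack, frames, ";")

        # Self cost: samples where the function is the innermost frame.
        self[frames[n]] += count

        # Inclusive cost: samples where the function appears anywhere in
        # the stack, counted once per stack even if it recurses.
        split("", seen)
        for (i = 1; i <= n; i++) {
            if (!(frames[i] in seen)) {
                incl[frames[i]] += count
                seen[frames[i]] = 1
            }
        }
    }
    END {
        for (f in incl)
            printf "%s\t%d\t%d\n", f, incl[f], self[f]
    }
    ' | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Fed something like perf script | stackcollapse-perf.pl | folded_costs, your own functions end up with a high inclusive count and a low self count, while library routines such as memcpy show the opposite pattern. That is the view the question asks for.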

If you have a recent perf and an Intel CPU, also try out the LBR unwinder, which has much better performance and produces far smaller result files:

perf record --call-graph lbr -- yourapp

The downside here is that the call stack depth is more limited compared to the default DWARF unwinder configuration.
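If the same profiling script has to run on different machines, the choice between the two unwinders can be automated. The sketch below is a crude heuristic under stated assumptions: the function name pick_callgraph_mode is made up, and it only checks the CPU vendor string in /proc/cpuinfo. As far as I know, LBR call stacks also need a reasonably recent Intel microarchitecture, so perf record may still reject lbr on older Intel hardware.

```shell
#!/bin/sh
# pick_callgraph_mode [CPUINFO_FILE]
# Hypothetical helper: prints "lbr" when the CPU vendor is GenuineIntel and
# "dwarf" otherwise. The vendor check is an assumption-laden shortcut, not a
# real capability probe.
pick_callgraph_mode() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -q '^vendor_id[[:space:]]*:[[:space:]]*GenuineIntel' "$cpuinfo" 2>/dev/null; then
        echo lbr
    else
        # Non-Intel CPU or unreadable cpuinfo: fall back to DWARF unwinding.
        echo dwarf
    fi
}
```

It would be used as perf record --call-graph "$(pick_callgraph_mode)" -- yourapp, falling back to dwarf by hand if perf refuses the lbr mode.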
