linux perf: how to interpret and find hotspots

Views: 1184 · Posted: 2016/10/14 22:52:27 · Tags: c++ linux performance profiling perf


Problem description

I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from the sampling-based method of perf.

What I did:

perf record -g -p $(pidof someapp)
perf report -g -n

Now I see something like this:

+     16.92%  kdevelop  libsqlite3.so.0.8.6               [.] 0x3fe57
+     10.61%  kdevelop  libQtGui.so.4.7.3                 [.] 0x81e344
+      7.09%  kdevelop  libc-2.14.so                      [.] 0x85804
+      4.96%  kdevelop  libQtGui.so.4.7.3                 [.] 0x265b69
+      3.50%  kdevelop  libQtCore.so.4.7.3                [.] 0x18608d
+      2.68%  kdevelop  libc-2.14.so                      [.] memcpy
+      1.15%  kdevelop  [kernel.kallsyms]                 [k] copy_user_generic_string
+      0.90%  kdevelop  libQtGui.so.4.7.3                 [.] QTransform::translate(double, double)
+      0.88%  kdevelop  libc-2.14.so                      [.] __libc_malloc
+      0.85%  kdevelop  libc-2.14.so                      [.] memcpy
...

Ok, these functions might be slow, but how do I find out where they are getting called from? As all these hotspots lie in external libraries I see no way to optimize my code.

Basically I am looking for some kind of callgraph annotated with accumulated cost, where my functions have a higher inclusive sampling cost than the library functions I call.

Is this possible with perf? If so - how?

Note: I found out that pressing "E" expands the callgraph and gives somewhat more information. But the callgraph is often not deep enough and/or terminates randomly, without showing how much time was spent where. Example:

-     10.26%  kate  libkatepartinterfaces.so.4.6.0  [.] Kate::TextLoader::readLine(int&...
     Kate::TextLoader::readLine(int&, int&)                                            
     Kate::TextBuffer::load(QString const&, bool&, bool&)                              
     KateBuffer::openFile(QString const&)                                              
     KateDocument::openFile()                                                          
     0x7fe37a81121c

Could the problem be that I'm running on 64-bit? See also: http://lists.fedoraproject.org/pipermail/devel/2010-November/144952.html (I'm not using Fedora, but it seems to apply to all 64-bit systems).

Solution

With Linux 3.7, perf is finally able to use DWARF information to generate the callgraph:

perf record --call-graph dwarf -- yourapp
perf report -g graph --no-children

Neat, but the curses GUI is horrible compared to VTune, KCacheGrind or similar tools. I recommend trying FlameGraphs instead, which is a pretty neat visualization: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
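To give a feel for what a flame graph is built from, here is a minimal Python sketch of the "folded stack" format that flame graph tooling consumes: one line per unique call stack, frames joined by semicolons from root to leaf, followed by a sample count. The function name `fold_stacks` and the sample data are hypothetical, and the sketch assumes each stack is listed leaf-first, as in the kate callchain in the question. It is not a replacement for the scripts linked above, only an illustration of the idea.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks into 'root;...;leaf count' lines.

    `samples` is an iterable of stacks, each listed leaf-first
    (innermost frame first). Identical stacks are merged and counted.
    """
    counts = Counter(";".join(reversed(stack)) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, loosely modelled on the kate example above.
samples = [
    ["readLine", "load", "openFile"],
    ["readLine", "load", "openFile"],
    ["memcpy", "load", "openFile"],
]

for line in fold_stacks(samples):
    print(line)
```

The width of each box in the resulting graph is proportional to the count on these lines, which is why a caller such as `openFile` ends up at least as wide as everything it calls.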

Note: In the report step, -g graph prints call-graph percentages relative to the total sample count rather than relative to the parent entry, which makes the output easier to read. --no-children sorts entries by self cost only instead of inclusive cost, a feature that I also find invaluable.
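The difference between self cost and inclusive cost can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how perf computes its numbers internally: the helper `self_and_inclusive` and the sample stacks are made up, each stack is assumed to be non-empty and listed leaf-first, and every sample is given equal weight.

```python
from collections import Counter


def self_and_inclusive(samples):
    """Return (self_pct, inclusive_pct) dicts, relative to all samples.

    Self cost counts samples where the function is the innermost frame.
    Inclusive cost counts samples where it appears anywhere on the stack,
    at most once per sample so that recursion is not double counted.
    """
    self_hits, incl_hits = Counter(), Counter()
    for stack in samples:
        self_hits[stack[0]] += 1
        incl_hits.update(set(stack))
    total = len(samples)

    def to_pct(hits):
        return {name: 100.0 * n / total for name, n in hits.items()}

    return to_pct(self_hits), to_pct(incl_hits)


# Hypothetical samples: most time is spent in library calls made by parse().
samples = [
    ["memcpy", "parse", "main"],
    ["memcpy", "parse", "main"],
    ["malloc", "parse", "main"],
    ["parse", "main"],
]

self_pct, incl_pct = self_and_inclusive(samples)
print("self:", self_pct)
print("inclusive:", incl_pct)
```

In this made-up data `parse` has a small self cost but a high inclusive cost, which is the kind of view the question asks for: the application's own functions rank above the library functions they call.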

If you have a recent perf and an Intel CPU, also try out the LBR unwinder, which has much better performance and produces far smaller result files:

perf record --call-graph lbr -- yourapp

The downside here is that the call stack depth is more limited compared to the default DWARF unwinder configuration.
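Why a limited stack depth matters can be shown with a small Python sketch. It assumes that a depth-limited unwinder keeps the innermost frames and drops the outermost ones; the depth of 3 used here is deliberately tiny and purely illustrative (the real limit depends on the CPU), and the function names are hypothetical.

```python
def truncate_stack(stack, max_depth):
    """Keep only the `max_depth` innermost frames of a leaf-first stack."""
    return stack[:max_depth]


def attributed_to(samples, function):
    """Count samples whose recorded stack contains `function`."""
    return sum(function in stack for stack in samples)


# Hypothetical deep call chain, sampled four times.
deep = ["memcpy", "readLine", "load", "openFile", "main"]
samples = [deep] * 4

# Illustrative depth limit of 3; not a real hardware value.
shallow = [truncate_stack(stack, 3) for stack in samples]

print(attributed_to(samples, "main"))  # full stacks reach main
print(attributed_to(shallow, "main"))  # truncated stacks never reach main
```

With truncated stacks the inner hotspots are still visible, but the cost can no longer be attributed to outer callers, so inclusive numbers for functions high up the call chain become unreliable.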
