How does Linux's perf utility understand stack traces?

Question

Linux's perf utility is famously used by Brendan Gregg to generate flame graphs for C/C++, JVM code, Node.js code, etc.

Does the Linux kernel natively understand stack traces? Where can I read more about how a tool is able to introspect the stack traces of processes, even when those processes are written in completely different languages?

Recommended answer

Gregg gives a short introduction to stack traces in perf: http://www.brendangregg.com/perf.html

4.4 Stack Traces

Always compile with frame pointers. Omitting frame pointers is an evil compiler optimization that breaks debuggers, and sadly, is often the default. Without them, you may see incomplete stacks from perf_events ... There are two ways to fix this: either using dwarf data to unwind the stack, or returning the frame pointers.

Dwarf

Since about the 3.9 kernel, perf_events has supported a workaround for missing frame pointers in user-level stacks: libunwind, which uses dwarf. This can be enabled using "-g dwarf". ... compiler optimizations (-O2), which in this case has omitted the frame pointer. ... recompiling .. with -fno-omit-frame-pointer:
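
To make the frame-pointer approach described in the quote more concrete, here is a toy model in Python. It is only a sketch and not perf's actual code: the "memory" is a plain dict, all addresses are made up, and the layout assumed is the usual x86-64 one where each frame stores the caller's saved frame pointer at [fp] and the return address at [fp + 8].

```python
# Toy model of frame-pointer unwinding (illustrative only, not perf's code).
# "memory" maps an address to the 8-byte word stored there.
# Assumed layout (x86-64 with frame pointers kept):
#   [fp]     -> caller's saved frame pointer
#   [fp + 8] -> return address into the caller

WORD = 8

def walk_frame_pointers(memory, ip, fp, max_depth=64):
    """Return the list of instruction addresses, innermost first."""
    chain = [ip]
    while fp != 0 and len(chain) < max_depth:
        saved_fp = memory.get(fp)
        return_addr = memory.get(fp + WORD)
        if saved_fp is None or return_addr is None:
            break  # unreadable stack memory: stop rather than guess
        chain.append(return_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break  # the stack grows down, so a caller's frame must sit higher
        fp = saved_fp
    return chain

# Hypothetical stack for main -> parse -> read_byte (all values invented).
stack = {
    0x7000: 0x7020, 0x7008: 0x401200,  # read_byte's frame: returns into parse
    0x7020: 0x7040, 0x7028: 0x401100,  # parse's frame: returns into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame: returns into startup code
}
full = walk_frame_pointers(stack, ip=0x401300, fp=0x7000)

# If the innermost function were compiled without a frame pointer, the
# register would hold unrelated data and the walk would stop immediately.
broken = walk_frame_pointers(stack, ip=0x401300, fp=0x1234)

print([hex(a) for a in full])
print([hex(a) for a in broken])
```

The second call models what an "incomplete stack" looks like: once the frame-pointer register no longer points at a valid frame record, the walker has nothing to follow, which is why the alternative is unwinding from DWARF data instead.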

Non-C-style languages may use a different frame format, or may omit frame pointers too:

4.3. JIT Symbols (Java, Node.js)

Programs that have virtual machines (VMs), like Java's JVM and node's v8, execute their own virtual processor, which has its own way of executing functions and managing stacks. If you profile these using perf_events, you'll see symbols for the VM engine .. perf_events has JIT support to solve this, which requires the VM to maintain a /tmp/perf-PID.map file for symbol translation.

Note that Java may not show full stacks to begin with, due to hotspot on x86 omitting the frame pointer (just like gcc). On newer versions (JDK 8u60+), you can use the -XX:+PreserveFramePointer option to fix this behavior, ...
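
As a rough illustration of the symbol translation that the /tmp/perf-PID.map file enables: each line of such a file has the form "START SIZE name", with START and SIZE in hexadecimal. The Python sketch below parses that text and maps a sampled address back to a name. The sample entries and the function names parse_perf_map and resolve are invented for this example; perf itself does this in C.

```python
import bisect

def parse_perf_map(text):
    """Parse perf map lines of the form 'START SIZE name' (hex, no 0x prefix)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the name itself may contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size = int(parts[0], 16), int(parts[1], 16)
        entries.append((start, start + size, parts[2].strip()))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, end) range covers addr, else None."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < entries[i][1]:
        return entries[i][2]
    return None

# Made-up sample content; a real file would be written by the VM itself.
sample = """\
7f3a10000000 40 Interpreter
7f3a10000100 1c0 Lcom/example/Parser;::parse
7f3a10000400 80 LazyCompile:~handleRequest server.js:12
"""
symbols = parse_perf_map(sample)
print(resolve(symbols, 0x7f3a10000110))  # inside the second entry
print(resolve(symbols, 0x7f3a10000050))  # in a gap between entries
```

An address that falls in a gap resolves to nothing, which corresponds to the unknown frames that show up when the VM has not written an entry for a piece of generated code.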

Gregg's blog post about Java and stack traces: http://techblog.netflix.com/2015/07/java-in-flames.html (see "Fixing Frame Pointers"; this was fixed in some JDK 8 versions and in JDK 9 by adding an option at program start)

Now, to your questions:

How does Linux's perf utility understand stack traces?

The perf utility basically (in early versions) just parses data returned by the Linux kernel's "perf_events" subsystem (sometimes just called "events"), which is accessed through the perf_event_open syscall. For call stack traces there are the options PERF_SAMPLE_CALLCHAIN and PERF_SAMPLE_STACK_USER:

    sample_type
           PERF_SAMPLE_CALLCHAIN
                  Records the callchain (stack backtrace).

           PERF_SAMPLE_STACK_USER (since Linux 3.7)
                  Records the user level stack, allowing stack unwinding.
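
To show what a tool receives when PERF_SAMPLE_CALLCHAIN is set, here is a small Python sketch that decodes the callchain part of a sample. The perf_event_open man page describes it as "u64 nr; u64 ips[nr]", and the kernel inserts context markers (PERF_CONTEXT_KERNEL, PERF_CONTEXT_USER) between kernel and user addresses. The constants are taken from linux/perf_event.h; the payload, the addresses, the little-endian byte order, and the function name decode_callchain are assumptions made for this example.

```python
import struct

U64_MASK = 0xFFFFFFFFFFFFFFFF

# Bits of perf_event_attr.sample_type (enum perf_event_sample_format).
PERF_SAMPLE_CALLCHAIN = 1 << 5
PERF_SAMPLE_STACK_USER = 1 << 13

# Context markers (enum perf_callchain_context), stored as unsigned 64-bit.
PERF_CONTEXT_KERNEL = (-128) & U64_MASK
PERF_CONTEXT_USER = (-512) & U64_MASK
PERF_CONTEXT_MAX = (-4095) & U64_MASK

def decode_callchain(payload):
    """Decode 'u64 nr; u64 ips[nr]' and group addresses by context marker."""
    (nr,) = struct.unpack_from("<Q", payload, 0)
    ips = struct.unpack_from("<%dQ" % nr, payload, 8)
    frames = {"kernel": [], "user": []}
    current = None
    for ip in ips:
        if ip >= PERF_CONTEXT_MAX:  # a marker, not a real address
            if ip == PERF_CONTEXT_KERNEL:
                current = "kernel"
            elif ip == PERF_CONTEXT_USER:
                current = "user"
            else:
                current = None  # other contexts are ignored in this sketch
        elif current is not None:
            frames[current].append(ip)
    return frames

# Hand-built payload standing in for one sample's callchain (values invented).
payload = struct.pack(
    "<7Q", 6,
    PERF_CONTEXT_KERNEL, 0xFFFFFFFF81000010, 0xFFFFFFFF81000200,
    PERF_CONTEXT_USER, 0x401300, 0x401200,
)
frames = decode_callchain(payload)
sample_type = PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_STACK_USER
print(hex(sample_type), frames)
```

Note that the kernel delivers only raw addresses here. Turning them into function names is left to user space, which is where symbol tables, DWARF data, and perf map files come in.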

Does the Linux kernel natively understand stack traces?

It may (if implemented) or may not, depending on your CPU architecture. The function that samples the callchain (that is, reads the call stack of a live process) is defined in the architecture-independent part of the kernel as __weak with an empty body:

http://lxr.free-electrons.com/source/kernel/events/callchain.c?v=4.4#L26

    __weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
                                      struct pt_regs *regs)
    {
    }

    __weak void perf_callchain_user(struct perf_callchain_entry *entry,
                                    struct pt_regs *regs)
    {
    }

In the 4.4 kernel, the user-space callchain sampler is redefined in the architecture-dependent part of the kernel for x86/x86_64, ARC, SPARC, ARM/ARM64, Xtensa, Tilera TILE, PowerPC, and Imagination Meta:

http://lxr.free-electrons.com/ident?v=4.4;i=perf_callchain_user

arch/x86/kernel/cpu/perf_event.c, line 2279
arch/arc/kernel/perf_event.c, line 72
arch/sparc/kernel/perf_event.c, line 1829
arch/arm/kernel/perf_callchain.c, line 62
arch/xtensa/kernel/perf_event.c, line 339
arch/tile/kernel/perf_event.c, line 995
arch/arm64/kernel/perf_callchain.c, line 109
arch/powerpc/perf/callchain.c, line 490
arch/metag/kernel/perf_callchain.c, line 59

Reading the call chain from the user stack may be nontrivial for some architectures and/or some modes.

Which CPU architecture do you use? Which languages and VMs are involved?

Where can I read more about how a tool is able to introspect the stack traces of processes, even when those processes are written in completely different languages?

You may try gdb and/or debuggers for the language, the backtrace function of libc, or the read-only unwinding support in libunwind (libunwind includes a local backtrace example, show_backtrace()).

They may have better support for frame parsing, or better integration with the language's virtual machine or with unwind info. If gdb (with the backtrace command) or other debuggers can't get stack traces from a running program, there may be no way of getting a stack trace at all.
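
As one small illustration of why a language's own tooling can see frames that a generic native unwinder cannot: CPython keeps its own chain of frame objects, linked through f_back, independently of the C stack. The sketch below walks that chain from inside the process. It relies on sys._getframe, which is CPython-specific, and the helper names are invented for this example.

```python
import sys

def python_backtrace():
    """Collect function names from CPython's own frame chain, innermost first."""
    names = []
    frame = sys._getframe(1)  # start at the caller of this helper
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # the interpreter's link to the calling frame
    return names

def inner():
    return python_backtrace()

def outer():
    return inner()

trace = outer()
print(trace[:2])
```

A native profiler sampling this process would mostly see the interpreter's C functions, not inner and outer, unless the runtime cooperates in some way, such as by exporting symbol information.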

If they can get a call trace but perf can't (even after recompiling with -fno-omit-frame-pointer for C/C++), it may be possible to add support for that combination of architecture and frame format to perf_events and perf.

There are several blogs with some info about generic backtracing problems and solutions:

  • http://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/ - local backtrace with libunwind
  • http://codingrelic.geekhold.com/2009/05/pre-mortem-backtracing.html - gcc's __builtin_return_address(N) vs glibc's backtrace() vs libunwind's local backtrace
  • http://lucumr.pocoo.org/2014/10/30/dont-panic/ - backtrace and unwinding in Rust
  • https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues - the same backtracing problem in gperftools, a profiler library based on a software timer

DWARF support in perf_events/perf:

  • https://lwn.net/Articles/499116/ [RFCv4 00/16] perf: Add backtrace post dwarf unwind, May 2012
  • https://lwn.net/Articles/507753/ [PATCHv7 00/17] perf: Add backtrace post dwarf unwind, Jul 2012
  • https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding - Dwarf unwinding on ARM 7/8 for perf
  • https://wiki.linaro.org/KenWerner/Sandbox/libunwind#libunwind_ARM_unwind_methods - non-dwarf methods too
