Stack Backtrace for ARM core using GCC compiler (when there is an MSP to PSP switch)

Question

Core - ARM Cortex-M4

Compiler - GCC 5.3.0 ARM EABI

OS - FreeRTOS

I am doing a stack backtrace using the GCC library function _Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn, void*);

In our project, the MSP stack is used for exception handling; everywhere else, the PSP stack is used. When I call _Unwind_Backtrace() inside the exception handler, I can backtrace correctly up to the first function called inside the exception. Up to that point, the stack is the MSP.

But we cannot backtrace past that point into the code that ran before the exception. There, the stack in use is the PSP.

For example, suppose:

Task1
{
    func1()
}



func1
{
  func2()
}

func2
{
  an exception occurs here
}

**Inside Exception**
{
  func1ex()
}

func1ex
{
   func2ex()
}



func2ex
{
  unwind backtrace()
}

Unwind backtrace is able to backtrace up to func1ex(), but not along the path Task1 --> func1 --> func2.

Because the core switches from the PSP to the MSP stack on exception entry, the unwinder cannot backtrace the functions that were using the PSP.

Before control reaches the exception handler, the core stacks registers R0, R1, R2, R3, R12, LR, PC and xPSR on the PSP. I can see that frame, but I don't know how to use it to backtrace the PSP.

Could anybody tell me what to do in this case so that we can backtrace up to task level?

Thanks,
Ashwin.

Recommended answer

This is doable, but it needs access to internal details of how libgcc implements the _Unwind_Backtrace function. Fortunately the code is open source, but depending on such internal details is brittle: it may break in future versions of armgcc without notice.

In general, reading through the libgcc source that does the backtrace, it creates an in-memory virtual representation of the CPU core registers, then uses this representation to walk up the stack, simulating exception throws. The first thing _Unwind_Backtrace does is fill in this context from the current CPU registers; it then calls an internal implementation function.
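
For reference, the public entry point mentioned in the question can be exercised on its own. The following is a minimal sketch, assuming a GCC or Clang toolchain that ships <unwind.h>; TraceBuffer, record_frame and capture_backtrace are hypothetical names chosen for illustration and are not part of libgcc or of the code quoted in this answer. It only walks the stack the caller is currently running on, which is exactly the limitation the question describes.

```cpp
#include <unwind.h>

#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size buffer for the collected program counters.
struct TraceBuffer
{
    static constexpr std::size_t kMaxDepth = 32;
    std::uintptr_t pc[kMaxDepth];
    std::size_t depth = 0;
};

// Callback invoked by the unwinder once per frame, innermost frame first.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *context,
                                        void *arg)
{
    TraceBuffer *buf = static_cast<TraceBuffer *>(arg);
    if (buf->depth >= TraceBuffer::kMaxDepth)
    {
        // Buffer full: any value other than _URC_NO_REASON stops the walk.
        return _URC_END_OF_STACK;
    }
    buf->pc[buf->depth++] =
        static_cast<std::uintptr_t>(_Unwind_GetIP(context));
    return _URC_NO_REASON; // continue with the caller's frame
}

// Walks the current stack and returns the number of frames recorded.
std::size_t capture_backtrace(TraceBuffer *buf)
{
    assert(buf != nullptr);
    buf->depth = 0;
    _Unwind_Backtrace(&record_frame, buf);
    return buf->depth;
}
```

Called from thread context, capture_backtrace fills buf.pc with addresses starting inside capture_backtrace itself. Called from a handler running on the MSP, it stops at the handler boundary, because the unwinder starts from the live registers rather than from the stacked exception frame.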

In most cases, creating that context manually from the stacked exception structure is sufficient to fake a backtrace that goes from handler mode upwards through the call stack. Here is some example code (from https://github.com/bakerstu/openmrn/blob/62683863e8621cef35e94c9dcfe5abcaf996d7a2/src/freertos_drivers/common/cpu_profile.hxx#L162):

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
struct core_regs
{
    unsigned r[16];
};

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
typedef struct
{
    unsigned demand_save_flags;
    struct core_regs core;
} phase2_vrs;

/// We store what we know about the external context at interrupt entry in this
/// structure.
phase2_vrs main_context;
/// Saved value of the lr register at the exception entry.
unsigned saved_lr;

/// Takes registers from the core state and the saved exception context and
/// fills in the structure necessary for the LIBGCC unwinder.
void fill_phase2_vrs(volatile unsigned *fault_args)
{
    main_context.demand_save_flags = 0;
    main_context.core.r[0] = fault_args[0];
    main_context.core.r[1] = fault_args[1];
    main_context.core.r[2] = fault_args[2];
    main_context.core.r[3] = fault_args[3];
    main_context.core.r[12] = fault_args[4];
    // We add +2 here because first thing libgcc does with the lr value is
    // subtract two, presuming that lr points to after a branch
    // instruction. However, exception entry's saved PC can point to the first
    // instruction of a function and we don't want to have the backtrace end up
    // showing the previous function.
    main_context.core.r[14] = fault_args[6] + 2;
    main_context.core.r[15] = fault_args[6];
    saved_lr = fault_args[5];
    main_context.core.r[13] = (unsigned)(fault_args + 8); // stack pointer
}
extern "C"
{
    _Unwind_Reason_Code __gnu_Unwind_Backtrace(
        _Unwind_Trace_Fn trace, void *trace_argument, phase2_vrs *entry_vrs);
}

/// Static variable for trace_func.
void *last_ip;

/// Callback from the unwind backtrace function.
_Unwind_Reason_Code trace_func(struct _Unwind_Context *context, void *arg)
{
    void *ip;
    ip = (void *)_Unwind_GetIP(context);
    if (strace_len == 0)
    {
        // stacktrace[strace_len++] = ip;
        // By taking the beginning of the function for the immediate interrupt
        // we will attempt to coalesce more traces.
        // ip = (void *)_Unwind_GetRegionStart(context);
    }
    else if (last_ip == ip)
    {
        if (strace_len == 1 && saved_lr != _Unwind_GetGR(context, 14))
        {
            _Unwind_SetGR(context, 14, saved_lr);
            allocator.singleLenHack++;
            return _URC_NO_REASON;
        }
        return _URC_END_OF_STACK;
    }
    if (strace_len >= MAX_STRACE - 1)
    {
        ++allocator.limitReached;
        return _URC_END_OF_STACK;
    }
    // stacktrace[strace_len++] = ip;
    last_ip = ip;
    ip = (void *)_Unwind_GetRegionStart(context);
    stacktrace[strace_len++] = ip;
    return _URC_NO_REASON;
}

/// Called from the interrupt handler to take a CPU trace for the current
/// exception.
void take_cpu_trace()
{
    memset(stacktrace, 0, sizeof(stacktrace));
    strace_len = 0;
    last_ip = nullptr;
    phase2_vrs first_context = main_context;
    __gnu_Unwind_Backtrace(&trace_func, 0, &first_context);
    // This is a workaround for the case when the function in which we had the
    // exception trigger does not have a stack saved LR. In this case the
    // backtrace will fail after the first step. We manually append the second
    // step to have at least some idea of what's going on.
    if (strace_len == 1)
    {
        main_context.core.r[14] = saved_lr;
        main_context.core.r[15] = saved_lr;
        __gnu_Unwind_Backtrace(&trace_func, 0, &main_context);
    }
    unsigned h = hash_trace(strace_len, (unsigned *)stacktrace);
    struct trace *t = find_current_trace(h);
    if (!t)
    {
        t = add_new_trace(h);
    }
    if (t)
    {
        t->total_size += 1;
    }
}

/// Change this value to runtime disable and enable the CPU profile gathering
/// code.
bool enable_profiling = 0;

/// Helper function to declare the CPU usage tick interrupt.
/// @param irq_handler_name is the name of the interrupt to declare, for example
/// timer4a_interrupt_handler.
/// @param CLEAR_IRQ_FLAG is a c++ statement or statements in { ... } that will
/// be executed before returning from the interrupt to clear the timer IRQ flag.
#define DEFINE_CPU_PROFILE_INTERRUPT_HANDLER(irq_handler_name, CLEAR_IRQ_FLAG) \
    extern "C"                                                                 \
    {                                                                          \
        void __attribute__((__noinline__)) load_monitor_interrupt_handler(     \
            volatile unsigned *exception_args, unsigned exception_return_code) \
        {                                                                      \
            if (enable_profiling)                                              \
            {                                                                  \
                fill_phase2_vrs(exception_args);                               \
                take_cpu_trace();                                              \
            }                                                                  \
            cpuload_tick(exception_return_code & 4 ? 0 : 255);                 \
            CLEAR_IRQ_FLAG;                                                    \
        }                                                                      \
        void __attribute__((__naked__)) irq_handler_name(void)                 \
        {                                                                      \
            __asm volatile("mov  r0, %0 \n"                                    \
                           "str  r4, [r0, 4*4] \n"                             \
                           "str  r5, [r0, 5*4] \n"                             \
                           "str  r6, [r0, 6*4] \n"                             \
                           "str  r7, [r0, 7*4] \n"                             \
                           "str  r8, [r0, 8*4] \n"                             \
                           "str  r9, [r0, 9*4] \n"                             \
                           "str  r10, [r0, 10*4] \n"                           \
                           "str  r11, [r0, 11*4] \n"                           \
                           "str  r12, [r0, 12*4] \n"                           \
                           "str  r13, [r0, 13*4] \n"                           \
                           "str  r14, [r0, 14*4] \n"                           \
                           :                                                   \
                           : "r"(main_context.core.r)                          \
                           : "r0");                                            \
            __asm volatile(" tst   lr, #4               \n"                    \
                           " ite   eq                   \n"                    \
                           " mrseq r0, msp              \n"                    \
                           " mrsne r0, psp              \n"                    \
                           " mov r1, lr \n"                                    \
                           " ldr r2,  =load_monitor_interrupt_handler  \n"     \
                           " bx  r2  \n"                                       \
                           :                                                   \
                           :                                                   \
                           : "r0", "r1", "r2");                                \
        }                                                                      \
    }

This code is designed to take a CPU profile using a timer interrupt, but the backtrace unwinding can be reused from any handler, including fault handlers. Read the code from the bottom to the top:

  • It is important that the IRQ function be defined with the attribute __naked__; otherwise the function prologue generated by GCC will manipulate the state of the CPU in unpredictable ways, for example by modifying the stack pointer.
  • First, we save all the other core registers that are not in the exception entry struct. We need to do this from assembly right at the beginning, because later C code will typically modify them when it uses them as temporary registers.
  • Then we reconstruct the stack pointer from before the interrupt; the code works whether the processor was in handler or thread mode before. This pointer addresses the exception entry structure. This code does not handle stacks that are not 4-byte aligned, but I never saw armgcc do that anyway.
  • The rest of the code is in C/C++: we fill in the internal structure we took from libgcc, then call the internal implementation of the unwinding process. We need to make some adjustments to work around certain assumptions of libgcc that do not hold upon exception entry.
  • There is one specific situation where the unwinding does not work: when the exception happened in a leaf function that does not save LR to the stack upon entry. This never happens when you do a backtrace from thread mode, because calling the backtrace function ensures that the calling function is not a leaf. I tried to apply some workarounds by adjusting the LR register during the backtracing process itself, but I'm not convinced they work every time. I'm interested in suggestions on how to do this better.
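
The stack pointer reconstruction in the third point can be made a little more defensive. The following is a minimal sketch, not the author's method, assuming the ARMv7-M exception entry behavior: bit 2 of EXC_RETURN selects PSP, bit 4 of EXC_RETURN is clear when floating-point state was stacked, and bit 9 of the stacked xPSR is set when the core inserted a padding word to align the stack. ExceptionFrame, frame_is_on_psp and sp_before_exception are hypothetical names; addresses are passed as plain 32-bit values so the arithmetic can be checked on any host.

```cpp
#include <cassert>
#include <cstdint>

// Layout of the basic (no FPU state) frame that the core pushes on
// exception entry, lowest address first.
struct ExceptionFrame
{
    std::uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};
static_assert(sizeof(ExceptionFrame) == 32, "basic frame is 8 words");

// EXC_RETURN bit 2: set when the frame was pushed on the process stack.
constexpr std::uint32_t kExcReturnUsesPsp = 1u << 2;
// EXC_RETURN bit 4: set for a basic frame, clear for an extended FP frame.
constexpr std::uint32_t kExcReturnBasicFrame = 1u << 4;
// Stacked xPSR bit 9: set when a 4-byte padding word was inserted.
constexpr std::uint32_t kXpsrStackPadded = 1u << 9;

// Returns true if the interrupted code was running on the PSP.
bool frame_is_on_psp(std::uint32_t exc_return)
{
    return (exc_return & kExcReturnUsesPsp) != 0;
}

// Computes the value SP had just before the exception was taken.
std::uint32_t sp_before_exception(std::uint32_t frame_address,
                                  std::uint32_t stacked_xpsr,
                                  std::uint32_t exc_return)
{
    // Basic frame: 8 words. Extended frame: 26 words (adds S0-S15,
    // FPSCR and one reserved word).
    std::uint32_t size =
        (exc_return & kExcReturnBasicFrame) ? 32u : 104u;
    if (stacked_xpsr & kXpsrStackPadded)
    {
        size += 4u; // skip the alignment padding word
    }
    return frame_address + size;
}
```

In fill_phase2_vrs above, the value assigned to r[13] is fault_args + 8, which matches the basic, unpadded case. A variant along these lines would need the EXC_RETURN value that the naked handler already passes along as exception_return_code, plus fault_args[7] for the stacked xPSR.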
