Stack backtrace for an ARM core using the GCC compiler (when there is an MSP to PSP switch)

Problem description

Core - ARM Cortex-M4

Compiler - GCC 5.3.0 ARM EABI

OS - FreeRTOS

I am doing a stack backtrace using the GCC library function _Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn, void*);

In our project, the MSP stack is used for exception handling. In all other cases, the PSP stack is used. When I call _Unwind_Backtrace() inside the exception handler, I am able to backtrace properly up to the first function called inside the exception. Up to that point the stack in use is MSP.

But we are not able to backtrace past the exception entry into the code that was running before it. At that point, the stack in use is PSP.

For example, suppose:

Task1
{
    func1()
}



func1
{
  func2()
}

func2
{
  an exception occurs here
}

**Inside Exception**
{
  func1ex()
}

func1ex
{
   func2ex()
}



func2ex
{
  unwind backtrace()
}

The unwind backtrace is able to trace back up to func1ex(), but not through the path task1-->func1-->func2.

Because the core switches from the PSP stack to the MSP stack on exception entry, the unwinder cannot backtrace the functions that were using PSP.

Before control reaches the exception handler, the core stacks registers R0, R1, R2, R3, R12, LR, PC and xPSR on the PSP. I am able to view that frame, but I don't know how to use it to do a backtrace on the PSP.
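For reference, the frame the core pushes can be described roughly as in the sketch below. The names `ExceptionFrame` and `sp_before_exception` are made up for illustration and do not come from any library. The sketch assumes the basic 8-word frame (no floating-point state stacked), which also contains R12, and it accounts for the optional 4-byte alignment pad that the core flags in bit 9 of the stacked xPSR.

```cpp
#include <cassert>
#include <cstdint>

// Basic (non-FPU) frame a Cortex-M core pushes on exception entry,
// listed from the lowest address upwards.
struct ExceptionFrame {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    // LR of the interrupted code
    uint32_t pc;    // address of the interrupted instruction
    uint32_t xpsr;  // program status register at the time of the exception
};
static_assert(sizeof(ExceptionFrame) == 32, "basic frame is 8 words");

// Hypothetical helper: the value SP held just before the exception, given
// the address at which the core stacked the frame.
inline uint32_t sp_before_exception(uint32_t frame_addr, const ExceptionFrame &f) {
    assert(frame_addr % 4 == 0);
    uint32_t sp = frame_addr + static_cast<uint32_t>(sizeof(ExceptionFrame));
    // Bit 9 of the stacked xPSR is set when the core inserted a 4-byte pad
    // to keep the frame 8-byte aligned.
    if (f.xpsr & (1u << 9)) {
        sp += 4;
    }
    return sp;
}
```

The recovered SP, together with the stacked LR and PC, is the starting state a backtrace of the interrupted code needs.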

Could anybody tell us what to do in this case so that we can backtrace up to the task level?

Thanks,

Ashwin.

Recommended answer

This is doable, but it needs access to internal details of how libgcc implements the _Unwind_Backtrace function. Fortunately the code is open source, but depending on such internal details is brittle: it may break in future versions of armgcc without any notice.

Reading through the libgcc source that does the backtrace, the general approach is this: it creates an in-memory virtual representation of the CPU core registers, then uses this representation to walk up the stack, simulating exception throws. The first thing _Unwind_Backtrace does is fill in this context from the current CPU registers; it then calls an internal implementation function.
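For comparison, here is a minimal sketch of the ordinary, public way to call _Unwind_Backtrace from normal code, where the unwinder starts from the live CPU registers. `TraceBuffer`, `record_frame` and `capture_backtrace` are hypothetical names used only for this illustration; the only library calls are `_Unwind_Backtrace` and `_Unwind_GetIP` from `<unwind.h>`.

```cpp
#include <cassert>
#include <cstdint>
#include <unwind.h>

// Hypothetical fixed-size buffer for the collected return addresses.
struct TraceBuffer {
    static constexpr int kMaxDepth = 16;
    uintptr_t pcs[kMaxDepth];
    int depth = 0;
};

// Callback invoked by the unwinder once per stack frame.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *context, void *arg) {
    auto *buf = static_cast<TraceBuffer *>(arg);
    assert(buf != nullptr);
    if (buf->depth >= TraceBuffer::kMaxDepth) {
        return _URC_END_OF_STACK;  // any value other than _URC_NO_REASON stops the walk
    }
    buf->pcs[buf->depth++] = static_cast<uintptr_t>(_Unwind_GetIP(context));
    return _URC_NO_REASON;         // continue with the next frame
}

// Walks the current call stack and returns the number of frames recorded.
__attribute__((noinline)) int capture_backtrace(TraceBuffer *buf) {
    buf->depth = 0;
    _Unwind_Backtrace(&record_frame, buf);
    return buf->depth;
}
```

Called from inside an exception handler, this walk ends at the handler's own entry on the MSP stack, which is the limitation described in the question.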

In most cases, creating that context manually from the stacked exception structure is sufficient to fake a backtrace that goes from handler mode upwards through the call stack. Here is some example code (from https://github.com/bakerstu/openmrn/blob/62683863e8621cef35e94c9dcfe5abcaf996d7a2/src/freertos_drivers/common/cpu_profile.hxx#L162):

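#include <string.h> // memset
#include <unwind.h> // _Unwind_Reason_Code, _Unwind_Trace_Fn, _Unwind_GetIP, ...

// Note: strace_len, stacktrace, MAX_STRACE, allocator, hash_trace,
// find_current_trace, add_new_trace and cpuload_tick are not shown in this
// excerpt; they are defined elsewhere in the linked project.

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.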
struct core_regs
{
    unsigned r[16];
};

/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
typedef struct
{
    unsigned demand_save_flags;
    struct core_regs core;
} phase2_vrs;

/// We store what we know about the external context at interrupt entry in this
/// structure.
phase2_vrs main_context;
/// Saved value of the lr register at the exception entry.
unsigned saved_lr;

/// Takes registers from the core state and the saved exception context and
/// fills in the structure necessary for the LIBGCC unwinder.
void fill_phase2_vrs(volatile unsigned *fault_args)
{
    main_context.demand_save_flags = 0;
    main_context.core.r[0] = fault_args[0];
    main_context.core.r[1] = fault_args[1];
    main_context.core.r[2] = fault_args[2];
    main_context.core.r[3] = fault_args[3];
    main_context.core.r[12] = fault_args[4];
    // We add +2 here because first thing libgcc does with the lr value is
    // subtract two, presuming that lr points to after a branch
    // instruction. However, exception entry's saved PC can point to the first
    // instruction of a function and we don't want to have the backtrace end up
    // showing the previous function.
    main_context.core.r[14] = fault_args[6] + 2;
    main_context.core.r[15] = fault_args[6];
    saved_lr = fault_args[5];
    main_context.core.r[13] = (unsigned)(fault_args + 8); // stack pointer
}
extern "C"
{
    _Unwind_Reason_Code __gnu_Unwind_Backtrace(
        _Unwind_Trace_Fn trace, void *trace_argument, phase2_vrs *entry_vrs);
}

/// Static variable for trace_func.
void *last_ip;

/// Callback from the unwind backtrace function.
_Unwind_Reason_Code trace_func(struct _Unwind_Context *context, void *arg)
{
    void *ip;
    ip = (void *)_Unwind_GetIP(context);
    if (strace_len == 0)
    {
        // stacktrace[strace_len++] = ip;
        // By taking the beginning of the function for the immediate interrupt
        // we will attempt to coalesce more traces.
        // ip = (void *)_Unwind_GetRegionStart(context);
    }
    else if (last_ip == ip)
    {
        if (strace_len == 1 && saved_lr != _Unwind_GetGR(context, 14))
        {
            _Unwind_SetGR(context, 14, saved_lr);
            allocator.singleLenHack++;
            return _URC_NO_REASON;
        }
        return _URC_END_OF_STACK;
    }
    if (strace_len >= MAX_STRACE - 1)
    {
        ++allocator.limitReached;
        return _URC_END_OF_STACK;
    }
    // stacktrace[strace_len++] = ip;
    last_ip = ip;
    ip = (void *)_Unwind_GetRegionStart(context);
    stacktrace[strace_len++] = ip;
    return _URC_NO_REASON;
}

/// Called from the interrupt handler to take a CPU trace for the current
/// exception.
void take_cpu_trace()
{
    memset(stacktrace, 0, sizeof(stacktrace));
    strace_len = 0;
    last_ip = nullptr;
    phase2_vrs first_context = main_context;
    __gnu_Unwind_Backtrace(&trace_func, 0, &first_context);
    // This is a workaround for the case when the function in which we had the
    // exception trigger does not have a stack saved LR. In this case the
    // backtrace will fail after the first step. We manually append the second
    // step to have at least some idea of what's going on.
    if (strace_len == 1)
    {
        main_context.core.r[14] = saved_lr;
        main_context.core.r[15] = saved_lr;
        __gnu_Unwind_Backtrace(&trace_func, 0, &main_context);
    }
    unsigned h = hash_trace(strace_len, (unsigned *)stacktrace);
    struct trace *t = find_current_trace(h);
    if (!t)
    {
        t = add_new_trace(h);
    }
    if (t)
    {
        t->total_size += 1;
    }
}

/// Change this value to runtime disable and enable the CPU profile gathering
/// code.
bool enable_profiling = 0;

/// Helper function to declare the CPU usage tick interrupt.
/// @param irq_handler_name is the name of the interrupt to declare, for example
/// timer4a_interrupt_handler.
/// @param CLEAR_IRQ_FLAG is a c++ statement or statements in { ... } that will
/// be executed before returning from the interrupt to clear the timer IRQ flag.
#define DEFINE_CPU_PROFILE_INTERRUPT_HANDLER(irq_handler_name, CLEAR_IRQ_FLAG) \
    extern "C"                                                                 \
    {                                                                          \
        void __attribute__((__noinline__)) load_monitor_interrupt_handler(     \
            volatile unsigned *exception_args, unsigned exception_return_code) \
        {                                                                      \
            if (enable_profiling)                                              \
            {                                                                  \
                fill_phase2_vrs(exception_args);                               \
                take_cpu_trace();                                              \
            }                                                                  \
            cpuload_tick(exception_return_code & 4 ? 0 : 255);                 \
            CLEAR_IRQ_FLAG;                                                    \
        }                                                                      \
        void __attribute__((__naked__)) irq_handler_name(void)                 \
        {                                                                      \
            __asm volatile("mov  r0, %0 \n"                                    \
                           "str  r4, [r0, 4*4] \n"                             \
                           "str  r5, [r0, 5*4] \n"                             \
                           "str  r6, [r0, 6*4] \n"                             \
                           "str  r7, [r0, 7*4] \n"                             \
                           "str  r8, [r0, 8*4] \n"                             \
                           "str  r9, [r0, 9*4] \n"                             \
                           "str  r10, [r0, 10*4] \n"                           \
                           "str  r11, [r0, 11*4] \n"                           \
                           "str  r12, [r0, 12*4] \n"                           \
                           "str  r13, [r0, 13*4] \n"                           \
                           "str  r14, [r0, 14*4] \n"                           \
                           :                                                   \
                           : "r"(main_context.core.r)                          \
                           : "r0");                                            \
            __asm volatile(" tst   lr, #4               \n"                    \
                           " ite   eq                   \n"                    \
                           " mrseq r0, msp              \n"                    \
                           " mrsne r0, psp              \n"                    \
                           " mov r1, lr \n"                                    \
                           " ldr r2,  =load_monitor_interrupt_handler  \n"     \
                           " bx  r2  \n"                                       \
                           :                                                   \
                           :                                                   \
                           : "r0", "r1", "r2");                                \
        }                                                                      \
    }

This code is designed to take a CPU profile using a timer interrupt, but the backtrace unwinding can be reused from any handler, including fault handlers. Read the code from the bottom to the top:

  • It is important that the IRQ function be defined with the __naked__ attribute; otherwise GCC's function entry prologue will manipulate the CPU state in unpredictable ways, for example by modifying the stack pointer.
  • The first thing we do is save all the core registers that are not in the exception entry structure. We need to do this from assembly right at the beginning, because later C code will typically modify these registers when it uses them as temporaries.
  • Then we reconstruct the stack pointer from before the interrupt; the code works whether the processor was in handler mode or thread mode before. This pointer points to the exception entry structure. This code does not handle stacks that are not 4-byte aligned, but I never saw armgcc do that anyway.
  • The rest of the code is in C/C++. We fill in the internal structure we took from libgcc, then call the internal implementation of the unwinding process. We need to make some adjustments to work around certain assumptions of libgcc that do not hold upon exception entry.
  • There is one specific situation where the unwinding does not work: when the exception happened in a leaf function that does not save LR to the stack upon entry. This never happens when you do a backtrace from thread mode, because calling the backtrace function ensures that the calling function is not a leaf. I tried to apply some workarounds by adjusting the LR register during the backtracing process itself, but I'm not convinced it works every time. I'm interested in suggestions on how to do this better.
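The stack selection done by the `tst lr, #4` sequence in the assembly stub can also be expressed in C++. The sketch below decodes the EXC_RETURN value that the core places in LR on exception entry. `ExcReturnInfo` and `decode_exc_return` are hypothetical names for illustration, and the bit meanings assumed are the ARMv7-M ones (bit 2 selects PSP, bit 3 indicates thread mode, bit 4 cleared means an extended floating-point frame was stacked).

```cpp
#include <cassert>
#include <cstdint>

enum class StackUsed { MSP, PSP };

// Hypothetical summary of what EXC_RETURN says about the interrupted context.
struct ExcReturnInfo {
    StackUsed stack;   // stack that holds the exception entry frame
    bool thread_mode;  // true if the interrupted code ran in thread mode
    bool fpu_frame;    // true if the larger frame with FP registers was stacked
};

inline ExcReturnInfo decode_exc_return(uint32_t exc_return) {
    // EXC_RETURN values have all upper bits set.
    assert((exc_return & 0xFF000000u) == 0xFF000000u);
    ExcReturnInfo info;
    info.stack = (exc_return & (1u << 2)) ? StackUsed::PSP : StackUsed::MSP;
    info.thread_mode = (exc_return & (1u << 3)) != 0;
    info.fpu_frame = (exc_return & (1u << 4)) == 0;
    return info;
}
```

A fault handler could use such a check to reject frames it cannot interpret, for example when `fpu_frame` is true, since the code above assumes the basic 8-word frame when it computes the stack pointer as `fault_args + 8`.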
