Reducing stack usage during recursion with GCC + ARM

Question

I have a recursive descent parser for an embedded ARM processor (in C + GCC, for ARM Cortex M3).

While running it I noticed that it uses a massive amount of stack space (even more than you might expect), and on closer inspection I found that this is what happens:

extern int bar(int *p);

int foo(void) {
    int z = foo();      // it's an example!

    int n[100];         // stack usage
    return z + bar(n);  // calling bar(n) stops n from being optimised out
}

Compiling with arm-none-eabi-gcc -fomit-frame-pointer -S test.c gives:

foo:
    str lr, [sp, #-4]!  ; Push link register
    sub sp, sp, #412    ; Reserve space on stack, even if we don't need it now!
    bl  foo             ; Recurse
    str r0, [sp, #404]  ; Store result
    ...

So at the start of the function, GCC reserves the entire stack frame. After a few levels of recursion, the stack therefore holds a lot of space that hasn't been used yet.
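One way to observe this from inside the program is to compare the frame address at the top of the recursion with the frame address at its deepest point. The following is only a sketch: it assumes a descending stack and GCC's __builtin_frame_address builtin, and the depth parameter, base_case_frame, stack_used_at_base_case, and the local bar stub are hypothetical additions so that the example terminates and links on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame address recorded at the deepest recursion level (hypothetical helper state). */
static uintptr_t base_case_frame;

/* Hypothetical stand-in for the external bar(); noinline so n must really exist. */
static __attribute__((noinline)) int bar(int *p) {
    p[0] = 1;
    return p[0];
}

/* Same shape as the example above, plus a depth limit so it terminates. */
int foo(int depth) {
    if (depth == 0) {
        /* n has not been touched on any level yet, but its space is already reserved. */
        base_case_frame = (uintptr_t)__builtin_frame_address(0);
        return 0;
    }
    int z = foo(depth - 1);

    int n[100];          /* part of the frame reserved at function entry */
    return z + bar(n);
}

/* Bytes between the caller's frame and the deepest frame, assuming the stack grows down. */
size_t stack_used_at_base_case(int depth) {
    uintptr_t top = (uintptr_t)__builtin_frame_address(0);
    (void)foo(depth);
    return (size_t)(top - base_case_frame);
}
```

Calling stack_used_at_base_case(10) and dividing by 10 gives a rough per-level cost; the exact figure depends on the target, the compiler version, and the optimization level.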

Ideally, what I'd like is for GCC to generate:

foo:
    str lr, [sp, #-4]!  ; Push link register
    ; Don't reserve space, because we don't need it
    bl  foo             ; Recurse
    sub sp, sp, #412    ; Reserve space now
    str r0, [sp, #404]  ; Store result
    ...

(This is probably not correct, but I hope you get the idea.)

Something a bit like this can be achieved with the following code, but it's really nasty (and if GCC inlines fooworker, it breaks again!). Surely there must be a better way?

int fooworker(int z) {
 int n[100];  // stack usage
 return z+bar(n); // calling bar(n) stops n from being optimised out
}


int foo() {
 return fooworker(foo());
}
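If the split version is used, GCC's noinline function attribute is one way to stop the worker from being merged back into foo. This is a minimal sketch, assuming GCC or a compatible compiler; the depth parameter and the local bar stub are hypothetical and exist only so that the example terminates and links on its own.

```c
/* Hypothetical stand-in for the external bar(). */
static __attribute__((noinline)) int bar(int *p) {
    p[0] = 1;
    return p[0];
}

/* noinline keeps the 400-byte array in fooworker's own frame,
 * which exists only while fooworker itself is running. */
static __attribute__((noinline)) int fooworker(int z) {
    int n[100];          /* stack usage */
    return z + bar(n);   /* calling bar(n) stops n from being optimised out */
}

/* foo's frame stays small, so each recursion level costs little stack. */
int foo(int depth) {
    if (depth == 0) {
        return 0;        /* hypothetical base case */
    }
    return fooworker(foo(depth - 1));
}
```

The attribute only addresses the inlining concern; the code is still split in two, which is the part described above as nasty.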

So is there a way of telling GCC to only enlarge the stack at the start of the basic block, or is there a 'barrier' statement that causes extra push/pop ops to be added at that point? I guess GCC is using one of the ARM standard call types, but is there a way to tag these functions with another call type that is a bit more efficient with the stack, or is there a way to rewrite the functions so that the stack is used a bit more sensibly?

Please don't tell me not to use recursion; that doesn't answer the question.

Recommended answer

int *n = alloca(sizeof(*n) * 100);

It's ugly, and I'd personally split the function into two parts, but it seems to work with my gcc on amd64 at all optimization levels.
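Placed in context, the alloca call replaces the fixed-size array and sits after the recursive call, so the 400 bytes are taken from the stack only once the recursion has returned. The following is a sketch, assuming the toolchain provides <alloca.h>; the depth parameter and the local bar stub are hypothetical and exist only so that the example terminates and links on its own.

```c
#include <alloca.h>

/* Hypothetical stand-in for the external bar(). */
static __attribute__((noinline)) int bar(int *p) {
    p[0] = 1;
    return p[0];
}

int foo(int depth) {
    if (depth == 0) {
        return 0;                        /* hypothetical base case */
    }
    int z = foo(depth - 1);              /* recurse while the frame is still small */

    /* The stack pointer is moved here, at the point of the call,
     * rather than in the function prologue. */
    int *n = alloca(sizeof(*n) * 100);
    return z + bar(n);
}
```

Memory obtained from alloca is released automatically when the calling function returns, so no explicit free is needed.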
