Reducing stack usage during recursion with GCC + ARM

Problem description

I have a recursive descent parser for an embedded ARM processor (in C + GCC, for ARM Cortex M3).

While running it I've noticed that it uses a massive amount of stack space (even more than you might expect) and under closer inspection I have found that this is happening:

extern int bar(int *p);

int foo() {
 int z = foo(); // it's an example!

 int n[100];  // stack usage
 return z+bar(n); // calling bar(n) stops n from being optimised out
}

Running arm-none-eabi-gcc -fomit-frame-pointer -S test.c produces:

foo:
    str lr, [sp, #-4]!  ; Push link register
    sub sp, sp, #412    ; Reserve space on stack, even if we don't need it now!
    bl  foo             ; Recurse
    str r0, [sp, #404]  ; Store result
    ...

So at the start of the function, it reserves the entire stack frame on the stack. After a few levels of recursion, it has loads of space reserved on the stack that it hasn't used yet.

Ideally, what I'd like is for GCC to generate:

foo:
    str lr, [sp, #-4]!  ; Push link register
    ; Don't reserve space, because we don't need it
    bl  foo             ; Recurse
    sub sp, sp, #412    ; Reserve space now
    str r0, [sp, #404]  ; Store result
    ...

(This is probably not correct but I hope you get the idea)

Something a bit like this can be achieved with the following code, but it's really nasty (and if GCC inlines fooworker, it breaks again!). Surely there's a better way?

int fooworker(int z) {
 int n[100];  // stack usage
 return z+bar(n); // calling bar(n) stops n from being optimised out
}


int foo() {
 return fooworker(foo());
}
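The fooworker split only helps while GCC keeps fooworker as a separate call. GCC's noinline function attribute is one way to stop it from being inlined back into foo. This is a minimal sketch, not the asker's code: the depth parameter (so the recursion terminates) and the stand-in bar that fills and sums the buffer are hypothetical additions.

```c
/* Hypothetical stand-in for the question's external bar():
   writes and sums the buffer so the array cannot be optimised away. */
int bar(int *p) {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        p[i] = i;
        sum += p[i];
    }
    return sum;
}

/* noinline keeps the 400-byte array in fooworker's own frame, which only
   exists after the recursive call in foo() has already returned. */
__attribute__((noinline)) int fooworker(int z) {
    int n[100];            /* large local, never live across the recursion */
    return z + bar(n);
}

/* depth is a hypothetical bound so the example terminates.
   foo() itself needs only a small frame per recursion level. */
int foo(int depth) {
    int z = (depth > 0) ? foo(depth - 1) : 0;   /* recurse first */
    return fooworker(z);                        /* then use the big frame */
}
```

With this stand-in bar, each level adds 4950, so foo(3) returns 4 * 4950 = 19800. At most one fooworker frame is on the stack at any time.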

So is there a way to tell GCC to enlarge the stack only at the start of the basic block that needs it, or is there a 'barrier' statement that makes it emit the extra push/pop operations at that point? I guess GCC is using one of the standard ARM calling conventions, but is there a way to tag these functions with a different convention that is more efficient with the stack, or a way to rewrite the functions so that the stack is used more sensibly?

Please don't tell me not to use recursion, it is not answering the question.

Solution

#include <alloca.h>

int *n = alloca(sizeof(*n) * 100);  // replaces int n[100]; in foo()

It's ugly, and personally I'd split the function into two parts, but it seems to work with my GCC on amd64 at all optimization levels.
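A minimal sketch of where the alloca line goes in context, assuming a C library that provides the non-standard alloca() through <alloca.h> (glibc does). The depth parameter (so the recursion terminates) and the stand-in bar are hypothetical additions, not part of the question or the answer. The point is that alloca adjusts the stack pointer when it is reached at run time, so placing it after the recursive call means the 400 bytes are never held across that call.

```c
#include <alloca.h>

/* Hypothetical stand-in for the question's external bar(). */
int bar(int *p) {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        p[i] = i;
        sum += p[i];
    }
    return sum;
}

/* depth is a hypothetical bound so the example terminates. */
int foo(int depth) {
    int z = (depth > 0) ? foo(depth - 1) : 0;   /* recurse with a small frame */

    /* Reserve the buffer only now, after the inner call has returned.
       alloca memory is released when foo() returns. */
    int *n = alloca(sizeof(*n) * 100);
    return z + bar(n);
}
```

Because alloca memory is only released when the calling function returns, an alloca inside a loop keeps growing the frame; here it runs exactly once per call.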
