ARM gcc inline assembler optimization problem


Question

Why is it that my inline assembler routine is not working when I have optimization flag -O3 but it works with other optimization flags (-O0, -O1, -O2, -Os)?

I even added volatile to all my assembler instructions, which I thought would tell the compiler to not touch or reorder anything?

Best Regards

Mr Gigu

Solution

GCC inline assembler is very sensitive to correct specification.

In particular, you have to be extremely precise about specifying the correct constraints to make sure the compiler does not decide to "optimize" your assembler code. There are a few things to watch out for. Take an example.

The following two:

    int myasmfunc(int arg)    /* definitely buggy ... */
    {
        register int myval asm("r2") = arg;

        asm ("add r1, r0, #22\n" ::: "r1");
        asm ("adds r0, r1, r0\n" ::: "r0", "cc");
        asm ("subeq r2, #123\n" ::: "r2");
        asm ("subne r2, #213\n" ::: "r2");
        return myval;
    }

and

    int myasmfunc(int arg)
    {
        int myval = arg, plus = arg;

        asm ("add %0, #22\n\t" : "+r"(plus));
        asm ("adds %1, %2\n\t"
             "subeq %0, #123\n\t"
             "subne %0, #213\n\t" : "+r"(myval), "+r"(plus) : "r"(arg) : "cc");
        return myval;
    }

might look similar at first sight, and you'd naively assume they do the same thing; but they are very far from it!

There are multiple problems with the first version of this code.

  1. For one, if you specify it as separate asm() statements, the compiler is free to insert arbitrary code in between. In particular, the sub instructions, even though they don't modify the condition codes themselves, can fall foul of anything the compiler chooses to insert that does.
  2. Second, again because the instructions are split across separate asm() statements, there's no guarantee that the code generator will put myval in the same register both times, the asm("r2") spec in the variable declaration notwithstanding.
  3. Third, the assumption made in the first version that r0 contains the argument of the function is wrong; by the time it gets to the assembly block, the compiler might have chosen to move this argument somewhere else. Worse, because the statements are split, no guarantee is made as to what happens between two asm() statements. Even if you specify __asm__ __volatile__(...); the compiler treats two such blocks as independent entities.
  4. Fourth, you're not telling the compiler that you're clobbering / assigning myval. It might have chosen to temporarily move it elsewhere because you're clobbering "r2" and, when returning, decide to restore it from ... (???).
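
Points 2 and 4 can be sketched roughly as follows. This is not part of the original answer: the name myasmfunc_pinned is made up for illustration, the asm path assumes ARM (A32) mode, and the plain-C fallback is only there so the snippet also builds on non-ARM hosts. GCC documents local register variables as dependable only when the variable is used as an asm operand, so myval is handed to the asm as a read-write "+r" operand instead of being named as r2 in the template and clobber list.

```c
/* Hypothetical variant of the buggy version: keeps the r2 pinning,
   but tells the compiler that the asm reads and writes myval. */
static int myasmfunc_pinned(int arg)
{
    /* arg + 22 does not depend on the flags, so it can stay in C.
       Unsigned arithmetic mirrors the wrap-around of the ARM add. */
    int plus = (int)((unsigned)arg + 22u);

#if defined(__arm__) && !defined(__thumb__)
    /* The register pinning is honoured where myval is an asm operand. */
    register int myval __asm__("r2") = arg;

    /* adds and the two conditional subs share one statement, so the
       compiler cannot place flag-changing code between them. */
    __asm__ ("adds  %1, %1, %2\n\t"
             "subeq %0, %0, #123\n\t"
             "subne %0, %0, #213"
             : "+r"(myval), "+r"(plus)   /* read-write operands */
             : "r"(arg)                  /* input only */
             : "cc");                    /* condition flags change */
    return myval;
#else
    /* Assumed plain-C equivalent for non-ARM builds. */
    return ((unsigned)plus + (unsigned)arg == 0u) ? arg - 123 : arg - 213;
#endif
}
```

Since the result is returned, the asm output is used and the statement should not be dropped by the optimizer; volatile would only matter if the asm had side effects the operands do not describe.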

Just for the fun of it, here's the output of the first function, for the following four cases:

  1. default - gcc -c tst.c
  2. optimized - gcc -O8 -c tst.c
  3. using some unusual options - gcc -c -finstrument-functions tst.c
  4. that plus optimization - gcc -c -O8 -finstrument-functions tst.c

Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e28db000    add fp, sp, #0  ; 0x0
   8:   e24dd00c    sub sp, sp, #12 ; 0xc
   c:   e50b0008    str r0, [fp, #-8]
  10:   e51b2008    ldr r2, [fp, #-8]
  14:   e2811016    add r1, r1, #22 ; 0x16
  18:   e0910000    adds    r0, r1, r0
  1c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  20:   124220d5    subne   r2, r2, #213    ; 0xd5
  24:   e1a03002    mov r3, r2
  28:   e1a00003    mov r0, r3
  2c:   e28bd000    add sp, fp, #0  ; 0x0
  30:   e8bd0800    pop {fp}
  34:   e12fff1e    bx  lr


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e1a03000    mov r3, r0
   4:   e2811016    add r1, r1, #22 ; 0x16
   8:   e0910000    adds    r0, r1, r0
   c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  10:   124220d5    subne   r2, r2, #213    ; 0xd5
  14:   e1a00003    mov r0, r3
  18:   e12fff1e    bx  lr


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e92d4830    push    {r4, r5, fp, lr}
   4:   e28db00c    add fp, sp, #12 ; 0xc
   8:   e24dd008    sub sp, sp, #8  ; 0x8
   c:   e1a0500e    mov r5, lr
  10:   e50b0010    str r0, [fp, #-16]
  14:   e59f0038    ldr r0, [pc, #56]   ; 54 
  18:   e1a01005    mov r1, r5
  1c:   ebfffffe    bl  0 
  20:   e51b2010    ldr r2, [fp, #-16]
  24:   e2811016    add r1, r1, #22 ; 0x16
  28:   e0910000    adds    r0, r1, r0
  2c:   0242207b    subeq   r2, r2, #123    ; 0x7b
  30:   124220d5    subne   r2, r2, #213    ; 0xd5
  34:   e1a04002    mov r4, r2
  38:   e59f0014    ldr r0, [pc, #20]   ; 54 
  3c:   e1a01005    mov r1, r5
  40:   ebfffffe    bl  0 
  44:   e1a03004    mov r3, r4
  48:   e1a00003    mov r0, r3
  4c:   e24bd00c    sub sp, fp, #12 ; 0xc
  50:   e8bd8830    pop {r4, r5, fp, pc}
  54:   00000000    .word   0x00000000


Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e92d4070    push    {r4, r5, r6, lr}
   4:   e1a0100e    mov r1, lr
   8:   e1a05000    mov r5, r0
   c:   e59f0028    ldr r0, [pc, #40]   ; 3c 
  10:   e1a0400e    mov r4, lr
  14:   ebfffffe    bl  0 
  18:   e2811016    add r1, r1, #22 ; 0x16
  1c:   e0910000    adds    r0, r1, r0
  20:   0242207b    subeq   r2, r2, #123    ; 0x7b
  24:   124220d5    subne   r2, r2, #213    ; 0xd5
  28:   e59f000c    ldr r0, [pc, #12]   ; 3c 
  2c:   e1a01004    mov r1, r4
  30:   ebfffffe    bl  0 
  34:   e1a00005    mov r0, r5
  38:   e8bd8070    pop {r4, r5, r6, pc}
  3c:   00000000    .word   0x00000000

As you can see, none of these does what you'd hope to see; the second version of the code, though, compiled with gcc -c -O8 ..., ends up as:

Disassembly of section .text:

00000000 <myasmfunc>:
   0:   e1a03000    mov r3, r0
   4:   e2833016    add r3, r3, #22 ; 0x16
   8:   e0933000    adds    r3, r3, r0
   c:   0240007b    subeq   r0, r0, #123    ; 0x7b
  10:   124000d5    subne   r0, r0, #213    ; 0xd5
  14:   e12fff1e    bx  lr

and that is, rather closely, what you've specified in your assembly and what you're expecting.

Moral: Be explicit and exact with your constraints and your operand assignments, and keep interdependent lines of assembly within the same asm() block (make it a multiline statement).
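
As a sketch of that advice, the two asm() statements of the second version could be merged into a single one. This goes beyond the original answer: myasmfunc_single is a hypothetical name, the asm path assumes ARM (A32) mode, and non-ARM builds take a plain-C fallback. Because plus is now written by the first instruction while arg is still needed by the second, plus is marked early-clobber ("+&r") so that the compiler should not put both in the same register.

```c
/* Hypothetical single-statement variant of the second version. */
static int myasmfunc_single(int arg)
{
#if defined(__arm__) && !defined(__thumb__)
    int myval = arg, plus = arg;

    __asm__ ("add   %1, %1, #22\n\t"     /* plus += 22                 */
             "adds  %1, %1, %2\n\t"      /* plus += arg, sets flags    */
             "subeq %0, %0, #123\n\t"    /* myval -= 123 if plus == 0  */
             "subne %0, %0, #213"        /* myval -= 213 otherwise     */
             : "+r"(myval),              /* written after arg is read  */
               "+&r"(plus)               /* written before arg is read */
             : "r"(arg)
             : "cc");
    return myval;
#else
    /* Assumed plain-C equivalent; unsigned math mirrors the wrap-around. */
    unsigned plus = (unsigned)arg + 22u + (unsigned)arg;
    return (plus == 0u) ? arg - 123 : arg - 213;
#endif
}
```

The early-clobber marker is the kind of detail that is easy to miss when moving instructions into one block: without it the constraints would claim that arg can safely share a register with an operand that is overwritten first.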
