How Do I Use Labels In GCC Inline Assembly?


Problem description


I'm trying to learn x86-64 inline assembly and decided to implement this very simple swap function, which puts a and b in ascending order:

#include <stdio.h>

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    .L1");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm(".L1:");
    asm(".att_syntax noprefix");
}

int main()
{
    int input[3];

    scanf("%d%d%d", &input[0], &input[1], &input[2]);

    swap(&input[0], &input[1]);
    swap(&input[1], &input[2]);
    swap(&input[0], &input[1]);

    printf("%d %d %d\n", input[0], input[1], input[2]);

    return 0;
}

The above code works as expected when I run it with this command:

> gcc main.c
> ./a.out
> 3 2 1
> 1 2 3

However, as soon as I turn optimization on I get the following error messages:

> gcc -O2 main.c
> main.c: Assembler messages:
> main.c:12: Error: symbol `.L1' is already defined
> main.c:12: Error: symbol `.L1' is already defined
> main.c:12: Error: symbol `.L1' is already defined

If I've understood it correctly, this is because gcc tries to inline my swap function when optimization is turned on, causing the label .L1 to be defined multiple times in the assembly file.

I've tried to find an answer to this problem, but nothing seems to work. In this previously asked question it's suggested to use local labels instead, and I've tried that as well:

#include <stdio.h>

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    1f");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm("1:");
    asm(".att_syntax noprefix");
}

But when trying to run the program I now get a segmentation fault instead:

> gcc -O2 main.c
> ./a.out
> 3 2 1
> Segmentation fault

I also tried the suggested solution to this previously asked question and changed the name .L1 to CustomLabel1 in case there was a name collision, but it still gives me the old error:

> gcc -O2 main.c
> main.c: Assembler messages:
> main.c:12: Error: symbol `CustomLabel1' is already defined
> main.c:12: Error: symbol `CustomLabel1' is already defined
> main.c:12: Error: symbol `CustomLabel1' is already defined

Finally I also tried this suggestion:

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    label%=");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm("label%=:");
    asm(".att_syntax noprefix");
}

But then I get these errors instead:

main.c: Assembler messages:
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic

So, my question is:

How can I use labels in inline assembly?


This is the assembly output for the optimized version (the variant using the 1f local label):

> gcc -O2 -S main.c

    .file   "main.c"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB0:
    .text
.LHOTB0:
    .p2align 4,,15
    .globl  swap
    .type   swap, @function
swap:
.LFB23:
    .cfi_startproc
#APP
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
#NO_APP
    ret
    .cfi_endproc
.LFE23:
    .size   swap, .-swap
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC1:
    .string "%d%d%d"
.LC2:
    .string "%d %d %d\n"
    .section    .text.unlikely
.LCOLDB3:
    .section    .text.startup,"ax",@progbits
.LHOTB3:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB24:
    .cfi_startproc
    subq    $40, %rsp
    .cfi_def_cfa_offset 48
    movl    $.LC1, %edi
    movq    %fs:40, %rax
    movq    %rax, 24(%rsp)
    xorl    %eax, %eax
    leaq    8(%rsp), %rcx
    leaq    4(%rsp), %rdx
    movq    %rsp, %rsi
    call    __isoc99_scanf
#APP
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
#NO_APP
    movl    8(%rsp), %r8d
    movl    4(%rsp), %ecx
    movl    $.LC2, %esi
    movl    (%rsp), %edx
    xorl    %eax, %eax
    movl    $1, %edi
    call    __printf_chk
    movq    24(%rsp), %rsi
    xorq    %fs:40, %rsi
    jne .L6
    xorl    %eax, %eax
    addq    $40, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 8
    ret
.L6:
    .cfi_restore_state
    call    __stack_chk_fail
    .cfi_endproc
.LFE24:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE3:
    .section    .text.startup
.LHOTE3:
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

Solution

There are plenty of tutorials - including this one (probably the best I know of), and some info on operand size modifiers.

Here's the first implementation - swap_2 :

void swap_2 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "movl (%0), %k2\n\t" /* %2 (tmp0) = (*a) */
        "movl (%1), %k3\n\t" /* %3 (tmp1) = (*b) */
        "cmpl %k3, %k2\n\t"
        "jle  %=f\n\t"       /* if (%2 <= %3) (at&t!) */
        "movl %k3, (%0)\n\t"
        "movl %k2, (%1)\n\t"
        "%=:\n\t"

        : "+r" (a), "+r" (b), "=r" (tmp0), "=r" (tmp1) :
        : "memory" /* "cc" */ );
}

A few notes:

  • volatile (or __volatile__) is required, as the compiler only 'sees' (a) and (b) (and doesn't 'know' you're potentially exchanging their contents), and would otherwise be free to optimize the whole asm statement away - tmp0 and tmp1 would otherwise be considered unused variables too.

  • "+r" marks an operand as both an input and an output, meaning the asm may modify it. Here (a) and (b) are never modified, so they could strictly be input only - more on that in a bit...

  • The 'l' suffix on 'movl' isn't really necessary; neither is the 'k' (32-bit) length modifier for the registers. Since you're using the Linux (ELF) ABI, an int is 32 bits for both IA32 and x86-64 ABIs.

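  • The %= token expands to a number that is unique to each asm statement instance, which gives us a unique numeric label. BTW, for numeric local labels the jump syntax <label>f refers to the next definition of that label (a forward jump), and <label>b to the previous one (a backward jump).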

  • For correctness, we need "memory" as the compiler has no way of knowing if values from dereferenced pointers have been changed. This may be an issue in more complex inline asm surrounded by C code, as it invalidates all currently held values in memory - and is often a sledgehammer approach. Appearing at the end of a function in this fashion, it's not going to be an issue - but you can read more on it here (see: Clobbers)

  • The "cc" flags register clobber is detailed in the same section. On x86, it does nothing. Some writers include it for clarity, but since practically all non-trivial asm statements affect the flags register, it's just assumed to be clobbered by default.
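One detail worth spelling out, since it matches the "invalid character '='" errors in the question: %= is only expanded in extended asm, that is, an asm statement with at least one colon-separated operand section. In basic asm the string is handed to the assembler verbatim, so label%= arrives unchanged. The sketch below is a hypothetical helper (clamp_nonneg is not from the question) that combines %= with a .L-prefixed name to get a readable, unique label. It assumes GCC or Clang on an x86 target and falls back to plain C elsewhere.

```c
/* Hypothetical helper: returns v, or 0 when v is negative. */
int clamp_nonneg(int v)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Extended asm (note the colons), so %= is replaced by a number
       unique to this asm instance, giving something like ".Lnonneg42". */
    __asm__ (
        "testl %[v], %[v]\n\t"      /* set flags from v */
        "jns   .Lnonneg%=\n\t"      /* skip the reset if v >= 0 */
        "xorl  %[v], %[v]\n"        /* v = 0 */
        ".Lnonneg%=:"
        : [v] "+r" (v)
        :
        : "cc");
#else
    if (v < 0)
        v = 0;                      /* portable fallback */
#endif
    return v;
}
```

No volatile is needed here because the output operand is actually used by the return statement.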

Here's the C implementation - swap_1 :

void swap_1 (int *a, int *b)
{
    if (*a > *b)
    {
        int t = *a; *a = *b; *b = t;
    }
}

Compiling with gcc -O2 for x86-64 ELF, I get identical code. Just a bit of luck that the compiler chose tmp0 and tmp1 to use the same free registers for temps... cutting out the noise, like the .cfi directives, etc., gives:

swap_2:
        movl (%rdi), %eax
        movl (%rsi), %edx
        cmpl %edx, %eax
        jle  21f
        movl %edx, (%rdi)
        movl %eax, (%rsi)
        21:
        ret

As stated, the swap_1 code was identical, except that the compiler chose .L1 for its jump label. Compiling the code with -m32 generated the same code (apart from using the tmp registers in a different order). There's more overhead, as the IA32 ELF ABI passes parameters on the stack, whereas the x86-64 ABI passes the first two parameters in %rdi and %rsi respectively.
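That last point also suggests why the question's 1f attempt most likely crashed at -O2: %rdi and %rsi only hold a and b on entry to a real call of swap. Once the body is inlined into main, the asm still dereferences whatever happens to be in those registers, and it overwrites ebx (a callee-saved register) without telling the compiler. If fixed registers are really wanted, a sketch along these lines pins the inputs with the "D" and "S" constraints and lists the scratch registers as clobbers. The name swap_fixed and the choice of ecx as the second scratch register are my own assumptions; x86-64 is assumed, with a plain C fallback otherwise.

```c
/* Hypothetical variant: fixed registers, but declared to the compiler. */
void swap_fixed(int *a, int *b)
{
#if defined(__x86_64__)
    __asm__ volatile (
        "movl (%%rdi), %%eax\n\t"   /* eax = *a */
        "movl (%%rsi), %%ecx\n\t"   /* ecx = *b */
        "cmpl %%ecx, %%eax\n\t"
        "jle  1f\n\t"               /* already ordered: skip the stores */
        "movl %%ecx, (%%rdi)\n\t"
        "movl %%eax, (%%rsi)\n"
        "1:"
        :                           /* no register outputs */
        : "D" (a), "S" (b)          /* a in rdi, b in rsi */
        : "eax", "ecx", "memory", "cc");
#else
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
#endif
}
```

All instructions sit in one asm statement. The compiler is free to move or separate consecutive asm statements, so splitting a jump and its label across several statements is fragile even when it happens to assemble.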


Treating (a) and (b) as input only - swap_3 :

void swap_3 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "mov (%[a]), %[x]\n\t" /* x = (*a) */
        "mov (%[b]), %[y]\n\t" /* y = (*b) */
        "cmp %[y], %[x]\n\t"
        "jle  %=f\n\t"         /* if (x <= y) (at&t!) */
        "mov %[y], (%[a])\n\t"
        "mov %[x], (%[b])\n\t"
        "%=:\n\t"

        : [x] "=&r" (tmp0), [y] "=&r" (tmp1)
        : [a] "r" (a), [b] "r" (b) : "memory" /* "cc" */ );
}

I've done away with the 'l' suffix and 'k' modifiers here, because they're not needed. I've also used the 'symbolic name' syntax for operands, as it often helps to make the code more readable.

(a) and (b) are now indeed input-only registers. So what does the "=&r" syntax mean? The & denotes an early clobber operand. In this case, the outputs may be written before we finish using the input operands, and therefore the compiler must choose registers different from those selected for the input operands.

Once again, the compiler generates identical code as it did for swap_1 and swap_2.
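To check that the generated labels really stay distinct once the function is inlined, which is where the fixed .L1 label in the question broke, one can force inlining and use the asm three times from a single function. This is a minimal sketch that reuses the swap_3 operand layout; order_pair and sort3 are hypothetical names, and always_inline is a GCC/Clang attribute. An x86 target is assumed, with a plain C fallback otherwise.

```c
/* Same operand layout as swap_3; always_inline forces three copies of
   the asm into sort3, the situation in which a fixed label clashes. */
static inline __attribute__((always_inline))
void order_pair(int *a, int *b)
{
#if defined(__x86_64__) || defined(__i386__)
    int x, y;
    __asm__ volatile (
        "movl (%[a]), %[x]\n\t"     /* x = *a */
        "movl (%[b]), %[y]\n\t"     /* y = *b */
        "cmpl %[y], %[x]\n\t"
        "jle  %=f\n\t"              /* if (x <= y) skip the stores */
        "movl %[y], (%[a])\n\t"
        "movl %[x], (%[b])\n"
        "%=:"                       /* numeric label, unique per copy */
        : [x] "=&r" (x), [y] "=&r" (y)
        : [a] "r" (a), [b] "r" (b)
        : "memory", "cc");
#else
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
#endif
}

/* Sorts three ints in place with the same three calls as the question. */
void sort3(int v[3])
{
    order_pair(&v[0], &v[1]);
    order_pair(&v[1], &v[2]);
    order_pair(&v[0], &v[1]);
}
```

Compiling this with gcc -O2 -S and searching the output for the labels is a quick way to see the three different numbers.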


I wrote way more than I planned on this answer, but as you can see, it's very difficult to maintain awareness of all the information the compiler must be made aware of, as well as the idiosyncrasies of each instruction set (ISA) and ABI.
