Reverse-engineering asm using sub / cmp / setbe back to C? My attempt is compiling to branches

Problem description

This is the assembly code I am supposed to translate:

f1:

subl    $97, %edi
xorl    %eax, %eax
cmpb    $25, %dil
setbe   %al
ret

Here's the C code I wrote that I think is equivalent:

int f1(int y){

  int x = y-97;
  int i = 0;

  if(x<=25){
    x = i;
  }
  return x;
}

And here's what I get from compiling the C code:

_f1: ## @f1

.cfi_startproc

%bb.0:

pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq    %rsp, %rbp
.cfi_def_cfa_register %rbp
                  ## kill: def %edi killed %edi def %rdi
leal    -97(%rdi), %ecx
xorl    %eax, %eax
cmpl    $123, %edi
cmovgel %ecx, %eax
popq    %rbp
retq
.cfi_endproc

I was wondering whether this is correct and what should be different. Could anyone also explain how jmps work? I am trying to translate this assembly code as well and have gotten stuck:

f2:

cmpl    $1, %edi
jle .L6
movl    $2, %edx
movl    $1, %eax
jmp .L5

.L8:

movl    %ecx, %edx

.L5:

imull   %edx, %eax
leal    1(%rdx), %ecx
cmpl    %eax, %edi
jg  .L8

.L4:

cmpl    %edi, %eax
sete    %al
movzbl  %al, %eax
ret

.L6:

movl    $1, %eax
jmp .L4

Solution

gcc8.3 -O3 emits exactly the asm in the question for this way of writing the range check using the unsigned-compare trick.

int is_ascii_lowercase_v2(int y){
    unsigned char x = y-'a';
    return x <= (unsigned)('z'-'a');
}
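As a rough illustration of why the unsigned-compare trick works as a range check: after subtracting 'a', any input below 'a' wraps around to a large unsigned value, so a single unsigned `<=` covers both bounds. The sketch below compares the trick against a plain two-sided check for every char value. The function names are hypothetical and not from the question.

```c
#include <limits.h>

/* Plain two-sided range check. */
int range_check(char y) {
    return y >= 'a' && y <= 'z';
}

/* Unsigned-compare trick: inputs below 'a' wrap to values above 25. */
int unsigned_trick(char y) {
    unsigned char x = y - 'a';
    return x <= (unsigned)('z' - 'a');
}

/* Returns how many char values make the two checks disagree. */
int count_range_check_mismatches(void) {
    int mismatches = 0;
    for (int c = CHAR_MIN; c <= CHAR_MAX; c++) {
        if (range_check((char)c) != unsigned_trick((char)c)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_range_check_mismatches()` should return 0 whether plain char is signed or unsigned on the target.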

Narrowing to 8-bit after the int subtract matches the asm more exactly, but it's not necessary for correctness or even to convince compilers to use a 32-bit sub. For unsigned char y, the upper bytes of RDI are allowed to hold arbitrary garbage (x86-64 System V calling convention), but carry only propagates from low to high with sub and add.

The low 8 bits of the result (which is all the cmp reads) would be the same with sub $'a', %dil or sub $'a', %edi.
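The claim about the low 8 bits can be checked with a small model in C. This is only a sketch: the helper names are made up, and the "garbage" upper bytes are arbitrary values chosen for illustration.

```c
#include <stdint.h>

/* Models `sub $'a', %edi` followed by reading only %dil:
   subtract at 32-bit width, then keep the low 8 bits. */
static uint8_t low8_of_wide_sub(uint32_t edi) {
    return (uint8_t)(edi - 'a');
}

/* Models `sub $'a', %dil`: subtract using only the low byte. */
static uint8_t narrow_sub(uint32_t edi) {
    return (uint8_t)((uint8_t)edi - 'a');
}

/* Tries every low byte with a few arbitrary upper-byte patterns
   and counts the cases where the two models disagree. */
int count_low8_mismatches(void) {
    static const uint32_t garbage[] = {0u, 0xDEADBE00u, 0xFFFFFF00u};
    int mismatches = 0;
    for (unsigned g = 0; g < 3; g++) {
        for (uint32_t low = 0; low < 256; low++) {
            uint32_t edi = garbage[g] | low;
            if (low8_of_wide_sub(edi) != narrow_sub(edi)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

The count should be 0, because a borrow out of bit 7 can change the upper bits of the result but never the low byte.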

Writing it as a normal range-check also gets gcc to emit identical code, because compilers know how to optimize range-checks. (And gcc chooses to use 32-bit operand-size for the sub, unlike clang which uses 8-bit.)

int is_ascii_lowercase_v3(char y){
    return (y>='a' && y<='z');
}

On the Godbolt compiler explorer, this and _v2 compile as follows:

## gcc8.3 -O3
is_ascii_lowercase_v3:    # and _v2 is identical
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret


Returning a compare result as an integer, instead of using an if, much more naturally matches the asm.
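To make the difference concrete, here is the question's attempt next to a version that returns the comparison itself. `f1_attempt` is copied from the question. `f1_compare` is a hypothetical name for the same logic as `is_ascii_lowercase_v2`.

```c
/* The question's attempt: returns 0 or y-97, using a signed int compare. */
int f1_attempt(int y) {
    int x = y - 97;
    int i = 0;
    if (x <= 25) {
        x = i;
    }
    return x;
}

/* Returns the 0/1 result of an unsigned 8-bit compare, like setbe. */
int f1_compare(int y) {
    unsigned char x = y - 'a';
    return x <= 25;
}
```

For 'a' the asm returns 1, which `f1_compare` reproduces, while `f1_attempt` returns 0. For '{' (123) the asm returns 0, while `f1_attempt` returns 26.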

But even writing it "branchlessly" in C won't match the asm unless you enable optimization. The default code-gen from gcc/clang is -O0: anti-optimize for consistent debugging, storing/reloading everything to memory between statements. (And function args on function entry.) You need optimization, because -O0 code-gen is (intentionally) mostly braindead, and nasty looking. See How to remove "noise" from GCC/clang assembly output?

## gcc8.3 -O0
is_ascii_lowercase_v2:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -20(%rbp)
    movl    -20(%rbp), %eax
    subl    $97, %eax
    movb    %al, -1(%rbp)
    cmpb    $25, -1(%rbp)
    setbe   %al
    movzbl  %al, %eax
    popq    %rbp
    ret


gcc and clang with optimization enabled will do if-conversion to branchless code when it's efficient. e.g.

int is_ascii_lowercase_branchy(char y){
    unsigned char x = y-'a';
    if (x <= 25U) {
        return 1;
    }
    return 0;
}

still compiles to the same asm with GCC8.3 -O3

is_ascii_lowercase_branchy:
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret


We can tell that the optimization level was at least gcc -O2. At -O1, gcc uses the less efficient setbe / movzx instead of xor-zeroing EAX ahead of setbe:

is_ascii_lowercase_v2:
    subl    $97, %edi
    cmpb    $25, %dil
    setbe   %al
    movzbl  %al, %eax
    ret

I could never get clang to reproduce exactly the same sequence of instructions. It likes to use add $-97, %edi, and cmp with $26 / setb.

Or it will do really interesting (but sub-optimal) things like this:

# clang7.0 -O3
is_ascii_lowercase_v2:
    addl    $159, %edi    # 256-97 = 8-bit version of -97
    andl    $254, %edi    # 0xFE; I haven't figured out why it's clearing the low bit as well as the high bits
    xorl    %eax, %eax
    cmpl    $26, %edi
    setb    %al
    retq
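Whatever clang's reason for it, the sequence can at least be checked for equivalence with a small C model. This is a sketch with hypothetical function names. It does not explain clang's choice. It only confirms that clearing the low bit is harmless here: 26 is even, so `(x & 0xFE) < 26` holds exactly when `x <= 25`.

```c
#include <stdint.h>

/* Models clang 7: addl $159 / andl $254 / cmpl $26 / setb. */
int clang_sequence(uint32_t edi) {
    uint32_t t = (edi + 159u) & 254u;
    return t < 26u;
}

/* Models gcc: subl $97 / cmpb $25, %dil / setbe. */
int gcc_sequence(uint32_t edi) {
    return (uint8_t)(edi - 97u) <= 25u;
}

/* Tries every low byte with a few arbitrary upper-byte patterns
   and counts the inputs where the two models disagree. */
int count_sequence_mismatches(void) {
    static const uint32_t garbage[] = {0u, 0x12345600u, 0xFFFFFF00u};
    int mismatches = 0;
    for (unsigned g = 0; g < 3; g++) {
        for (uint32_t low = 0; low < 256; low++) {
            uint32_t edi = garbage[g] | low;
            if (clang_sequence(edi) != gcc_sequence(edi)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

`count_sequence_mismatches()` should return 0.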

So this is something involving -(x-97), maybe using the 2's complement identity in there somewhere (-x = ~x + 1).
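The question also asks how the jumps in f2 work. As a hedged reading of that listing:

- `jle .L6` is a conditional jump, taken when the argument is <= 1, so the loop is skipped.
- `jmp .L5` is unconditional and enters the loop body on the first pass, bypassing the `movl` at `.L8`.
- `jg .L8` is the loop's backward branch, taken while the argument is still greater than the running product.

One possible C reconstruction follows. The variable names are guesses, and the original source may have been written differently.

```c
/* Hypothetical reconstruction of f2: returns 1 if n equals some factorial.
   Note: for n above 12! (479001600) the product overflows int, which is
   undefined in C, whereas the asm's imull simply wraps. */
int f2(int n) {
    int product = 1;              /* %eax */
    if (n > 1) {                  /* cmpl $1, %edi ; jle .L6 skips the loop */
        int i = 2;                /* %edx */
        do {
            product *= i;         /* imull %edx, %eax */
            i++;                  /* leal 1(%rdx), %ecx ; movl %ecx, %edx */
        } while (n > product);    /* cmpl %eax, %edi ; jg .L8 */
    }
    return product == n;          /* cmpl %edi, %eax ; sete ; movzbl */
}
```

With this reading, `f2(6)` and `f2(24)` return 1, while `f2(5)` returns 0.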
