FreeBSD syscall clobbering more registers than Linux? Inline asm different behaviour between optimization levels


Question


Recently I was playing with FreeBSD system calls. I had no problem with the i386 part, since it's well documented, but I can't find an equivalent document for x86_64.

I have seen people use the same approach as on Linux, but only in pure assembly, not from C. I suspect that in my case the system call is changing some register that the compiler uses at high optimization levels, which would explain the different behaviour.

/* for SYS_* constants */
#include <sys/syscall.h>

/* for types like size_t */
#include <unistd.h>

ssize_t sys_write(int fd, const void *data, size_t size){
    register long res __asm__("rax");
    register long arg0 __asm__("rdi") = fd;
    register long arg1 __asm__("rsi") = (long)data;
    register long arg2 __asm__("rdx") = size;
    __asm__ __volatile__(
        "syscall"
        : "=r" (res)
        : "0" (SYS_write), "r" (arg0), "r" (arg1), "r" (arg2)
        : "rcx", "r11", "memory"
    );
    return res;
}

int main(){
    for(int i = 0; i < 1000; i++){
        char a = 0;
        int some_invalid_fd = -1;
        sys_write(some_invalid_fd, &a, 1);
    }
    return 0;
}

In the code above I expect it to call sys_write 1000 times and then return from main. I use truss to check the system calls and their parameters. Everything works fine with -O0, but with -O3 the for loop gets stuck forever. I believe the system call is changing the i variable or the 1000 to something weird.

Dump of assembler code for function main:

0x0000000000201900 <+0>:     push   %rbp
0x0000000000201901 <+1>:     mov    %rsp,%rbp
0x0000000000201904 <+4>:     mov    $0x3e8,%r8d
0x000000000020190a <+10>:    lea    -0x1(%rbp),%rsi
0x000000000020190e <+14>:    mov    $0x1,%edx
0x0000000000201913 <+19>:    mov    $0xffffffffffffffff,%rdi
0x000000000020191a <+26>:    nopw   0x0(%rax,%rax,1)
0x0000000000201920 <+32>:    movb   $0x0,-0x1(%rbp)
0x0000000000201924 <+36>:    mov    $0x4,%eax
0x0000000000201929 <+41>:    syscall 
0x000000000020192b <+43>:    add    $0xffffffff,%r8d
0x000000000020192f <+47>:    jne    0x201920 <main+32>
0x0000000000201931 <+49>:    xor    %eax,%eax
0x0000000000201933 <+51>:    pop    %rbp
0x0000000000201934 <+52>:    ret

What is wrong with sys_write()? Why does the for loop get stuck?

Answer

Optimization level determines where clang decides to keep its loop counter: in memory (unoptimized) or in a register, in this case r8d (optimized). R8D is a logical choice for the compiler: it's a call-clobbered reg it can use without saving at the start/end of main, and you've told it all the registers it could use without a REX prefix (like ECX) are either inputs / outputs or clobbers for the asm statement.

Note: if FreeBSD is like MacOS, system call error / no-error status is returned in CF (the carry flag), not via RAX being in the -4095..-1 range. In that case, you'd want a GCC6 flag-output operand like "=@ccc" (err) for int err (guarded by #ifdef __GCC_ASM_FLAG_OUTPUTS__), or a setc %cl in the template to materialize a boolean manually. (CL is a good choice because RCX is destroyed by syscall anyway, so you can just use it as an output instead of a clobber.)


FreeBSD's syscall handling trashes R8, R9, and R10, in addition to the bare minimum clobbering that Linux does: RAX (retval) and RCX / R11. (The syscall instruction itself uses RCX and R11 to save RIP / RFLAGS so the kernel can find its way back to user-space, so the kernel never even sees the original values.)

Possibly also RDX, we're not sure; the comments call it "return value 2" (i.e. as part of an RDX:RAX return value?). We also don't know what ABI guarantees FreeBSD intends to maintain in future kernels.

You can't assume R8-R10 are zero after syscall because they're actually preserved instead of zeroed when tracing / single-stepping. (Because then the kernel chooses not to return via sysret, for the same reason as Linux: hardware / design bugs make that unsafe if registers might have been modified by ptrace while inside the system call. e.g. attempting to sysret with a non-canonical RIP will #GP in ring 0 (kernel mode) on Intel CPUs! That's a disaster because RSP = user stack at that point.)
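One way to observe this on a given kernel is a small probe that loads sentinel values into R8-R10, makes a harmless system call, and reports which registers changed. This is only a sketch: syscall_clobber_mask is a hypothetical name, and the sentinel values are arbitrary. RDX is listed as a clobber because of the "return value 2" uncertainty above.

```c
#include <sys/syscall.h>   /* SYS_getpid */

/* Hypothetical probe: bit 0 set if R8 changed across the syscall,
 * bit 1 for R9, bit 2 for R10. */
int syscall_clobber_mask(void)
{
    register unsigned long r8  __asm__("r8")  = 0x1111;
    register unsigned long r9  __asm__("r9")  = 0x2222;
    register unsigned long r10 __asm__("r10") = 0x3333;
    long ret;

    __asm__ __volatile__("syscall"
        : "=a"(ret), "+r"(r8), "+r"(r9), "+r"(r10)
        : "a"(SYS_getpid)
        : "rcx", "r11", "rdx", "memory");
    (void)ret;   /* the PID itself is not interesting here */

    return (r8 != 0x1111) | ((r9 != 0x2222) << 1) | ((r10 != 0x3333) << 2);
}
```

Going by the description above, Linux should leave all three registers alone, while FreeBSD would be expected to report all three as changed when run normally and as unchanged when run under truss or single-stepped in GDB.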


The relevant kernel code is the sysret path (well spotted by @NateEldredge; I found the syscall entry point by searching for swapgs, but hadn't gotten to looking at the return path).

The function-call-preserved registers don't need to be restored by that code because calling a C function didn't destroy them in the first place. And the code does restore the function-call-clobbered "legacy" registers RDI, RSI, and RDX.

R8-R11 are the registers that are call-clobbered in the function-calling convention, and that are outside the original 8 x86 registers. So that's what makes them "special". (R11 doesn't get zeroed; syscall/sysret uses it for RFLAGS, so that's the value you'll find there after syscall)

Zeroing is slightly faster than loading them, and in the normal case (syscall instruction inside a libc wrapper function) you're about to return to a caller that's only assuming the function-calling convention, and thus will assume that R8-R11 are trashed (same for RDI, RSI, RDX, and RCX, although FreeBSD does bother to restore those for some reason.)


This zeroing only happens when not single-stepping or tracing (e.g. truss or GDB si). The syscall entry point into an amd64 kernel (Github) does save all the incoming registers, so they're available to be restored by other ways out of the kernel.


Updated asm() wrapper

// Should be fixed for FreeBSD, plus other improvements
ssize_t sys_write(int fd, const void *data, size_t size){
    register ssize_t res __asm__("rax");
    register int arg0 __asm__("edi") = fd;
    register const void *arg1 __asm__("rsi") = data;  // you can use real types
    register size_t arg2 __asm__("rdx") = size;
    __asm__ __volatile__(
        "syscall"
                    // RDX *maybe* clobbered
        : "=a" (res), "+r" (arg2)
                           // RDI, RSI preserved
        : "a" (SYS_write), "r" (arg0), "r" (arg1)
          // An arg in R10, R8, or R9 definitely would be
        : "rcx", "r11", "memory", "r8", "r9", "r10"   ////// The fix: r8-r10
         // see below for a version that avoids the "memory" clobber with a dummy input operand
    );
    return res;
}

For any args that go in R10, R8, or R9, declare a register-asm local such as register long arg3 asm("r10") and pass it as a "+r" output/input operand, so the compiler knows the register is modified. (A register used as an operand can't also appear in the clobber list.)

This is inside a wrapper function, so the modified values of the C variables get thrown away, forcing repeated calls to set up the args every time. That would be the "defensive" approach until another answer identifies more definitely-non-trashed registers.


From the asker's comment: "I did break *0x000000000020192b, then info registers when the breakpoint was hit. r8 is zero. The program still gets stuck in this case."

I assume that r8 wasn't zero before you did that GDB continue across the syscall instruction. Yes, that test confirms that the FreeBSD kernel is trashing r8 when not single-stepping. (And behaving in a way that matches what we see in the source code.)


Note that you can tell the compiler that a write system call only reads memory (not writes) using a dummy "m" input operand instead of a "memory" clobber. That would let it hoist the store to a out of the loop. (See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?")

i.e. "m"(*(const char (*)[size]) data) as an input instead of a "memory" clobber.

This is the advantage you get from writing a specific wrapper for each syscall you use, instead of one generic wrapper for every 3-operand syscall that just casts all operands to unsigned long.

Speaking of which, there's absolutely no point in making your syscall args all be long; making user-space sign-extend int fd into a 64-bit register is just wasted instructions. The kernel ABI will (almost certainly) ignore the high bytes of registers for narrow args, like Linux does. (Again, unless you're making a generic syscall3 wrapper that you just use with different SYS_ numbers to define write, read, and other 3-operand system calls; then you would cast everything to register-width and just use a "memory" clobber).

I made these changes for my modified version below.

Also note that for RDI, RSI, and RDX, there are specific-register letter constraints which you can use instead of register-asm locals, just like you're doing for the return value in RAX ("=a"). BTW, you don't really need a matching constraint for the call number, just use an "a" input; it's easier to read because you don't need to look at another operand to check that you're matching the right output.

// assuming RDX *is* clobbered.
// could remove the + if it isn't.
ssize_t sys_write(int fd, const void *data, size_t size)
{
    // register long arg3 __asm__("r10") = ??;
    // register-asm is useful for R8 and up

    ssize_t res;
    __asm__ __volatile__("syscall"
                    // RDX
        : "=a" (res), "+d" (size)
         //  EAX/RAX       RDI       RSI
        : "a" (SYS_write), "D" (fd), "S" (data),
          "m" (*(const char (*)[size]) data) // tells compiler this mem is an input
        : "rcx", "r11"    //, "memory"
#ifndef __linux__
              , "r8", "r9", "r10"   // Linux always restores these
#endif
    );
    return res;
}

Some people prefer register ... asm("") for all the operands because you get to use the full register name, and don't have to remember the totally-non-obvious "D" for RDI/EDI/DI/DIL vs. "d" for RDX/EDX/DX/DL.
