Is garbage allowed in high bits of parameter and return value registers in x86-64 SysV ABI?

Question

The x86-64 SysV ABI specifies, among other things, how function parameters are passed in registers (first argument in rdi, then rsi and so on), and how integer return values are passed back (in rax and then rdx for really big values).

What I can't find, however, is what the high bits of parameter or return value registers should be when passing types smaller than 64 bits.

For example, for the following function:

void foo(unsigned x, unsigned y);

... x will be passed in rdi and y in rsi, but they are only 32 bits wide. Do the high 32 bits of rdi and rsi need to be zero? Intuitively, I would assume yes, but the code generated by gcc, clang and icc all has specific mov instructions at the start to zero out the high bits, so it seems the compilers assume otherwise.

Similarly, the compilers seem to assume that the high bits of the return value in rax may contain garbage if the return value is narrower than 64 bits. For example, the loops in the following code:

unsigned gives32();
unsigned short gives16();

long sum32_64() {
  long total = 0;
  for (int i=1000; i--; ) {
    total += gives32();
  }
  return total;
}

long sum16_64() {
  long total = 0;
  for (int i=1000; i--; ) {
    total += gives16();
  }
  return total;
}

... compile to the following in clang (and other compilers are similar):

sum32_64():
...
.LBB0_1:                               
    call    gives32()
    mov     eax, eax
    add     rbx, rax
    inc     ebp
    jne     .LBB0_1


sum16_64():
...
.LBB1_1:
    call    gives16()
    movzx   eax, ax
    add     rbx, rax
    inc     ebp
    jne     .LBB1_1

Note the mov eax, eax after the call returning 32 bits, and the movzx eax, ax after the 16-bit call: they zero out the top 32 and 48 bits, respectively. So this behavior has some cost; the same loop dealing with a 64-bit return value omits this instruction.
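
The difference between mov eax, eax and movzx eax, ax comes from how x86-64 handles partial-register writes: writing a 32-bit register zero-extends into the full 64-bit register, while writing a 16-bit (or 8-bit) register leaves the upper bits untouched. The following C sketch models a register as a plain uint64_t; the helper names (write32, write16, mov_eax_eax, movzx_eax_ax) are made up for illustration and are not part of any real API.

```c
#include <stdint.h>

/* Simplified model: a 64-bit general-purpose register is a uint64_t. */

/* 32-bit write (e.g. `mov eax, ...` or `add eax, ...`):
   the result is zero-extended, so bits 32..63 become 0. */
static uint64_t write32(uint64_t reg, uint32_t value) {
    (void)reg;                 /* the old contents are discarded entirely */
    return (uint64_t)value;
}

/* 16-bit write (e.g. `mov ax, ...`): only bits 0..15 change;
   bits 16..63 keep whatever they held before. */
static uint64_t write16(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* `mov eax, eax`: rewrite the low 32 bits with themselves,
   which clears bits 32..63 as a side effect. */
static uint64_t mov_eax_eax(uint64_t rax) {
    return write32(rax, (uint32_t)rax);
}

/* `movzx eax, ax`: zero-extend the low 16 bits into eax;
   the 32-bit write then clears bits 32..63 as well. */
static uint64_t movzx_eax_ax(uint64_t rax) {
    return write32(rax, (uint16_t)rax);
}
```

In sum32_64, the caller effectively applies mov_eax_eax to rax after each call before the 64-bit add. In sum16_64 it needs the movzx form, because the ABI lets gives16 leave garbage in bits 16 to 63, for example by producing its result with a 16-bit write.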

I've read the x86-64 System V ABI document pretty carefully, but I couldn't find whether this behavior is documented in the standard.

What are the benefits of such a decision? It seems to me there are clear costs:

Costs are imposed on the callee's implementation when it deals with parameter values. Granted, this cost is often zero because the function can effectively ignore the high bits, or the zeroing comes for free because it can use 32-bit operand-size instructions, which implicitly zero the high bits.

However, costs are often very real in the cases of functions that accept 32-bit arguments and do some math that could benefit from 64-bit math. Take this function for example:

#include <stdint.h>

uint32_t average(uint32_t a, uint32_t b) {
  return ((uint64_t)a + b) >> 1;  // widen so the 33-bit sum cannot overflow
}

This is a straightforward use of 64-bit math to calculate a function that would otherwise have to deal carefully with overflow (the ability to transform many 32-bit functions in this way is an often unnoticed benefit of 64-bit architectures). It compiles to:

average(unsigned int, unsigned int):
        mov     edi, edi
        mov     eax, esi
        add     rax, rdi
        shr     rax, 1
        ret

Fully 2 out of the 4 instructions (ignoring ret) are needed just to zero out the high bits. This may be cheap in practice thanks to mov-elimination, but it still seems a big cost to pay.
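
To make the overflow point concrete, here is a small C sketch contrasting the widening approach with a common 32-bit-only formula for floor((a + b) / 2). The names average_wide and average_narrow are hypothetical, and the narrow version is a well-known bit trick swapped in for comparison, not something taken from the question.

```c
#include <stdint.h>

/* Widening version: the 33-bit intermediate sum fits easily in 64 bits,
   so no overflow handling is needed. */
static uint32_t average_wide(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a + b) >> 1);
}

/* 32-bit-only version: a + b could overflow, so use the identity
   a + b == 2*(a & b) + (a ^ b) and halve each part separately. */
static uint32_t average_narrow(uint32_t a, uint32_t b) {
    return (a & b) + ((a ^ b) >> 1);
}
```

Both return UINT32_MAX for arguments (UINT32_MAX, UINT32_MAX). The narrow version trades the widening for an AND, an XOR, a shift and an add, which is the kind of overflow bookkeeping that 64-bit math avoids.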

On the other hand, I can't really see a similar corresponding cost for callers if the ABI were to specify that the high bits are zero. Because rdi, rsi and the other parameter-passing registers are scratch (i.e., can be overwritten by the callee), you only have a couple of scenarios (we look at rdi, but replace it with the parameter register of your choice):

  1. The value passed to the function in rdi is dead (not needed) in the post-call code. In that case, whatever instruction last assigned to rdi simply has to assign to edi instead. Not only is this free, it is often one byte smaller if you avoid a REX prefix.

  2. The value passed to the function in rdi is needed after the call. In that case, since rdi is caller-saved, the caller needs to mov the value to a callee-saved register anyway. You can generally organize the code so that the value starts out in the callee-saved register (say rbx) and is then moved to edi with mov edi, ebx, so it costs nothing.

I can't see many scenarios where the zeroing costs the caller much. One example would be if 64-bit math is needed in the last instruction that assigns rdi. That seems quite rare, though.

For return values, the decision seems more neutral. Having callees clear out the junk has a definite cost (you sometimes see mov eax, eax instructions to do this), but if garbage is allowed, the cost shifts to the caller. Overall, it seems more likely that the caller can clear the junk for free, so allowing garbage doesn't seem detrimental to performance overall.

I suppose one interesting use case for this behavior is that functions operating on integers of different sizes can share an identical implementation. For example, all of the following functions:

short sums(short x, short y) {
  return x + y;
}

int sumi(int x, int y) {
  return x + y;
}

long suml(long x, long y) {
  return x + y;
}

can actually share the same implementation [1]:

sum:
        lea     rax, [rdi+rsi]
        ret
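
The sharing works because carries in addition only propagate upward: the low N bits of a sum depend only on the low N bits of the operands. Below is a minimal C sketch of that property, modeling lea rax, [rdi+rsi] as a 64-bit add of registers that may carry garbage above the argument width. The function names are illustrative, and unsigned types are used so the truncating conversions are well defined in C; two's-complement signed addition yields the same bit pattern.

```c
#include <stdint.h>

/* Model of `lea rax, [rdi+rsi]`: a plain 64-bit add of the full registers,
   whatever their upper bits contain. */
static uint64_t lea_add(uint64_t rdi, uint64_t rsi) {
    return rdi + rsi;
}

/* Truncating the 64-bit result gives the correct narrow sum even when
   the inputs had garbage above the 16-bit or 32-bit argument width. */
static uint16_t sum16_via_lea(uint64_t rdi, uint64_t rsi) {
    return (uint16_t)lea_add(rdi, rsi);
}

static uint32_t sum32_via_lea(uint64_t rdi, uint64_t rsi) {
    return (uint32_t)lea_add(rdi, rsi);
}
```

Whatever sits in the upper bits of rdi and rsi only affects the upper bits of rax, which a caller expecting a short or int result must ignore anyway under the garbage-allowed convention.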

[1] Whether such folding is actually allowed for functions that have their address taken is very much open to debate.

Answer

It looks like you have two questions here:

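  1. Do the high bits of a return value need to be zeroed before returning? (And do the high bits of arguments need to be zeroed before a call?)
  2. What are the costs and benefits associated with this decision?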

The answer to the first question is no, there can be garbage in the high bits, and Peter Cordes has already written a very nice answer on the subject.

As for the second question, I suspect that leaving the high bits undefined is overall better for performance. On one hand, zero-extending values beforehand comes at no additional cost when 32-bit operations are used. But on the other hand, zeroing the high bits beforehand is not always necessary. If you allow garbage in the high bits, then you can leave it up to the code that receives the values to only perform zero-extensions (or sign-extensions) when they are actually required.

But I wanted to highlight another consideration: Security

When the upper bits of a result are not cleared, they may retain fragments of other pieces of information, such as function pointers or addresses in the stack/heap. If there ever exists a mechanism to execute higher-privileged functions and retrieve the full value of rax (or eax) afterwards, then this could introduce an information leak. For example, a system call might leak a pointer from the kernel to user space, leading to a defeat of kernel ASLR. Or an IPC mechanism might leak information about another process' address space that could assist in developing a sandbox breakout.

Of course, one might argue that it is not the responsibility of the ABI to prevent information leaks; it is up to the programmer to implement their code correctly. While I do agree, mandating that the compiler zero the upper bits would still have the effect of eliminating this particular form of an information leak.
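
A hypothetical C model of the leak described above: a privileged routine returns an unsigned short by writing only the low 16 bits of rax, while the rest of rax still holds a pointer from earlier work. The pointer value and all function names are made up for illustration, and this does not describe any particular kernel's system-call path.

```c
#include <stdint.h>

/* Hypothetical leftover contents of rax: a made-up pointer value
   from work done before the routine writes its 16-bit result. */
static const uint64_t LEFTOVER_POINTER = 0xFFFF888012345678ULL;

/* The routine returns `unsigned short` via a 16-bit write (`mov ax, ...`),
   which the ABI permits: bits 16..63 of rax are left untouched. */
static uint64_t privileged_return16(uint16_t result) {
    uint64_t rax = LEFTOVER_POINTER;
    rax = (rax & ~(uint64_t)0xFFFF) | result;
    return rax;  /* the full register as a caller could observe it */
}

/* A well-behaved caller only looks at the low 16 bits... */
static uint16_t honest_view(uint64_t rax) {
    return (uint16_t)rax;
}

/* ...but anything that can read all of rax recovers the upper
   48 bits of the leftover pointer. */
static uint64_t leaked_bits(uint64_t rax) {
    return rax >> 16;
}
```

Mandating zeroed upper bits would make leaked_bits always return 0 here, which is the specific leak the paragraph above argues the ABI could have closed.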

On the other side of things, and more importantly, the compiler should not blindly trust that any received values have their upper bits zeroed out, or else the function may not behave as expected, and this could also lead to exploitable conditions. For example, consider the following:

unsigned char buf[256];
...
/* x86-64 SysV: the first argument (value) arrives in dil/rdi,
   the second (index) in sil/rsi */
void write_index(unsigned char value, unsigned char index) {
    buf[index] = value;
}

If we were allowed to assume that index has its upper bits zeroed out, then we could compile the above as:

write_index:  ;; sil = index, dil = value
    mov rax, offset buf
    mov [rax+rsi], dil
    ret

But if we could call this function from our own code, we could supply a value of rsi out of the [0,255] range and write to memory beyond the bounds of the buffer.

Of course, the compiler would not actually generate code like this, since, as mentioned above, it is the responsibility of the callee to zero- or sign-extend its arguments, rather than that of the caller. This, I think, is a very practical reason to have the code that receives a value always assume that there is garbage in the upper bits and explicitly remove it.
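
A sketch of what the callee has to do instead, modeling each argument register as a full uint64_t whose bits above the declared 8-bit width may be garbage. The function name write_index_model is hypothetical, and the parameter order assumes value arrives in rdi and index in rsi, matching the register comment in the assembly above.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned char buf[256];

/* Callee's view: both arguments arrive in full 64-bit registers whose
   bits above the declared `unsigned char` width may hold garbage. */
static void write_index_model(uint64_t rdi_value, uint64_t rsi_index) {
    /* Explicit zero-extension: keep only the 8 bits that make up the
       `unsigned char` argument, so the index is always in [0, 255]. */
    size_t index = (uint8_t)rsi_index;
    buf[index] = (unsigned char)rdi_value;
}
```

However the caller fills the upper 56 bits of rsi, the store stays within buf, because only the low 8 bits are used as the index. This corresponds to the explicit zero-extension that the callee is responsible for.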