Shorter x86 call instruction

Posted: 2021/4/21 19:02:41 · Tags: assembly x86 call micro-optimization machine-code


Question

For context, I am golfing x86 machine code (minimizing code size in bytes).

00000005 <start>:
   5:   e8 25 00 00 00          call   2f <cube>
   a:   50                      push   %eax

Multiple calls later...

0000002f <cube>:
  2f:   89 c8                   mov    %ecx,%eax
  31:   f7 e9                   imul   %ecx
  33:   f7 e9                   imul   %ecx
  35:   c3                      ret

The call took 5 bytes even though the offset fits in a single byte! Is there any way to write call cube, assemble it with the GNU assembler, and get a shorter encoding? I understand 16-bit offsets could be used, but ideally I'd have a 2-byte instruction like call reg.

Recommended answer

There is no call rel8, nor any way to push a return address and jmp in fewer than 5 bytes.

To come out ahead with call reg, you need to generate a full address in a register in fewer than 3 bytes. Even a RIP-relative LEA doesn't help, because it only exists in rel32 form, not rel8.

For a single call, it is clearly not worth it. If you can reuse the same function-pointer register for multiple 2-byte call reg instructions, you come out ahead with just 2 calls: a 5-byte mov reg, imm32 plus two 2-byte call reg instructions is 9 bytes in total, vs. 10 bytes for two 5-byte call rel32 instructions. But it does cost you a register.
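
As a rough illustration of that trade-off, here is a minimal sketch for 32-bit code in GNU as (AT&T) syntax. The choice of %edi is an assumption: any register that survives between the calls would do. It also assumes position-dependent code, since the mov embeds an absolute address.

```asm
# Sketch only: assemble as 32-bit code, e.g. with `as --32`.
# Assumes %edi is free and the code is linked at a fixed address.
        .text
start:
        mov     $cube, %edi     # bf <imm32>  5 bytes, paid once
        call    *%edi           # ff d7       2 bytes
        call    *%edi           # ff d7       2 bytes
        ret                     # total for the two calls: 5 + 2 + 2 = 9 bytes

cube:                           # %eax = %ecx cubed (also clobbers %edx)
        mov     %ecx, %eax
        imul    %ecx
        imul    %ecx
        ret
```

Each further call through the register costs 2 bytes instead of 5, so the saving grows by 3 bytes per additional call site.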

Most OSes don't let you map anything in the lowest pages (so that NULL-pointer dereferences fault), so outside of 16-bit mode, usable addresses are wider than 16 bits.

In 32-bit / 64-bit code, I'd count the linker options necessary to get your code mapped in the zero page as part of the byte count of your code-golf answer. (And also the /proc/sys/vm/mmap_min_addr kernel setting, or its equivalent on other OSes.)

Generally, avoid call in code golf if you can. It's usually better to structure your loops so that you don't need code reuse, e.g. jmp into the middle of a loop to get part of the loop to run the right number of times, instead of calling a block multiple times.
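
A minimal sketch of the jump-into-the-loop idea, again for 32-bit GNU as. The task is hypothetical and chosen only for illustration: write %ecx copies of '1' separated by ',' into the buffer at %edi. The item block has to run once more than the separator block, and entering the loop at the item block achieves that without a call or a duplicated block.

```asm
# Sketch only (hypothetical task): assumes %ecx >= 1 and the direction flag is clear.
        .text
join:
        jmp     .Litem          # eb xx  2 bytes: skip the separator on the first pass
.Lsep:
        mov     $0x2c, %al      # b0 2c  ','
        stosb                   # aa     store %al at (%edi), advance %edi
.Litem:
        mov     $0x31, %al      # b0 31  '1'
        stosb                   # aa
        loop    .Lsep           # e2 xx  decrement %ecx, jump back while nonzero
        ret
```

With %ecx = 3 the item block runs three times and the separator block twice. The only overhead is the 2-byte jmp at the top.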

I guess I usually look at code-golf questions that lend themselves naturally to machine code and don't need the same block of code from multiple places. I can already spend hours tweaking a short function, so I rarely start an answer to a question that will take more code (and thus leave even more room for optimization between and across its parts).
