How to execute a call instruction with a 64-bit absolute address?


Question

I am trying to call a function from machine code. The function should have an absolute address once the program is compiled and linked. I am creating a function pointer to the desired function and trying to pass that to the call instruction, but I noticed that the call instruction takes at most a 16-bit or 32-bit address. Is there a way to call an absolute 64-bit address?

I am targeting the x86-64 architecture and using NASM to generate the machine code.

I could work with a 32-bit address if I could be guaranteed that the executable is always mapped into the bottom 4GB of memory, but I am not sure where to find that information.

Edit: I cannot use the callf instruction, as that requires me to disable 64-bit mode.

Second Edit: I also do not want to store the address in a register and call the register. This code is performance critical, and I cannot afford the overhead of an indirect function call.

Final Edit: I was able to use the rel32 call instruction by ensuring that my machine code was mapped into the first 2GB of memory. This was achieved through mmap with the MAP_32BIT flag (I'm using Linux). From the mmap man page:

MAP_32BIT (since Linux 2.4.20, 2.6) Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2GB of memory, so as to improve context-switch performance on some early 64-bit processors. Modern x86-64 processors no longer have this performance problem, so use of this flag is not required on those systems. The MAP_32BIT flag is ignored when MAP_FIXED is set.

Recommended answer

Related: "Handling calls to (potentially) far away ahead-of-time compiled functions from JITed code" has more about JITing, especially allocating your JIT buffer near the code it wants to call so you can use the efficient call rel32, and what to do if you can't.

Also, "Call an absolute pointer in x86 machine code" is a good canonical Q&A about call or jmp to an absolute address.

TL:DR: To call a function by name, just use call func like a normal person and let the assembler + linker take care of it. Since you say you're using NASM, I guess you're actually generating the machine code with an assembler. It sounded like a more complicated question, but I think you were just trying to ask if the normal way was safe.

In 64-bit mode, the indirect call r/m64 (FF /2) takes a 64-bit register or memory operand.

So you can do:

func equ  0x123456789ab
; or if func is a regular label

mov   rax, func          ; mov r64, imm64,  or mov r32, imm32 if it fits
call  rax
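If the machine code is emitted by a program instead of being assembled by NASM, the mov / call rax pair above corresponds to the bytes below. This is only a minimal sketch in Python: the helper names encode_mov_rax_imm and encode_indirect_call are made up for illustration, and the code builds the byte string without executing it.

```python
import struct

# FF /2 with ModRM = 11 010 000 (mod=register, /2, rm=rax) -> call rax
CALL_RAX = b"\xFF\xD0"


def encode_mov_rax_imm(target):
    """Return the bytes that load an absolute address into rax.

    Hypothetical helper: picks `mov eax, imm32` when the address fits in
    32 bits (writing eax zero-extends into rax), else `mov rax, imm64`.
    """
    if not 0 <= target < 2**64:
        raise ValueError("target must be an unsigned 64-bit value")
    if target < 2**32:
        # B8 id: mov eax, imm32 (5 bytes)
        return b"\xB8" + struct.pack("<I", target)
    # REX.W + B8 io: mov rax, imm64 (10 bytes)
    return b"\x48\xB8" + struct.pack("<Q", target)


def encode_indirect_call(target):
    """Return the bytes for `mov rax, target` followed by `call rax`."""
    return encode_mov_rax_imm(target) + CALL_RAX


if __name__ == "__main__":
    print(encode_indirect_call(0x123456789AB).hex(" "))
```

The 64-bit form costs 12 bytes in total, against 5 bytes for a direct call rel32, which is the code-size cost discussed further down.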

Normally you'd put a label address into a register with lea rax, [rel func], but if that's encodable then the target is also within rel32 range, so you'd just use call rel32 directly.

Or, if you know the address your machine code will be stored at, you can use the normal direct call rel32 encoding. The displacement is the target address minus the address of the end of the call instruction (that is, the address of the next instruction).
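The displacement calculation can be sketched as follows, assuming the emitter knows the address where the call instruction will be placed. The function name encode_call_rel32 and the example addresses are hypothetical; the encoding itself (opcode E8 followed by a little-endian signed 32-bit displacement) is the standard one.

```python
import struct

CALL_REL32_LEN = 5  # opcode E8 + 4-byte displacement


def encode_call_rel32(call_site, target):
    """Return the 5 bytes of `call rel32` placed at call_site, calling target.

    The displacement is relative to the END of the call instruction,
    i.e. the address of the next instruction.
    """
    disp = target - (call_site + CALL_REL32_LEN)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of rel32 range (more than 2GiB away)")
    return b"\xE8" + struct.pack("<i", disp)


if __name__ == "__main__":
    # Forward call from 0x1000 to 0x2000: displacement 0x2000 - 0x1005 = 0xFFB
    print(encode_call_rel32(0x1000, 0x2000).hex(" "))
```

When the target is too far away, the function raises instead of silently truncating the displacement, so the emitter can fall back to the mov / call rax form.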

If you don't want to use an indirect call, then the rel32 encoding is your only option. Make sure your machine code goes into the low 2GiB so it can reach any target that is also in the low 2GiB. A rel32 only spans about 2GiB in either direction from the end of the call instruction, so a call site near address 0 cannot reach a target near 4GiB.
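To see which targets a given call site can reach, the rel32 range can be written out as a window of addresses. This is a small sketch with made-up helper names (rel32_window, can_reach); it treats addresses as plain integers and ignores wraparound at the ends of the address space.

```python
CALL_REL32_LEN = 5  # opcode E8 + 4-byte displacement


def rel32_window(call_site):
    """Return (lowest, highest) target address reachable by `call rel32`
    placed at call_site. The lowest value may be negative for call sites
    near address 0; such addresses are simply not usable."""
    end = call_site + CALL_REL32_LEN
    return end - 2**31, end + 2**31 - 1


def can_reach(call_site, target):
    """True if target is within rel32 range of a call at call_site."""
    lowest, highest = rel32_window(call_site)
    return lowest <= target <= highest


if __name__ == "__main__":
    for site in (0x1000, 0x7FFF0000):
        lowest, highest = rel32_window(site)
        print(hex(site), "reaches up to", hex(highest))
```

A call site at 0x1000 reaches only a little past 2GiB, while one just below 2GiB reaches almost to 4GiB, so the guarantee that holds for every call site in the low 2GiB is reachability of targets in the low 2GiB.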

Quoting the question: "if I could be guaranteed that the executable would be for sure mapped to the bottom 4GB of memory"

Yes, this is the default code model for Linux, Windows, and OS X. AMD64 call / jump instructions, and RIP-relative addressing, only use rel32 encodings, so all systems default to the "small" code model where code and static data are in the low 2GiB (or, for position-independent code, within 2GiB of each other). That guarantees the linker can just fill in a rel32, which reaches up to 2GiB forward or backward.

The x86-64 System V ABI does discuss Large / Huge code models, but I don't know if anyone ever uses them, because addressing data and making calls is inefficient under those models.

re: efficiency: yes, mov / call rax is less efficient. I think it's significantly slower if branch prediction misses and the BTB can't provide a target prediction. However, even call rel32 and jmp rel32 still need the BTB for full performance. See "Slow jmp-instruction" for experimental results showing relative jmp next_insn slowing down when there are too many of them in a giant loop.

With hot branch predictors, the indirect version only costs extra code size and one extra uop (the mov). It might consume more prediction resources, but maybe not even that.

See also: "What branch misprediction does the Branch Target Buffer detect?"
