Jump table implementation in MASM x64?

Question

I'm trying to implement an algorithm in assembly (MASM64, Windows, x64) using jump tables. Basic idea is: there are 3 different types of operations I need to do with data. The operations depend on some variables, but I found it tedious to implement a lot of switching and many long implementations.

PUBLIC superFunc@@40 ;__vectorcall decoration
.DATA
ALIGN 16
jumpTable1 qword func_11, func_12, func_13, func_14
jumpTable2 qword func_21, func_22, func_23, func_24
jumpTable3 qword func_31, func_32, func_33, func_34

.CODE
superFunc@@40 PROC
        ;no stack actions, as we should do our stuff as a leaf function
        ;assume the first parameter (rcx) is our jumpTable index, and it's
        ;the same index for all functions
        mov     rax,    qword ptr [rcx*8 + offset jumpTable1]
        mov     r10,    qword ptr [rcx*8 + offset jumpTable2]
        mov     r11,    qword ptr [rcx*8 + offset jumpTable3]
        jmp     qword ptr [rax]
superFunc@@40 ENDP
func_11:
        [...] do something with data
        jmp     qword ptr [r10]
func_12: ; shortened, simply does something else to the data and jumps thru r10
[...]
func_21:
        [...] do something with data
        jmp     qword ptr [r11]
func_22: ; shortened, simply does something else to the data and jumps thru r11
[...]
func_31:
        [...] do something with data
        ret
func_32: ; shortened, simply does something else to the data and returns
END

Now this compiles well, but it doesn't link with my main C++ Plugin (a DLL), giving me the following linker errors:

LINK : warning LNK4075: ignoring '/LARGEADDRESSAWARE:NO' due to '/DLL' specification
error LNK2017: 'ADDR32' relocation to 'jumpTable1' invalid without /LARGEADDRESSAWARE:NO

How can I implement something like this correctly? Maybe better phrased: How do I implement jump tables and jumping/calling to addresses from those tables correctly in MASM64?

P.S.: I could set up a function table in C++ and tell the superFunc about it via a parameter. That would be what I will do if I don't find a better solution.
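The fallback from the P.S. might look roughly like the following C++ sketch. It is hypothetical throughout: `Data`, the stage functions, `DispatchTables` and `super_func` stand in for the real algorithm, whose bodies the question elides. The tables live on the C++ side, so the compiler and linker handle their relocations. The dispatcher only indexes through a pointer it was handed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the data the real algorithm operates on.
struct Data { int value = 0; };

using Op = void (*)(Data&);

// Hypothetical operations; the real func_xy bodies are not shown in the question.
void add1(Data& d) { d.value += 1; }
void add2(Data& d) { d.value += 2; }
void mul2(Data& d) { d.value *= 2; }
void mul3(Data& d) { d.value *= 3; }
void negate(Data& d) { d.value = -d.value; }
void keep(Data&) {}

// One table per operation type, mirroring jumpTable1..jumpTable3.
struct DispatchTables {
    std::array<Op, 4> stage1;
    std::array<Op, 4> stage2;
    std::array<Op, 4> stage3;
};

// Filled in on the C++ side; its address is what would be passed to the dispatcher.
const DispatchTables kTables = {
    {{add1, add2, add1, add2}},
    {{mul2, mul2, mul3, mul3}},
    {{keep, negate, keep, negate}},
};

// C++ analogue of superFunc receiving the tables as a parameter:
// the same index selects one entry from each table in turn.
void super_func(const DispatchTables& t, std::size_t index, Data& d) {
    assert(index < t.stage1.size());
    t.stage1[index](d);
    t.stage2[index](d);
    t.stage3[index](d);
}
```

On the assembly side, the table address would arrive in a register. Each load then becomes a base register plus `rcx*8`, with a small constant displacement for the later tables. That form needs no 32-bit absolute relocation.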

Recommended answer

RIP-relative addressing only works when there are no other registers in the addressing mode.

[table + rcx*8] can only be encoded in x86-64 machine code as [disp32 + rcx*8]. It therefore only works with non-large addresses that fit in a sign-extended 32-bit absolute address. Windows can apparently support this with /LARGEADDRESSAWARE:NO, much as compiling with -no-pie solves the same problem on Linux.
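To make the encoding limit concrete: the disp32 in a `[disp32 + rcx*8]` operand is sign-extended to 64 bits. An absolute address is only encodable if it survives that round trip. A small C++ check follows; the function name and the example addresses are illustrative assumptions, not values from the question.

```cpp
#include <cassert>
#include <cstdint>

// True if `addr` can be encoded as the sign-extended disp32 of an
// [disp32 + index*scale] operand, i.e. it lies in the low 2 GiB or the
// top 2 GiB of the 64-bit address space.
bool fits_disp32_absolute(std::uint64_t addr) {
    return addr <= 0x7FFFFFFFull || addr >= 0xFFFFFFFF80000000ull;
}
```

An image that may be loaded anywhere above 2 GiB (like the second example address in the tests) cannot rely on this form. That is presumably why the linker rejects the ADDR32 relocation unless /LARGEADDRESSAWARE:NO is in effect.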

MacOS has no workaround for it: you can't use 32-bit absolute addressing at all there, because the Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array shows how to index a static array using a RIP-relative lea to get the table address into a register while avoiding 32-bit absolute addresses.
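The RIP-relative approach splits the address calculation in two. The `lea` encodes only the distance from the end of the instruction to the table, and the scaled index is added afterwards in a register. Below is a C++ model of that arithmetic; the helper names and the addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Displacement a RIP-relative operand would need to reach `target` from the
// instruction that ends at `next_ip`; empty if it does not fit in an int32.
std::optional<std::int32_t> rip_rel_disp(std::uint64_t next_ip, std::uint64_t target) {
    const std::int64_t diff = static_cast<std::int64_t>(target - next_ip);
    if (diff < std::numeric_limits<std::int32_t>::min() ||
        diff > std::numeric_limits<std::int32_t>::max()) {
        return std::nullopt;
    }
    return static_cast<std::int32_t>(diff);
}

// lea rax, [jumpTable1]     -> rax = next_ip + sign-extended disp32
// mov r10, [rax + rcx*8]    -> entry address = rax + rcx*8
std::uint64_t entry_address(std::uint64_t next_ip, std::int32_t disp, std::uint64_t rcx) {
    const std::uint64_t table =
        next_ip + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
    return table + rcx * 8;
}
```

The displacement depends only on the distance between the code and the table. It therefore stays valid wherever the whole image is loaded, including above 2 GiB.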

Your jump tables themselves are fine: they use 64-bit absolute addresses which can be relocated anywhere in virtual address space. (Using load-time fixups after ASLR.)
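Here is a simplified C++ model of that load-time fixup. In PE terms, it corresponds to one `IMAGE_REL_BASED_DIR64` base relocation per table entry. Each qword was written for the image's preferred base, and the loader adds the difference between that base and where the image actually landed. The function name and addresses are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Add the load delta to every absolute 64-bit entry. Unsigned arithmetic
// wraps modulo 2^64, so this also works when the image moves to a lower address.
void apply_dir64_fixups(std::vector<std::uint64_t>& table,
                        std::uint64_t preferred_base, std::uint64_t actual_base) {
    const std::uint64_t delta = actual_base - preferred_base;
    for (std::uint64_t& entry : table) entry += delta;
}
```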

I think you have one level of indirection too many. Since you already load a function pointer into a register, you should be using jmp r10, not jmp [r10]. Doing all the loads into registers up front gets them into the pipeline sooner, before any possible branch mispredicts. That may be a good idea if you have lots of registers to spare.
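The difference between `jmp r10` and `jmp [r10]` can be shown with a toy memory model in C++. This is purely illustrative. Memory is an array of qword slots addressed by slot number instead of by byte, and the layout and placeholder contents are made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy flat memory: one qword per slot, addressed by slot number (not bytes).
using Memory = std::array<std::uint64_t, 64>;

// Hypothetical layout: slots 0..3 hold jumpTable1, whose entries point at the
// "code" of func_11..func_14 starting in slots 16, 24, 32 and 40.
Memory make_demo_memory() {
    Memory m{};
    const std::uint64_t targets[4] = {16, 24, 32, 40};
    for (std::size_t i = 0; i < 4; ++i) m[i] = targets[i];
    for (std::size_t a = 16; a < 48; ++a) m[a] = 0x9090909090909090ull;  // placeholder instruction bytes
    return m;
}

// mov r10, [table + rcx*8]: r10 now holds the function's address.
std::uint64_t load_entry(const Memory& m, std::uint64_t table, std::uint64_t rcx) {
    assert(table + rcx < m.size());
    return m[table + rcx];
}

// jmp r10: the branch target is the register value itself.
std::uint64_t jmp_reg(std::uint64_t r10) { return r10; }

// jmp [r10]: load a qword FROM r10 and branch there. If r10 already holds a
// function address, this reads instruction bytes as if they were a pointer.
std::uint64_t jmp_mem(const Memory& m, std::uint64_t r10) {
    assert(r10 < m.size());
    return m[r10];
}
```

A single memory-indirect `jmp [table + rcx*8]` is also correct. The bug only appears when a load into a register is combined with a second memory-indirect jump.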

Much better would be inlining some of the later blocks, if they're small, because the blocks reachable by any given RCX value aren't reachable any other way. So it would be much better to inline all of func_21 and func_31 into func_11, and so on for func_12. You might use assembler macros to make this easier.

Actually, what matters is just that the jump at the end of func_11 always goes to func_21. It's fine if there are other ways to reach that block, e.g. from other indirect branches that skip table 1. That's no reason for func_11 not to fall into it. It only limits what optimizations you can make between those two blocks, if func_21 still has to be a valid entry point for execution paths that didn't fall through from func_11.
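The inlining suggestion can be sketched in C++ with templates standing in for assembler macros. The sketch generates one fused function per index that runs all three stages back to back, so only one indirect branch remains. `Data`, the stage bodies and `super_func` are hypothetical placeholders.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Data { int value = 0; };  // hypothetical payload

// Hypothetical stage bodies; the template parameter plays the role of the
// index that selects func_1I, func_2I and func_3I.
template <std::size_t I> void stage1(Data& d) { d.value += static_cast<int>(I) + 1; }
template <std::size_t I> void stage2(Data& d) { d.value *= static_cast<int>(I) + 2; }
template <std::size_t I> void stage3(Data& d) { d.value -= static_cast<int>(I); }

// The "inlined" chain for one index: stage 1 falls straight into stages 2 and 3.
template <std::size_t I> void fused(Data& d) {
    stage1<I>(d);
    stage2<I>(d);
    stage3<I>(d);
}

using Fused = void (*)(Data&);

// A single table replaces the three chained ones.
constexpr std::array<Fused, 4> kFused = {&fused<0>, &fused<1>, &fused<2>, &fused<3>};

void super_func(std::size_t index, Data& d) {
    assert(index < kFused.size());
    kFused[index](d);  // the only indirect branch left
}
```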

But anyway, you can implement your code like this. If you do optimize it, you can remove the later dispatching steps and the corresponding loads.

I think this is valid MASM syntax. If not, it should still be clear what the desired machine-code is.

    lea    rax,  [jumpTable1]          ; RIP-relative by default in MASM, like GAS [RIP + jumpTable1] or NASM [rel jumpTable1]

    ; The other tables are at assemble-time-constant small offsets from RAX
    ; (r10 = func_2x from table 2, r11 = func_3x from table 3, as in the question)
    mov    r10,  [rax + rcx*8 + jumpTable2 - jumpTable1]
    mov    r11,  [rax + rcx*8 + jumpTable3 - jumpTable1]
    jmp    qword ptr [rax + rcx*8]


func_11:
    ...
    jmp  r10         ; TODO: inline func_21  or at least use  jmp func_21
                     ;  you can use macros to help with either of those

Or if you only want to tie up a single register for one table, maybe use:

    lea    r10,  [jumpTable1]    ; RIP-relative LEA
    lea    r10,  [r10 + rcx*8]   ; address of the function pointer we want
    jmp    qword ptr [r10]

align 8
func_11:
    ...
    jmp   qword ptr [r10 + jumpTable2 - jumpTable1]    ; same index in another table


align 8
func_21:
    ...
    jmp   qword ptr [r10 + jumpTable3 - jumpTable1]    ; same index in *another* table

This takes full advantage of the known static offsets between tables.
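The same trick in C++ terms is sketched below. It assumes the three tables are laid out back to back in one array, as consecutive `qword` lines in `.DATA` would be. One pointer to the selected entry is kept, and the other tables are reached at a fixed, compile-time distance. Names and stage bodies are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Data { int value = 0; };  // hypothetical payload
using Op = void (*)(Data&);

// Hypothetical stage bodies.
void inc(Data& d) { d.value += 1; }
void dbl(Data& d) { d.value *= 2; }
void sq(Data& d) { d.value *= d.value; }
void nop(Data&) {}

constexpr std::size_t kEntries = 4;  // entries per table

// jumpTable1, jumpTable2, jumpTable3 stored contiguously:
// table k occupies indices [k * kEntries, (k + 1) * kEntries).
constexpr std::array<Op, 3 * kEntries> kAllTables = {
    inc, dbl, inc, nop,   // jumpTable1
    dbl, inc, sq,  dbl,   // jumpTable2
    sq,  nop, inc, inc,   // jumpTable3
};

void super_func(std::size_t index, Data& d) {
    assert(index < kEntries);
    const Op* p = &kAllTables[index];  // like lea r10, [r10 + rcx*8]
    p[0](d);                           // jmp [r10]
    p[kEntries](d);                    // jmp [r10 + jumpTable2 - jumpTable1]
    p[2 * kEntries](d);                // jmp [r10 + jumpTable3 - jumpTable1]
}
```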

Cache locality of jump targets

In your matrix of jump targets, any single usage strides down a "column" to follow some chain of jumps. It would obviously be better to transpose your layout so that one chain of jumps goes along a "row". That way all the targets come from the same cache line.

i.e. arrange your table so that func_11 can end with jmp [r10+8] and func_21 with jmp [r10+16], instead of adding some offset between tables. This improves spatial locality. L1d load latency is only a few cycles, so the CPU gets little extra delay in checking the correctness of branch prediction, compared with loading into registers ahead of the first indirect branch. (I'm considering the case where the first branch mispredicts, so OoO exec can't "see" the memory-indirect jmp until after the correct path for that starts to issue.)
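One way to see the locality difference is to count which 64-byte lines one dispatch chain touches in each layout. This C++ sketch assumes 8-byte entries, 64-byte cache lines and a line-aligned table. It also pads each transposed row to four entries so that no row straddles a line; that padding is an assumption, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kEntryBytes = 8;  // one qword per function pointer
constexpr std::size_t kLineBytes = 64;  // assumed cache-line size
constexpr std::size_t kTargets = 4;     // func_x1 .. func_x4
constexpr std::size_t kStages = 3;      // jumpTable1 .. jumpTable3
constexpr std::size_t kRowSlots = 4;    // transposed row padded from 3 to 4 entries

// Original layout: the three tables back to back (a chain walks a "column").
std::size_t column_layout_offset(std::size_t index, std::size_t stage) {
    return (stage * kTargets + index) * kEntryBytes;
}

// Transposed layout: one padded row per index, stages adjacent (+8, +16 bytes).
std::size_t row_layout_offset(std::size_t index, std::size_t stage) {
    return (index * kRowSlots + stage) * kEntryBytes;
}

// Distinct cache lines read by the chain for `index`, assuming the table starts
// on a line boundary. Offsets grow with `stage`, so counting line changes suffices.
std::size_t lines_touched(std::size_t index,
                          std::size_t (*offset)(std::size_t, std::size_t)) {
    assert(index < kTargets);
    std::size_t count = 1;
    std::size_t prev = offset(index, 0) / kLineBytes;
    for (std::size_t stage = 1; stage < kStages; ++stage) {
        const std::size_t line = offset(index, stage) / kLineBytes;
        if (line != prev) { ++count; prev = line; }
    }
    return count;
}
```

With these parameters, every chain touches two lines in the original layout and one line in the transposed layout.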

You can also store 32-bit (or 16 or 8-bit) offsets relative to some reference address that's near the jump targets, or relative to the table itself.
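Which of those widths is usable depends only on how far the targets are from the reference point. Below is a small C++ helper that picks the narrowest signed entry size able to hold every `target - anchor` distance. The helper name and the addresses are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns 1, 2 or 4 (bytes per signed offset entry), or 0 if some target is
// beyond the int32 range from `anchor` and would need a 64-bit absolute entry.
std::size_t narrowest_offset_bytes(std::uint64_t anchor,
                                   const std::vector<std::uint64_t>& targets) {
    std::int64_t lo = 0;
    std::int64_t hi = 0;
    for (std::uint64_t t : targets) {
        const std::int64_t d = static_cast<std::int64_t>(t - anchor);
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    if (lo >= std::numeric_limits<std::int8_t>::min() &&
        hi <= std::numeric_limits<std::int8_t>::max()) return 1;
    if (lo >= std::numeric_limits<std::int16_t>::min() &&
        hi <= std::numeric_limits<std::int16_t>::max()) return 2;
    if (lo >= std::numeric_limits<std::int32_t>::min() &&
        hi <= std::numeric_limits<std::int32_t>::max()) return 4;
    return 0;
}
```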

For example, look at what GCC does when compiling switch jump tables in position-independent code, even for targets that do allow runtime fixups of absolute addresses.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 includes a testcase; see it on Godbolt with GCC's MASM-style .intel_syntax. It uses a movsxd load from the table, then add rax, rdx / jmp rax. The table entries are things like dd L27 - L4 and dd L25 - L4 (where those are label names, giving the distance from a jump target to the "anchor" L4).
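That dispatch sequence can be modelled in C++ to show why such a table needs no load-time fixups. Each entry stores `target - anchor`. The decoder sign-extends it (the `movsxd`) and adds the anchor's runtime address (the `add`). The helper names and addresses are made up for illustration; real code gets the distances from the assembler, as in `dd L27 - L4`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Build entries like `dd Lk - L4`: the distance from the anchor to each target.
std::vector<std::int32_t> build_rel_table(std::uint64_t anchor,
                                          const std::vector<std::uint64_t>& targets) {
    std::vector<std::int32_t> table;
    for (std::uint64_t t : targets) {
        const std::int64_t d = static_cast<std::int64_t>(t - anchor);
        assert(d >= std::numeric_limits<std::int32_t>::min() &&
               d <= std::numeric_limits<std::int32_t>::max());
        table.push_back(static_cast<std::int32_t>(d));
    }
    return table;
}

// movsxd rax, dword ptr [table + idx*4]   ; sign-extend the 32-bit entry
// add    rax, rdx                         ; rdx holds the anchor's runtime address
// jmp    rax
std::uint64_t rel_target(std::uint64_t anchor, const std::vector<std::int32_t>& table,
                         std::size_t idx) {
    assert(idx < table.size());
    const std::int64_t disp = table[idx];  // implicit sign extension, like movsxd
    return anchor + static_cast<std::uint64_t>(disp);
}
```

Unlike 64-bit absolute entries, these table contents are identical wherever the code is loaded. The cost is an extra `add` in the dispatch path.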

(Related to this case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85585)
