Replace inline assembly tailcall function epilogue with intrinsics for x86/x64 MSVC


Question

I took over an inactive project and have already fixed a lot in it, but I can't get an intrinsics-based replacement to work correctly for the inline assembly it uses, which MSVC no longer supports when targeting x64.

#define XCALL(uAddr)  \
__asm { mov esp, ebp }   \
__asm { pop ebp }        \
__asm { mov eax, uAddr } \
__asm { jmp eax }

Use case:

static oCMOB * CreateNewInstance() {
    XCALL(0x00718590);
}

int Copy(class zSTRING const &, enum zTSTR_KIND const &) {
    XCALL(0x0046C2D0);
}

void TrimLeft(char) {
    XCALL(0x0046C630);
}

Recommended answer

This snippet goes at the bottom of a function (one that can't be inlined, that must be compiled with ebp as a frame pointer, and that has no other registers to restore). It looks quite brittle, or else it's only useful in cases where you didn't need inline asm at all.

Instead of returning, it jumps to uAddr, which is equivalent to making a tailcall.
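As a rough illustration of what "equivalent to a tailcall" means at the source level, here is a minimal sketch in plain C++. The names real_impl, target and wrapper are hypothetical stand-ins, not part of the project in the question, and whether the compiler actually emits a jmp instead of call/ret depends on the compiler and optimization settings.

```cpp
#include <cassert>

// Hypothetical stand-in for the routine that would live at uAddr.
static int real_impl(int x) { return x + 1; }

// Typed pointer to the target, playing the role of the hard-coded address.
static int (*const target)(int) = &real_impl;

// The call is the last thing the wrapper does and its result is returned
// unchanged, so the call sits in tail position. An optimizing compiler is
// allowed (but not required) to compile this as a jump to the target.
int wrapper(int x) {
    assert(target != nullptr);
    return target(x);
}
```

From the caller's point of view, wrapper(41) behaves exactly like calling the target directly; the only difference a real tailcall makes is which return address the target sees on the stack.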

There aren't intrinsics for arbitrary jumps or for manipulating the stack. If you need that, you're out of luck. It doesn't make sense to ask about this snippet by itself, only with enough context to see how it's being used, i.e. is it important which return address is on the stack, or is it OK for it to compile to call/ret instead of a jmp to that address? (See the first version of this answer for a simple example of using it as a function pointer.)

Going by the update, the use case is just a very clunky way to make wrappers for absolute function pointers.

We can instead define static const function pointers of the right types, so no wrapper is needed and the compiler can call directly from wherever you use them. static const is how we let the compiler know it can fully inline the function pointers and doesn't need to store them anywhere as data if it doesn't want to, just like a normal static const int xyz = 2;

struct oCMOB;
class zSTRING;
enum zTSTR_KIND { a, b, c };  // enum forward declarations are illegal

// C syntax
//static oCMOB* (*const CreateNewInstance)() = (oCMOB *(*const)())0x00718590;

// C++11
static const auto CreateNewInstance = reinterpret_cast<oCMOB *(*)()>(0x00718590);
// passing an enum by const-reference is dumb.  By value is more efficient for integer types
static const auto Copy = reinterpret_cast<int (*)(class zSTRING const &, enum zTSTR_KIND const &)>(0x0046C2D0);
static const auto TrimLeft = reinterpret_cast<void (*)(char)> (0x0046C630);

void foo() {
    oCMOB *inst = CreateNewInstance();
    (void)inst; // silence unused warning

    zSTRING *dummy = nullptr;  // work around instantiating an incomplete type
    int result = Copy(*dummy, c);
    (void) result;

    TrimLeft('a');
}
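The hard-coded addresses above only make sense inside the original binary, so the snippet can be compiled but not run on its own. The following is a minimal runnable sketch of the same integer-to-function-pointer conversion, assuming a platform where function pointers fit in std::uintptr_t. The helper function_at and the stand-in function twice are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turn a raw address into a typed function pointer.
template <typename Fn>
Fn* function_at(std::uintptr_t address) {
    return reinterpret_cast<Fn*>(address);
}

// Stand-in for a routine that would live at a fixed address in the binary.
static int twice(int x) { return 2 * x; }

// Round-trip a real function's address through an integer, the same way a
// constant such as 0x0046C2D0 would be converted, then call through it.
int call_twice_via_address(int x) {
    const auto address = reinterpret_cast<std::uintptr_t>(&twice);
    const auto fn = function_at<int(int)>(address);
    assert(fn != nullptr);
    return fn(x);
}
```

In the real project the address would be a literal rather than the address of a local function; the conversion and the call syntax stay the same.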

This compiles just fine with x86-64 and 32-bit x86 MSVC, and with gcc/clang in 32-bit and 64-bit mode, on the Godbolt compiler explorer (and also for non-x86 architectures). This is the 32-bit asm output from MSVC, so you can compare it with what you get for your nasty wrapper functions. You can see that it has basically inlined the useful part (mov eax, uAddr / jmp or call) into the caller.

;; x86 MSVC -O3
$T1 = -4                                                ; size = 4
?foo@@YAXXZ PROC                                        ; foo
        push    ecx
        mov     eax, 7439760                          ; 00718590H
        call    eax

        lea     eax, DWORD PTR $T1[esp+4]
        mov     DWORD PTR $T1[esp+4], 2       ; the by-reference enum
        push    eax
        push    0                             ; the dummy nullptr
        mov     eax, 4637392                          ; 0046c2d0H
        call    eax

        push    97                                  ; 00000061H
        mov     eax, 4638256                          ; 0046c630H
        call    eax

        add     esp, 16                             ; 00000010H
        ret     0
?foo@@YAXXZ ENDP

For repeated calls to the same function, the compiler would keep the function pointer in a call-preserved register.

For some reason, even with 32-bit position-dependent code, we don't get a direct call rel32. The linker can calculate the relative offset from the call site to the absolute target at link time, so there's no reason for the compiler to use a register-indirect call.

If we didn't tell the compiler to create position-independent code, reaching an absolute address with code-relative addressing is a useful optimization for jumps and calls in this case.

In 32-bit code, every possible destination address is in range from every possible source address, but in 64-bit it's harder. In 32-bit mode, clang does spot this optimization! But even in 32-bit mode, MSVC and gcc miss it.
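The range argument can be checked with a little arithmetic. A call rel32 is 5 bytes long (opcode E8 plus a 4-byte displacement), and the displacement is measured from the end of the instruction. The sketch below assumes an example call site at 0x00401000, which is a made-up address; the function names are hypothetical too.

```cpp
#include <cassert>
#include <cstdint>

// Length of `call rel32`: one opcode byte (E8) plus a 4-byte displacement.
constexpr std::uint64_t kCallRel32Size = 5;

// 32-bit mode: addresses wrap modulo 2^32, so a displacement always exists.
constexpr std::uint32_t rel32_for_32bit(std::uint32_t call_site,
                                        std::uint32_t target) {
    return target - (call_site + static_cast<std::uint32_t>(kCallRel32Size));
}

// 64-bit mode: the displacement has to fit in a sign-extended 32-bit field.
constexpr bool reachable_in_64bit(std::uint64_t call_site,
                                  std::uint64_t target) {
    const std::int64_t disp =
        static_cast<std::int64_t>(target - (call_site + kCallRel32Size));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

// Displacement to encode in 64-bit mode; only valid when the target is in range.
inline std::int32_t rel32_for_64bit(std::uint64_t call_site,
                                    std::uint64_t target) {
    assert(reachable_in_64bit(call_site, target));
    const std::int64_t disp =
        static_cast<std::int64_t>(target - (call_site + kCallRel32Size));
    return static_cast<std::int32_t>(disp);
}
```

With the assumed call site, reaching 0x00718590 needs a displacement of 0x0031758B. A call site high in a 64-bit address space, such as 0x7f0000000000, is more than 2 GiB away from that target, so no rel32 encoding exists for it.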

I played around with a few things in gcc/clang:

// don't use
oCMOB * CreateNewInstance(void) asm("0x00718590");

This kind of works, but only as a total hack. Gcc just uses that string as if it were a symbol name, so it feeds call 0x00718590 to the assembler, which handles it correctly (generating an absolute relocation that links just fine in a non-PIE executable). But with -fPIE, it emits 0x00718590@GOTPCREL as a symbol name, so we're screwed.

Of course, in 64-bit mode a PIE executable or library will be out of range of that absolute address, so only non-PIE makes sense anyway.

Another idea was to define the symbol in asm with an absolute address, and provide a prototype that would get gcc to use it directly, without @PLT or going through the GOT. (I could maybe have done that for the func() asm("0x..."); hack too, using hidden visibility.)

I only realized after hacking this up with the "hidden" attribute that this is useless in position-independent code, so you can't use it in a shared library or PIE executable anyway.

extern "C" is not necessary, but it means I didn't have to deal with name mangling in the inline asm.

#ifdef __GNUC__

extern "C" {
    // hidden visibility means that even in a PIE executable, or shared lib,
    // calls will go *directly* to that address, not via the PLT or GOT.
    oCMOB * CNI(void) __attribute__((__visibility__("hidden")));
}
//asm("CNI = 0x718590");  // set the address of a symbol, like `org 0x71... / CNI:`
asm(".set CNI, 0x718590");  // alternate syntax for the same thing


void *test() {
    CNI();    // works

    return (void*)CNI;  // gcc: RIP+0x718590 instead of the relative displacement needed to reach it?
    // clang appears to work
}
#endif

Disassembly of the compiled and linked gcc output for test, from Godbolt, using the binary output option to see how it assembled and linked:

 # gcc -O3  (non-PIE).  Clang makes pretty much the same code, with a direct call and mov imm.
 sub    rsp,0x8
 call   718590 <CNI>
 mov    eax,0x718590
 add    rsp,0x8
 ret    

With -fPIE, gcc+gas emits lea rax,[rip+0x718590] # b18ab0 <CNI+0x400520>, i.e. it uses the absolute address as an offset from RIP instead of subtracting RIP from it. I guess that's because gcc literally emits lea CNI(%rip),%rax, and we've defined CNI as an assemble-time symbol with that numeric value. Oops. So it's not quite like a label at that address, which is what you'd get with .org 0x718590; CNI:.
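The numbers in that disassembly comment can be reproduced by hand. RIP-relative addressing adds the displacement to the address of the next instruction, and the comment b18ab0 <CNI+0x400520> implies that this next-instruction address was 0x400520. The sketch below uses that inferred value; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Address computed by `lea reg, [rip + disp32]`: RIP holds the address of the
// instruction that follows the lea, and disp32 is sign-extended to 64 bits.
constexpr std::uint64_t rip_relative_target(std::uint64_t next_insn,
                                            std::int32_t disp) {
    return next_insn +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// Displacement an assembler has to encode to reach `target` from `next_insn`.
inline std::int32_t disp_needed(std::uint64_t next_insn, std::uint64_t target) {
    const std::int64_t disp = static_cast<std::int64_t>(target - next_insn);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return static_cast<std::int32_t>(disp);
}
```

Using 0x718590 directly as the displacement lands on 0xb18ab0, which is the wrong address shown in the gcc output. Reaching 0x718590 from 0x400520 would have needed a displacement of 0x318070.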

But since we can only use a rel32 call in non-PIE executables, this is OK unless you link with -no-pie but forget to compile with -fno-pie, in which case you're screwed. :/

Providing a separate object file with the symbol definition might have worked.

Clang appears to do exactly what we want, though, even with -fPIE, using its built-in assembler. This machine code could only have linked with -fno-pie (the default on Godbolt, but not the default on many distros).

 # disassembly of clang -fPIE machine-code output for test()
 push   rax
 call   718590 <CNI>
 lea    rax,[rip+0x3180b3]        # 718590 <CNI>
 pop    rcx
 ret    

So this is actually safe (but sub-optimal, because lea rel32 is worse than mov imm32). With -m32 -fPIE, it doesn't even assemble.
