Why is it better to use the ebp than the esp register to locate parameters on the stack?

Question

I am new to MASM. I have confusion regarding these pointer registers. I would really appreciate if you guys help me.

Thanks

Recommended answer

Encoding an addressing mode using [ebp + disp8] is one byte shorter than [esp+disp8], because using ESP as a base register requires a SIB byte. See rbp not allowed as SIB base? for details. (That question title is asking about the fact that [ebp] has to be encoded as [ebp+0].)
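To make the size difference concrete, here is a minimal C sketch of the ModRM rule behind it. The helper names `needs_sib` and `disp8_insn_length` are hypothetical, and the sketch only covers instructions with a one-byte opcode, 32-bit addressing, and a disp8 operand.

```c
#include <assert.h>

/* In 32-bit addressing, r/m = 100b (the slot ESP would occupy) combined
   with mod != 11b means "a SIB byte follows", not "base = ESP". */
int needs_sib(unsigned char modrm)
{
    unsigned mod = modrm >> 6;
    unsigned rm  = modrm & 7u;
    return mod != 3u && rm == 4u;
}

/* Length of a one-byte-opcode instruction with a [base + disp8] operand:
   opcode + ModRM + optional SIB + disp8. Hypothetical helper. */
int disp8_insn_length(unsigned char modrm)
{
    assert((modrm >> 6) == 1u);   /* mod = 01b selects the disp8 form */
    return 1 + 1 + needs_sib(modrm) + 1;
}

/* mov eax, [ebp+8]  ->  8B 45 08     (ModRM 0x45: mod=01, reg=eax, r/m=ebp)
   mov eax, [esp+8]  ->  8B 44 24 08  (ModRM 0x44 needs SIB 0x24: base=esp, no index) */
```

Passing the two ModRM bytes from the comment (0x45 and 0x44) to `disp8_insn_length` gives lengths of 3 and 4 bytes respectively.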

The first use of [esp + disp8] after a push or pop, or after a call, requires a stack-sync uop on Intel CPUs. (What is the stack engine in the Sandybridge microarchitecture?). Of course, mov ebp, esp to make a stack frame in the first place also triggers a stack-sync uop: any explicit reference to ESP in the out-of-order core (not just in addressing modes) causes a stack-sync uop if the stack engine might have an offset that the out-of-order back end doesn't know about.
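The rule in that last sentence can be sketched as a toy model in C. This is a deliberate simplification based only on the description above, not a description of real hardware (which tracks a bounded offset, among other details); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the stack engine's bookkeeping. */
struct stack_engine {
    bool dirty;     /* front end holds an ESP offset the back end hasn't seen */
    int  sync_uops; /* stack-sync uops inserted so far */
};

/* push, pop, call, ret: ESP is adjusted implicitly by the stack engine. */
void implicit_esp_update(struct stack_engine *se)
{
    assert(se != NULL);
    se->dirty = true;
}

/* Any instruction that names ESP explicitly (operand or addressing mode):
   a sync uop is needed only if there is a pending offset. */
void explicit_esp_use(struct stack_engine *se)
{
    assert(se != NULL);
    if (se->dirty) {
        se->sync_uops++;
        se->dirty = false;
    }
}
```

Starting from a dirty state (as after a call), the sequence push ebp / mov ebp, esp / push ecx / mov esp, ebp costs two sync uops in this model: one for each explicit use of ESP that follows an implicit update.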

The traditional stack-frame setup with ebp creates a linked-list of stack frames (each saved EBP pointing at the parent's saved EBP, right below a return address), handy for profiling and sometimes debugging if your code doesn't have alternate metadata that lets your debugger unwind the stack to show stack backtraces.
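A backtrace over that linked list can be sketched in C as follows. This walks a model of the frames rather than the real machine stack; `struct frame` and `backtrace_frames` are hypothetical names, and the struct mirrors the 32-bit layout where [ebp] holds the saved EBP and [ebp+4] holds the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What EBP points at in a function that set up a traditional frame. */
struct frame {
    const struct frame *saved_ebp;      /* [ebp]:   caller's frame, NULL at the outermost frame */
    uintptr_t           return_address; /* [ebp+4]: where this function returns to */
};

/* Follow the saved-EBP chain, collecting return addresses from the
   innermost frame outward. Returns the number of entries written. */
size_t backtrace_frames(const struct frame *ebp, uintptr_t *out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_address;
        ebp = ebp->saved_ebp;
    }
    return n;
}
```

A profiler or debugger doing this on real code would start from the current EBP value and needs some way to know where the chain ends; the NULL terminator here is an assumption of the sketch.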

But despite these downsides to using ESP, it's often not better (for performance) to use EBP as a frame pointer, because it uses up an extra one of the 8 GP registers for the stack, leaving you with 6 instead of 7 you can actually use for stuff other than the stack. Modern compilers default to -fomit-frame-pointer when optimization is enabled.

It's easy for compilers to keep track of how much ESP has moved relative to where they stored something because they know how much sub esp,28 moves the stack pointer. Even after pushing a function arg, they still know the right ESP-relative offset to anything they stored on the stack earlier in the function.
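The bookkeeping involved is small. Here is a minimal sketch in C, assuming 4-byte pushes as in 32-bit code; `struct esp_tracker` and its helpers are hypothetical names for illustration, not anything a real compiler exposes.

```c
#include <assert.h>

/* How far ESP has moved down (in bytes) since function entry. */
struct esp_tracker {
    int depth;
};

void track_sub_esp(struct esp_tracker *t, int bytes) /* sub esp, bytes */
{
    assert(bytes >= 0);
    t->depth += bytes;
}

void track_push(struct esp_tracker *t) { t->depth += 4; } /* push r32 / imm */
void track_pop(struct esp_tracker *t)  { t->depth -= 4; } /* pop r32 */

/* slot_depth is the slot's distance below the entry-time ESP
   (negative for stack args, which sit above the return address).
   The result is the displacement to use in [esp + disp] right now. */
int esp_offset(const struct esp_tracker *t, int slot_depth)
{
    return t->depth - slot_depth;
}
```

For example, after sub esp, 28 a local at the bottom of the reserved area (slot depth 28) is at [esp+0]; after one more push it is at [esp+4], and the first stack arg (slot depth -4) has moved from [esp+32] to [esp+36].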

Humans can do it, too, but it's easy to make a mistake when you modify the function to reserve some extra space and forget to update all the offsets from ESP to your locals and stack args, if any. (Normally it's not worth hand-writing large functions that can't keep most of their variables in registers, though. Leave that to the compiler and only spend your time writing the hot loops in asm, if at all.)

The exception is if your function allocates a variable amount of stack space (like C alloca or C99 variable length arrays like int arr[n]); in that case compilers will make a traditional stack frame with EBP. Or in hand-written asm, if you push in a loop to use the call stack as a Stack data structure.
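As an illustration of the variable-size case, here is a small C function using a C99 variable-length array. The function name is hypothetical, and the sketch assumes a compiler that supports VLAs (GCC and Clang do; MSVC does not, so there the equivalent would be an alloca-style allocation).

```c
#include <assert.h>

/* Sums i*i for i in [0, n) through a stack buffer whose size is only
   known at run time, so the distance from ESP to anything stored
   earlier is not a compile-time constant inside this function. */
int sum_of_squares_below(int n)
{
    assert(n > 0);   /* a VLA with a non-positive size is undefined behavior */
    int arr[n];      /* C99 variable-length array */
    for (int i = 0; i < n; i++)
        arr[i] = i * i;

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}
```

Compiling something like this for 32-bit x86 and reading the output is a quick way to see the frame pointer come back even with optimization enabled, although an optimizer may remove the array entirely if it can prove it is unnecessary.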

For example, x86 MSVC 19.14 compiles this C

int foo() {
    volatile int i = 0;  // force it to be stored to memory
    return i;
}

into this MASM asm. (See it yourself on the Godbolt compiler explorer.)

;;; MSVC -O2
_i$ = -4                                                ; size = 4
int foo(void) PROC                                        ; foo, COMDAT
        push    ecx
        mov     DWORD PTR _i$[esp+4], 0           ; note this is actually [esp+0] ; _i$ = -4
        mov     eax, DWORD PTR _i$[esp+4]
        pop     ecx
        ret     0
int foo(void) ENDP                                        ; foo

Notice that it reserves space for i with a push instead of sub esp, 4 because that saves code-size and is usually about the same performance. It's the same number of uops for the front-end, with no extra stack-sync uops, because the push is before any explicit reference to esp, and the pop is after the last one.

(If it was reserving more than 4 bytes, I think it would just use a normal sub esp, 8 or whatever.)

There's an obvious missed optimization here; push 0 would store the value it actually wants, instead of whatever garbage was in ECX. (What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?). And pop eax would clean the stack and load i as the return value.

Compare that with this version, compiled with optimization disabled. Notice that _i$ = -4 is the same offset from the "stack frame", but that the optimized code used esp+4 as the base while this uses ebp. That's mostly just a fun fact about MSVC internals: it seems to think in terms of where EBP would be if it hadn't optimized away frame-pointer creation. Picking a reference point makes sense, and lining up with its frame-pointer-enabled choice is the obvious choice.

;;; MSVC -O0
_i$ = -4                                                ; size = 4
int foo(void) PROC                                        ; foo
        push    ebp
        mov     ebp, esp                     ; make a stack frame
        push    ecx
        mov     DWORD PTR _i$[ebp], 0
        mov     eax, DWORD PTR _i$[ebp]
        mov     esp, ebp
        pop     ebp
        ret     0
int foo(void) ENDP                                        ; foo

Interesting, it still uses push/pop to reserve 4 bytes of stack space. This time it does cause one extra stack-sync uop on Intel CPUs, because the push ecx after the mov ebp,esp re-dirties the stack engine before mov esp, ebp. But that's pretty trivial.
