Understanding fastcall stack frame


Problem description


I'm porting an algorithm of mine to assembly for ml64, half for sport, half to see how much performance I can actually gain.

Anyway, I'm currently trying to understand the stack frame setup. As far as I know, this example works like this:

push rbp        ; inherited, base pointer of caller, pushed on stack for storage
mov rbp, rsp    ; inherited, base pointer of the callee, moved to rbp for use as base pointer
sub rsp, 32     ; intel guide says each frame must reserve 32 bytes for the storage of the
                ; 4 arguments usually passed through registers
and spl, -16    ; 16 byte alignment?


mov rsp, rbp    ; put your base pointer back in the callee register
pop rbp         ; restore callers base pointer

The two things I'm not getting are:

  1. How does subtracting 32 from RSP do anything at all? As far as I know, other than for its duties going from one stack frame to another, it's just another register, right? I suspect it's for going into another stack frame rather than for use in the current one.

  2. What is SPL and why does masking it make something 16-byte aligned?

Solution

push rbp        ;save non-volatile rbp
mov rbp, rsp    ;save the old stack pointer
sub rsp, 32     ;reserve 32 bytes on the stack = 8 32-bit integers
                ;or 4 pointers.
                ;per the Microsoft x64 convention this is the shadow space a
                ;caller must provide for any function it calls; the callee may
                ;use it to spill the 4 register parameters or as scratch storage.
and spl, -16    ;align stack by 16 bytes (for SSE code)


mov rsp, rbp    ;restore the old stack pointer
pop rbp         ;restore rbp

"How does subtracting 32 from RSP do anything at all?"

RSP is the stack pointer, not just another register. Doing anything to it affects the stack. Because the stack grows towards lower addresses, subtracting 32 moves the top of the stack down and reserves 8 x 4 = 32 bytes that later PUSH instructions will not overwrite, so local variables can be placed there.
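To see why moving RSP reserves space, here is a minimal sketch in C that models the stack as a byte array and replays the prologue and epilogue from the listing above. It is only a toy: `Machine`, `push64`, `pop64` and `run_frame` are hypothetical names, the "addresses" are offsets into the array, and the alignment step is modeled with a full-width AND rather than the `spl` form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack; not real machine state. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* simulated stack memory */
    uint64_t rsp;              /* offset into mem, plays the role of RSP */
    uint64_t rbp;              /* plays the role of RBP */
} Machine;

/* push: move RSP down by 8, then store the value at the new top. */
static void push64(Machine *m, uint64_t value) {
    assert(m->rsp >= 8);
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &value, 8);
}

/* pop: load the value at the top, then move RSP up by 8. */
static uint64_t pop64(Machine *m) {
    uint64_t value;
    assert(m->rsp + 8 <= STACK_BYTES);
    memcpy(&value, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return value;
}

/* Replays the prologue and epilogue and returns how many bytes lay
   between the saved RBP and RSP while the frame was live. */
static uint64_t run_frame(Machine *m) {
    uint64_t reserved;
    push64(m, m->rbp);          /* push rbp      */
    m->rbp = m->rsp;            /* mov rbp, rsp  */
    m->rsp -= 32;               /* sub rsp, 32   */
    m->rsp &= ~(uint64_t)15;    /* and rsp, -16  */
    reserved = m->rbp - m->rsp; /* size of the reserved region */
    m->rsp = m->rbp;            /* mov rsp, rbp  */
    m->rbp = pop64(m);          /* pop rbp       */
    return reserved;
}
```

Starting from an offset that is already a multiple of 16 after the push, the frame holds exactly 32 bytes; otherwise the alignment step adds a few more. In both cases the epilogue puts RSP and RBP back to their values on entry.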

"What is SPL and why does masking it make something 16-byte aligned?"

SPL is the lower 8 bits of RSP. An AND with -16 (0xF0 as a byte, 0xFFFFFFFFFFFFFFF0 as a 64-bit value) forces the four least significant bits to zero, which rounds the stack pointer down to a multiple of 16. Those four bits all live in SPL, so and spl,-16 aligns the stack just as and rsp,-16 does. Because the stack grows down, rounding down only reserves a few extra bytes and never gives up space that is already in use.
The alignment by 16 bytes is needed when using SSE code, which x64 uses for floating point math. Having 16-byte alignment allows the compiler to use the faster aligned SSE load and store instructions.
Why the compiler chooses and spl,-16 makes no sense. Both instructions are 4 bytes, and and rsp,-16 is strictly better because it does not invoke a partial register update.
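The masking arithmetic can be checked with a small C sketch. The helper names `and_rsp_m16` and `and_spl_m16` are made up for illustration; they model only the resulting register value, not flags or timing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `and rsp, -16`: clear the low four bits of the 64-bit value. */
static uint64_t and_rsp_m16(uint64_t rsp) {
    uint64_t aligned = rsp & (uint64_t)-16;   /* -16 == 0xFFFFFFFFFFFFFFF0 */
    /* The result is a multiple of 16 and at most 15 bytes lower. */
    assert(aligned % 16 == 0 && rsp - aligned < 16);
    return aligned;
}

/* Model of `and spl, -16`: only the low byte is masked (with 0xF0);
   the upper 56 bits of the register are left as they were. */
static uint64_t and_spl_m16(uint64_t rsp) {
    uint8_t spl = (uint8_t)(rsp & 0xFF);
    spl = (uint8_t)(spl & 0xF0);
    return (rsp & ~(uint64_t)0xFF) | spl;
}
```

For any input the two helpers return the same value, for example 0x00007FFFFFFFE3C8 becomes 0x00007FFFFFFFE3C0 either way. The difference between the two instructions lies in how the CPU handles the write, not in the result.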

Disassembly:

0:  40 80 e4 f0       and    spl,-16   ;bad! partial register update.
4:  48 83 e4 f0       and    rsp,-16   ;good
8:  83 e4 f0          and    esp,-16   ;not usable: writing esp zeroes the upper 32 bits of rsp
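Why the 32-bit form is ruled out can be sketched in C as well. On x64, a write to a 32-bit register zero-extends into the full 64-bit register; `and_esp_m16` and `esp_form_is_safe` are hypothetical helpers that model just that effect.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `and esp, -16`: the 32-bit result is zero-extended,
   so the upper 32 bits of the 64-bit register are lost. */
static uint64_t and_esp_m16(uint64_t rsp) {
    uint32_t esp = (uint32_t)rsp & 0xFFFFFFF0u;
    assert((esp & 15u) == 0);
    return (uint64_t)esp;
}

/* Returns 1 only when the 32-bit form happens to give the same value
   as aligning the full 64-bit register. */
static int esp_form_is_safe(uint64_t rsp) {
    return and_esp_m16(rsp) == (rsp & ~(uint64_t)15);
}
```

With a stack pointer such as 0x00007FFFFFFFE3C8 the 32-bit form yields 0xFFFFE3C0, which no longer points into the stack. It only gives the right answer when the stack happens to sit below 4 GiB.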

"[RSP is] just another register, right?"

No, RSP is magically special.
It points to the top of the stack, which is what the PUSH and POP instructions operate on.
All local variables and parameters that do not fit into registers are stored on the stack.

"Understanding fastcall"

On x64 Windows there is effectively only one calling convention. To make matters more confusing, if you specify a calling convention other than __fastcall, most compilers will remap it to __fastcall on x64.
