x86_64: is the stack frame pointer almost useless?


Problem description


  • Linux x86_64.

  • GCC 5.x

I was studying the output of two builds of the same code, one with -fomit-frame-pointer and one without (gcc enables that option by default at "-O3"). The build without it contains the usual frame pointer setup:

pushq    %rbp
movq     %rsp, %rbp
...
popq     %rbp

My question is:

If I globally disable that option, even, at the extreme, for compiling an operating system, is there a catch?

I know that interrupts use that information, so is that option good only for user space?

Recommended answer

Compilers always generate self-consistent code, so disabling the frame pointer is fine as long as you don't use external or hand-crafted code that makes assumptions about it (for example, by relying on the value of rbp).

Interrupts don't use the frame pointer information. They may use the current stack pointer to save a minimal context, but this depends on the type of interrupt and on the OS (a hardware interrupt probably uses a Ring 0 stack).
You can look at the Intel manuals for more information on this.

About the usefulness of the frame pointer:
Years ago, after compiling a couple of simple routines and looking at the generated 64 bit assembly code, I had the same question.
If you don't mind reading the notes I wrote for myself back then, here they are.

Note: Asking about the usefulness of something is a little bit relative. Writing assembly code for the current main 64 bit ABIs, I found myself using the stack frame less and less. However, this is just my coding style and opinion.

I like using the frame pointer and writing the prologue and epilogue of a function, but I like direct, uncomfortable answers too, so here is how I see it:

Yes, the frame pointer is almost useless on x86_64

Beware that it is not completely useless, especially for humans, but a compiler doesn't need it anymore. To better understand why we have a frame pointer in the first place, it helps to recall some history.

When Intel CPUs supported only "16 bit mode" there were some limitations on how to access the stack; in particular, this instruction was (and still is) illegal

mov ax, WORD [sp+10h]

because sp cannot be used as a base register. Only a few designated registers could be used for that purpose, for example bx or the more famous bp.
Nowadays it's not a detail everybody keeps an eye on, but bp has an advantage over the other base registers: it implicitly implies the use of ss as the segment/selector register, just like sp does.
Even if your program was scattered all across memory, with each segment register pointing to a different area, bp and sp acted the same; after all, that was the intent of the designers.

So a stack frame was usually necessary, and consequently a frame pointer.
bp effectively partitioned the stack into three parts: the arguments area, the old bp area (just a WORD) and the local variables area. Each area is identified by the sign of the offset used to access it: positive for the arguments, zero for the old bp, negative for the local variables.
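The three-way split can be written down as simple offset arithmetic. The following is only a sketch under stated assumptions: the helper names arg_offset and local_offset are made up for illustration, calls are near calls (one word for the return address, which also sits at a positive offset), and every argument and local occupies exactly one word.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not taken from any real toolchain.
   word_size is 2 for the 16-bit case, 4 for 32-bit cdecl. */

/* Offset from the frame pointer of argument number `index` (0-based).
   Slot 0 holds the saved frame pointer and slot 1 the return address,
   so the first argument starts two words above the frame pointer. */
long arg_offset(size_t index, size_t word_size)
{
    assert(word_size == 2 || word_size == 4);
    return (long)((2 + index) * word_size);
}

/* Offset of local variable number `index` (0-based). Locals sit below
   the saved frame pointer, so the offset is negative. */
long local_offset(size_t index, size_t word_size)
{
    assert(word_size == 2 || word_size == 4);
    return -(long)((index + 1) * word_size);
}
```

With word_size set to 4, arg_offset(0, 4) is 8 and arg_offset(1, 4) is 12, which correspond to the usual [ebp+08h] and [ebp+0ch] operands of 32-bit cdecl code.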

As the Intel CPUs evolved, more instructions were added, and with them more extensive addressing modes.
Specifically, it became possible to use any register as a base register, and this includes esp.
With instructions like this

mov eax, DWORD [esp+10h]

now valid, the use of the stack frame and the frame pointer seemed doomed to end.
This was likely not the case, at least in the beginning.
It is true that it is now possible to use esp exclusively, but the separation of the stack into the three areas described above is still useful, especially for humans.

Without the frame pointer, a push or a pop changes the offset of every argument and local variable relative to esp, producing code that looks unintuitive at first sight. Consider how to implement the following C routine with the cdecl calling convention:

int my_routine(int a, int b)
{
    return my_add(a, b);
}

without and with a stack frame

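my_routine:
  push DWORD [esp+08h]    ; b
  push DWORD [esp+08h]    ; a (esp moved by the previous push)
  call my_add
  add esp, 08h            ; cdecl: the caller removes the arguments
  ret

my_routine:
  push ebp
  mov ebp, esp

  push DWORD [ebp+0ch]    ; b
  push DWORD [ebp+08h]    ; a
  call my_add

  mov esp, ebp            ; discard the pushed arguments
  pop ebp
  ret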

At first sight it seems that the first version pushes the same value twice. If you add local vars (especially lots of them), the situation quickly becomes hard to read: does mov eax, [esp+0cah] refer to a local var or to an argument?
With a stack frame we have fixed offsets for the arguments and local vars.

Even compilers at first still preferred the fixed offsets given by the use of the frame pointer. I saw this behavior change first with gcc.
In a debug build the stack frame effectively adds clarity to the code and makes it easy for the (proficient) programmer to follow what is going on and, as pointed out in the comments, makes it easier to recover the call stack.
Modern compilers, however, are good at math: they can easily keep count of the stack pointer movements and generate the appropriate offsets, omitting the stack frame for faster execution.
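The bookkeeping involved is small. As a rough sketch (the struct and function names here are hypothetical, and the model assumes 32-bit cdecl with 4-byte stack slots), it is enough to count the bytes pushed since function entry and add that count to the entry-time offset:

```c
#include <assert.h>

/* Hypothetical model: tracks how far esp has moved since function entry. */
struct sp_tracker {
    long pushed_bytes;   /* bytes pushed since entry */
};

void track_push(struct sp_tracker *t)
{
    t->pushed_bytes += 4;
}

void track_pop(struct sp_tracker *t)
{
    assert(t->pushed_bytes >= 4);   /* cannot pop more than was pushed */
    t->pushed_bytes -= 4;
}

/* esp-relative offset of 0-based argument `index` at this point.
   At entry [esp] holds the return address, so argument 0 is at esp+4. */
long arg_offset_from_sp(const struct sp_tracker *t, int index)
{
    assert(index >= 0);
    return 4 + 4L * index + t->pushed_bytes;
}
```

This is why the frame-less my_routine above uses [esp+08h] twice: at entry b (argument 1) is at esp+8, and after one push a (argument 0) has moved to esp+8 as well.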

Until the introduction of the SSE instructions, Intel processors never asked much of programmers compared to their RISC brothers.
In particular they never asked for data alignment: we could access 32 bit data at an address that was not a multiple of 4 with no major complaint (depending on the DRAM data width, this may result in increased latency).
SSE used 16 byte operands that needed to be accessed on a 16 byte boundary. As the SIMD paradigm became efficiently implemented in hardware and more popular, alignment on a 16 byte boundary became important.

The main 64 bit ABIs now require it: the stack must be aligned on paragraphs (16 byte boundaries).
Now, we are usually called in such a way that the stack is aligned after the prologue, but suppose we are not blessed with that guarantee; we would need to do one of these

push rbp                   push rbp
mov rbp, rsp               mov rbp, rsp             

and spl, 0f0h              sub rsp, xxx
sub rsp, 10h*k             and spl, 0f0h

One way or another the stack is aligned after these prologues. However, we can no longer use a negative offset from rbp to access local vars that need alignment, because the frame pointer itself is not aligned.
We need to use rsp. We could arrange a prologue that leaves rbp pointing at the top of an aligned area of local vars, but then the arguments would be at an unknown offset.
We could arrange a complex stack frame (maybe with more than one pointer), but the key feature of the old-fashioned frame pointer was its simplicity.
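The alignment argument can be checked with plain integer arithmetic. This is a minimal sketch in which addresses are modelled as integers and the helper names are hypothetical; align_down16 mirrors what an and with a mask that clears the low four bits does to rsp.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a 16-byte boundary (clears the low 4 bits). */
uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}

int is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Address of the k-th 16-byte local slot above an already aligned rsp.
   Every such slot is aligned because the base is. */
uintptr_t local_slot_from_rsp(uintptr_t aligned_rsp, unsigned k)
{
    assert(is_aligned16(aligned_rsp));
    return aligned_rsp + 16u * k;
}
```

For a made-up, unaligned rbp value such as 0x7ffc1008, align_down16(rbp - 0x20) gives 0x7ffc0fe0 and every slot computed from it is aligned, while rbp - 0x10 is 0x7ffc0ff8 and is not.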

So we can use the frame pointer to access the arguments on the stack and the stack pointer for the local vars, fair enough.
Alas, the role of the stack in argument passing has been reduced: for a small number of arguments (currently four on Windows x64, six integer arguments under the System V ABI that Linux uses) it is not even used, and in the future it will probably be used even less.

So we don't use the frame pointer for local vars (mostly), nor for the arguments (mostly). What do we use it for?


  1. It saves a copy of the original rsp, so a single mov is enough to restore the stack pointer at function exit. If the stack is aligned with an and, which is not invertible, a copy of the original value is necessary.

  2. Actually, some ABIs guarantee that the stack is aligned after the standard prologue, thereby allowing us to use the frame pointer as usual.

  3. Some variables don't need alignment and can be accessed through an unaligned frame pointer; this is usually true for hand-crafted code.

  4. Some functions require more parameters than are passed in registers.
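To see why the and is not invertible, note that several different entry values of rsp round down to the same aligned address. The sketch below models this with integers; the function names are hypothetical and the addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as an `and` that clears the low 4 bits of rsp. */
static uintptr_t round_down_16(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}

/* Returns 1 when two different entry values of rsp collapse to the same
   aligned value, which means the original cannot be recomputed from the
   aligned one and has to be saved somewhere (for example in rbp). */
int alignment_loses_information(uintptr_t rsp_a, uintptr_t rsp_b)
{
    assert(rsp_a != rsp_b);
    return round_down_16(rsp_a) == round_down_16(rsp_b);
}
```

For example, 0x7ffc1008 and 0x7ffc1000 both round down to 0x7ffc1000, so after the and there is no way to tell which one the function was entered with.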

Summary

The frame pointer is a vestigial paradigm from 16 bit programs that has proven itself still useful on 32 bit machines because of its simplicity and clarity when accessing local vars and arguments.
On 64 bit machines, however, the strict alignment requirements take away most of this simplicity and clarity; the frame pointer still remains in debug builds.

On the fact that the frame pointer can be used to do fun things: it is true, I guess. I've never seen such code, but I can imagine how it would work.
I, however, focused on the housekeeping role of the frame pointer, as this is the way I have always seen it.
All the crazy things can be done with any pointer set to the same value as the frame pointer; I give the latter a more "special" role.
VS2013, for example, sometimes uses rdi as a "frame pointer", but I don't consider it a real frame pointer if it doesn't use rbp/ebp/bp.
To me the use of rdi means a Frame Pointer Omission optimization :)
