Why does the Mac ABI require 16-byte stack alignment for x86-32?

Question

I can understand this requirement for the old PPC RISC systems and even for x86-64, but for the old tried-and-true x86? There, the stack only needs to be aligned on 4-byte boundaries. Yes, some of the MMX/SSE instructions require 16-byte alignment, but if that is a requirement of the callee, then the callee should ensure the alignment is correct. Why burden every caller with this extra requirement? It can actually cost some performance, because every call site must manage it. Am I missing something?
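
The callee-side alternative mentioned above, where a function that needs 16-byte alignment fixes up the stack itself, comes down to masking the low bits of the stack pointer (what `and esp, -16` does in a prologue). A minimal sketch of that arithmetic in C, operating on a simulated stack-pointer value; the helper names `align_down` and `realign_cost` are hypothetical and not part of any ABI or toolchain:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a simulated stack pointer down to a
   power-of-two boundary. This mirrors the arithmetic of `and esp, -16`;
   it does not touch the real stack. */
static inline uintptr_t align_down(uintptr_t sp, uintptr_t alignment)
{
    /* The mask trick only works for non-zero powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}

/* Bytes of stack a callee gives up by realigning itself
   (always in the range 0 .. alignment - 1). */
static inline uintptr_t realign_cost(uintptr_t sp, uintptr_t alignment)
{
    return sp - align_down(sp, alignment);
}
```

For example, a simulated stack pointer of 0xbffff7ac rounds down to 0xbffff7a0, giving up 12 bytes. In a real prologue the original value also has to be kept somewhere (typically via a frame pointer) so it can be restored on return, which is part of the per-function cost of this approach.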

Update: After some more investigation and some consultation with internal colleagues, I have a few theories about this:

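  1. Consistency between the PPC, x86, and x64 versions of the OS.
  2. It seems that the GCC codegen now consistently does a sub esp, xxx and then MOVs the data onto the stack, rather than simply using push instructions. This could actually be faster on some hardware.
  3. While this does complicate the call sites a bit, there is very little extra overhead with the default cdecl convention, where the caller always cleans up the stack.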

The issue I have with the last item is that, for calling conventions that rely on the callee cleaning the stack, the above requirement really "uglifies" the codegen. For instance, what if some compiler decided to implement a faster register-based calling style for its own internal use (i.e., for any code that isn't intended to be called from other languages or sources)? This stack-alignment requirement could negate some of the performance gains achieved by passing some parameters in registers.

Update: So far the only real answer has been consistency, but to me that's a bit too easy an answer. I have well over 20 years of experience with the x86 architecture, and if consistency, rather than performance or something else concrete, is really the reason, then I respectfully suggest that it is a bit naive of the developers to require it. They're ignoring nearly three decades of tools and support, especially if they expect tool vendors to quickly and easily adapt their tools to the platform (maybe not... it is Apple...) without having to jump through several seemingly unnecessary hoops.

I'll give this topic another day or so then close it...

Recommended answer

From "Intel®64 and IA-32 Architectures Optimization Reference Manual", section 4.4.2:

"For best performance, the Streaming SIMD Extensions and Streaming SIMD Extensions 2 require their memory operands to be aligned to 16-byte boundaries. Unaligned data can cause significant performance penalties compared to aligned data."

From Appendix D:

"It is important to ensure that the stack frame is aligned to a 16-byte boundary upon function entry to keep local __m128 data, parameters, and XMM register spill locations aligned throughout a function invocation."
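
The requirement the quotation describes can be observed from C. A minimal sketch, assuming a C11 compiler; the function name is hypothetical, and a plain `float[4]` stands in for `__m128` so that no SSE header is needed:

```c
#include <stdint.h>

/* A 16-byte local with the same alignment requirement as __m128. */
int sse_sized_local_is_aligned(void)
{
    /* _Alignas(16) obliges the compiler to place v on a 16-byte
       boundary. It can do that cheaply when the ABI already
       guarantees an aligned frame on entry; otherwise it has to
       realign the stack inside this function. */
    _Alignas(16) float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    /* Nonzero when the address of v is a multiple of 16. */
    return ((uintptr_t)(void *)v % 16u) == 0;
}
```

The check passes either way; what the ABI rule changes is how the compiler gets there. With an aligned frame on entry, a fixed offset from the frame base is enough, whereas without it the function needs its own realignment prologue.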

http://www.intel.com/Assets/PDF/manual/248966.pdf
