Why is address 0x400000 chosen as the start of the text segment in the x86_64 ABI?


Question


In this document on p. 27 it says that text segment starts at 0x400000. Why was this particular address chosen? Is there any reason for that? The same address is chosen in GNU ld on Linux:

$ ld -verbose | grep -i text-segment
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;


This is surprising, because the address is higher in 32-bit x86 executables:

$ ld -verbose | grep -i text-segment
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS;
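
To compare the two defaults programmatically, the base address can be pulled out of the linker-script text. Below is a minimal sketch in Python, assuming the output of `ld -verbose` has already been captured as a string. The helper name `text_segment_base` is hypothetical, and the two sample lines are simply the ones quoted above.

```python
import re

# Matches SEGMENT_START("text-segment", 0x...) as it appears in the quoted linker script lines.
_SEGMENT_START = re.compile(r'SEGMENT_START\("text-segment",\s*(0x[0-9a-fA-F]+)\)')


def text_segment_base(linker_script: str) -> int:
    """Return the first text-segment base found in the script text (hypothetical helper)."""
    match = _SEGMENT_START.search(linker_script)
    if match is None:
        raise ValueError("no text-segment SEGMENT_START found")
    return int(match.group(1), 16)


# The two lines quoted in the question.
X86_64_LINE = ('PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); '
               '. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;')
I386_LINE = ('PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); '
             '. = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS;')

if __name__ == "__main__":
    for name, line in (("x86_64", X86_64_LINE), ("i386", I386_LINE)):
        base = text_segment_base(line)
        print(f"{name}: {base:#x} ({base / 2**20:.2f} MiB)")
```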


I read this question, which discusses why a 0x080xxxxx address was chosen for i386, but it doesn't explain the change in x86_64. It's hard to find any explanation of this. Does anybody have a clue?

Recommended answer


Bottom line: some technical limitations that amd64 has in using large addresses suggest dedicating the lower 2GiB of address space to code and data for efficiency. Thus the stack has been relocated out of this range.

i386 ABI [1]

  • The stack is located before the code, growing downwards from just under 0x8048000. This provides "a little over 128 MB for the stack and about 2 GB for text and data" (p. 3-22).
  • Dynamic segments start at 0x80000000 (2GiB),
  • and the kernel occupies the "reserved area" at the top, which the spec allows to be up to 1GiB, i.e. starting no lower than 0xC0000000 (p. 3-21) (which is what it typically does).
  • The main program is not required to be position-independent.
  • An implementation is not required to catch null pointer access (p. 3-21), but it's reasonable to expect that some of the stack space in excess of 128MiB (which is 288KiB) will be reserved for that purpose.
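
The sizes quoted in this list follow directly from the three addresses. Here is a small Python sketch of that arithmetic (the constant names are mine, not the spec's):

```python
MIB = 2**20
GIB = 2**30

TEXT_BASE = 0x08048000     # i386 load address; the stack grows down from just below it
DYNAMIC_BASE = 0x80000000  # start of the dynamic segments
KERNEL_BASE = 0xC0000000   # lowest allowed start of the reserved area

stack_room = TEXT_BASE                     # bytes between address 0 and the load address
extra_over_128 = stack_room - 128 * MIB    # the "little over" part of "a little over 128 MB"
text_data_room = DYNAMIC_BASE - TEXT_BASE  # space for text and data below the dynamic segments
reserved_area = 2**32 - KERNEL_BASE        # top of the 32-bit address space

if __name__ == "__main__":
    print(f"stack room: 128 MiB + {extra_over_128 // 1024} KiB")
    print(f"text and data room: {text_data_room / GIB:.2f} GiB")
    print(f"reserved area: {reserved_area // GIB} GiB")
```

So the "about 2 GB for text and data" is, strictly speaking, a bit under 1.9 GiB, and the 288KiB figure is just 0x48000 bytes.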


amd64 (whose ABI is formulated as an amendment to the i386 one (p. 9)) has a vastly bigger (48-bit) address space, but most instructions accept only 32-bit immediate operands (which include direct addresses and offsets in jump instructions). Handling larger values therefore takes more work and produces less efficient code (especially once instruction interdependency is taken into account). The authors summarize their measures for working around these limitations in a few "code models" that they recommend in order to "allow the compiler to generate better code" (p. 33).

  • Specifically, the first of them, "Small code model", suggests using addresses "in the range from 0 to 2^31 - 2^24 - 1 or from 0x00000000 to 0x7effffff", which allows some very efficient relative references and array iteration. This is 1.98GiB, which is more than enough for many programs.
  • "Medium code model" is based on the previous one. It splits the data into a "fast" part under the above boundary and a "slower" remaining part that requires a special instruction to access, while the code remains under the boundary.
  • And only the "large" model makes no assumptions about sizes, requiring the compiler "to use the movabs instruction, as in the medium code model, even for dealing with addresses inside the text section. Additionally, indirect branches are needed when branching to addresses whose offset from the current instruction pointer is unknown." They go on to suggest splitting the code base into multiple shared libraries, since these measures do not apply to relative references with offsets that are known to be within bounds (as outlined in "Small position independent code model").
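
The small-model bound and the 32-bit immediate limit are easy to check numerically. The following Python sketch uses two helper names of my own (`fits_imm32`, `in_small_model`) and assumes immediates are sign-extended from 32 bits, which is how I read the limitation described above:

```python
SMALL_MODEL_LIMIT = 2**31 - 2**24 - 1  # upper bound quoted from the small code model


def fits_imm32(value: int) -> bool:
    """True if value is representable as a sign-extended 32-bit immediate."""
    return -2**31 <= value <= 2**31 - 1


def in_small_model(address: int) -> bool:
    """True if an address lies in the range the small code model assumes."""
    return 0 <= address <= SMALL_MODEL_LIMIT


if __name__ == "__main__":
    print(f"small model limit: {SMALL_MODEL_LIMIT:#x}")
    print(f"size of the range: {(SMALL_MODEL_LIMIT + 1) / 2**30:.2f} GiB")
    for addr in (0x400000, 0x7effffff, 0x80000000, 0x80000000000):
        print(f"{addr:#x}: small model={in_small_model(addr)}, imm32={fits_imm32(addr)}")
```

Anything loaded at 0x400000 sits comfortably inside the range, whereas an address in the shared-library area does not fit in an immediate at all and has to be reached some other way.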


Thus the stack was moved to just under the shared library space (which starts at 0x80000000000, i.e. 2^43 bytes or 8TiB), because its addresses are never immediate operands. They are always referenced either indirectly or with lea/mov from another reference, so only the relative-offset limitations apply.


The above explains why the load address was moved to a lower address. Now, why was it moved to exactly 0x400000 (4MiB)? Here I came up empty, so, summarizing what I've read in the ABI specs, I can only guess that it felt "just right":

  • It's large enough to catch any likely incorrect structure offset, allowing for larger data units that amd64 operates on, yet small enough to not waste much of the valuable starting 2GiB of address space.
  • It's equal to the largest practical page size to date and is a multiple of all other virtual memory unit sizes one can think of.
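
The alignment part of this guess can be checked directly. Below is a short Python sketch; the list of page sizes is my own assumption about which x86 sizes are relevant (4KiB base pages, 2MiB and 4MiB large pages), with 1GiB pages added only for contrast:

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30

LOAD_ADDRESS = 0x400000

# Assumed set of x86 page sizes; 1 GiB is listed only for contrast.
PAGE_SIZES = {
    "4 KiB": 4 * KIB,
    "2 MiB": 2 * MIB,
    "4 MiB": 4 * MIB,
    "1 GiB": 1 * GIB,
}


def is_aligned(address: int, size: int) -> bool:
    """True if address is a whole multiple of size."""
    return address % size == 0


if __name__ == "__main__":
    print(f"load address: {LOAD_ADDRESS // MIB} MiB")
    for name, size in PAGE_SIZES.items():
        print(f"aligned to {name}: {is_aligned(LOAD_ADDRESS, size)}")
```

The address is a multiple of every size up to 4MiB. It is not a multiple of 1GiB, so the guess only holds for page sizes up to 4MiB.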


[1] Note that actual 32-bit x86 Linux systems have been deviating from this layout more and more over time. But we're talking about the ABI spec here, since the amd64 one is formally based on it rather than on any derived layout (see the amd64 paragraph above for the citation).
