How is stack memory allocated when using 'push' or 'sub' x86 instructions?

Question

I have been browsing for a while, and I am trying to understand how memory is allocated to the stack when doing, for example:

push rax

Or moving the stack pointer to allocate space for local variables of a subroutine:

sub rsp, X    ;Move stack pointer down by X bytes 

What I understand is that the stack segment is anonymous in the virtual memory space, i.e., not file-backed.

What I also understand is that the kernel will not actually map an anonymous virtual memory segment to physical memory until the program actually does something with that memory segment, i.e., writes data. So, trying to read that segment before writing to it may cause an error.

In the first example, the kernel will assign a page frame in physical memory if needed. In the second example, I assume that the kernel will not assign any physical memory to the stack segment until the program actually writes data to an address in the stack segment.

Am I on the right track?

Answer

Yes, you're on the right track here, pretty much. sub rsp, X is kind of like "lazy" allocation: the kernel only does anything after a #PF page-fault exception caused by touching memory above the new RSP, not when the register is merely modified. But you can still consider the memory "allocated", i.e. safe to use.

"So, trying to read that segment before writing to it may cause an error."

No, read won't cause an error. Anonymous pages that have never been written are copy-on-write mapped to a/the physical zero page, whether they're in the BSS, stack, or mmap(MAP_ANONYMOUS).
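A minimal C sketch of this behaviour on Linux, assuming a private anonymous mapping. The helper name sum_untouched_anonymous is hypothetical and only serves as an illustration. It reads every byte of a fresh mapping without writing first, and the reads are expected to return zeros rather than fault.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous memory and sum them
 * without ever writing. Returns the sum (expected 0), or -1 if mmap fails. */
long sum_untouched_anonymous(size_t len)
{
    assert(len > 0);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];           /* read-only access: no SIGSEGV, reads zeros */

    munmap(p, len);
    return sum;
}
```

Calling sum_untouched_anonymous(1 << 20) should return 0 without any signal being raised.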

Fun fact: in micro-benchmarks, make sure you write each page of memory for input arrays, otherwise you're actually looping over the same physical 4k or 2M page of zeros repeatedly and will get L1D cache hits even though you still get TLB misses (and soft page faults)! gcc will optimize malloc+memset(0) to calloc, but std::vector will actually write all the memory whether you want it to or not. memset on global arrays is not optimized out, so that works. (Or non-zero initialized arrays will be file-backed in the data segment.)
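One way to follow that advice is to write a byte into every page of the buffer before timing anything. The sketch below is only an illustration: prefault_pages is a hypothetical helper, and it assumes the buffer's contents may be overwritten with zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write one byte in every page of `buf` so that each
 * page gets its own physical frame instead of sharing the zero page.
 * The pointer is volatile so the compiler cannot drop the stores.
 * Returns the number of pages written. */
size_t prefault_pages(volatile unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fall back to 4 KiB */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        buf[off] = 0;          /* a write, even of zero, breaks copy-on-write */
        touched++;
    }
    return touched;
}
```

Run it once over the input array before the timed region, so the page faults are not counted in the measurement.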

Note, I'm leaving out the difference between mapped vs. wired. i.e. whether an access will trigger a soft/minor page fault to update the page tables, or whether it's just a TLB miss and the hardware page-table walk will find a mapping (to the zero page).

But stack memory below RSP may not be mapped at all, so touching it without moving RSP first can be an invalid page fault instead of a "minor" page fault to sort out copy-on-write.

Stack memory has an interesting twist: The stack size limit is something like 8MB (ulimit -s), but in Linux the initial stack for the first thread of a process is special. For example, I set a breakpoint in _start in a hello-world (dynamically linked) executable, and looked at /proc/<PID>/smaps for it:

7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
Rss:                   8 kB
Pss:                   8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
...

Only 8kiB of stack has been referenced and is backed by physical pages. That's expected, since the dynamic linker doesn't use a lot of stack.

Only 132kiB of stack is even mapped into the process's virtual address space. But special magic stops mmap(NULL, ...) from randomly choosing pages within the 8MiB of virtual address space that the stack could grow into.

Touching memory below the current stack mapping but within the stack limit causes the kernel to grow the stack mapping (in the page-fault handler).

(But only if rsp is adjusted first; the red-zone is only 128 bytes below rsp, so ulimit -s unlimited doesn't make touching memory 1GB below rsp grow the stack to there, but it will if you decrement rsp to there and then touch memory.)
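The growth of the main thread's stack mapping can be observed from inside a process. The sketch below assumes Linux with /proc mounted, the default 8MiB stack limit, and that it runs on the main thread. Both function names are hypothetical. It reads the size of the [stack] mapping from /proc/self/maps after a function with a 1MiB local array has touched the lowest address of that array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: size in bytes of the [stack] mapping listed in
 * /proc/self/maps, or 0 if the file or the entry cannot be found. */
unsigned long stack_mapping_size(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return 0;

    char line[512];
    unsigned long size = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        if (strstr(line, "[stack]") &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            size = hi - lo;
            break;
        }
    }
    fclose(f);
    return size;
}

/* The prologue moves RSP down by about 1 MiB for `buf`; the store to the
 * lowest address then faults, and the kernel extends the [stack] mapping. */
__attribute__((noinline)) unsigned long grow_stack_by_1mib(void)
{
    volatile unsigned char buf[1 << 20];
    buf[0] = 1;
    assert(buf[0] == 1);
    return stack_mapping_size();
}
```

Comparing stack_mapping_size() before the call with the value returned by grow_stack_by_1mib() should show the mapping going from something like the 132 kB above to more than 1MiB.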

This only applies to the initial/main thread's stack. pthreads just uses mmap(MAP_ANONYMOUS|MAP_STACK) to map an 8MiB chunk that can't grow. (MAP_STACK is currently a no-op.) So thread stacks can't grow after allocation (except manually with MAP_FIXED if there's space below them), and aren't affected by ulimit -s unlimited.

This magic preventing other things from choosing addresses in the stack-growth region doesn't exist for mmap(MAP_GROWSDOWN), so do not use it to allocate new thread stacks. (Otherwise you could end up with something using up the virtual address space below the new stack, leaving it unable to grow). Just allocate the full 8MiB. See also Where are the stacks for the other threads located in a process virtual address space?.
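A sketch of allocating the full stack up front, in the way described above for thread stacks. The helper name alloc_thread_stack is hypothetical, and the 8MiB size used in the usage note is just the figure from the text, not something the kernel requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS / MAP_STACK */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_STACK
#define MAP_STACK 0            /* assumed fallback where the flag is missing */
#endif

/* Hypothetical helper: reserve a fixed-size, non-growing stack region.
 * Physical pages are still assigned lazily, on first touch.
 * Returns the lowest address of the region, or NULL on failure. */
void *alloc_thread_stack(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A region obtained with alloc_thread_stack(8u << 20) could then be handed to pthread_attr_setstack(), which takes the lowest address and the size, and released with munmap() once the thread has exited.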

MAP_GROWSDOWN does have a grow-on-demand feature, described in the mmap(2) man page, but there's no growth limit (other than coming close to an existing mapping), so (according to the man page) it's based on a guard-page like Windows uses, not like the primary thread's stack.

Touching memory multiple pages below the bottom of a MAP_GROWSDOWN region might segfault (unlike with Linux's primary-thread stack). Compilers targeting Linux don't generate stack "probes" to make sure each 4k page is touched in order after a big allocation (e.g. local array or alloca), so that's another reason MAP_GROWSDOWN isn't safe for stacks.
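What a stack probe does can be sketched in C, although real probes are emitted by the compiler as part of the function prologue. The function name probe_downward and the fixed 4k page size are assumptions made for illustration. It touches one byte per page, starting at the high end of a freshly allocated region and moving down, so that a guard page is always hit one page at a time.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_PAGE 4096u       /* assumed page size */

/* Hypothetical helper: touch one byte per page of [base, base + len),
 * from the highest page to the lowest, as a stack probe would after a
 * large allocation. Overwrites the touched bytes with zero, so it is
 * meant for freshly allocated space. Returns the number of probes. */
size_t probe_downward(volatile unsigned char *base, size_t len)
{
    assert(base != NULL || len == 0);
    size_t probes = 0;
    size_t off = len;

    while (off > 0) {
        off = off > PROBE_PAGE ? off - PROBE_PAGE : 0;
        base[off] = 0;         /* one store per page, high to low */
        probes++;
    }
    return probes;
}
```

For a buffer p of n bytes obtained from alloca, calling probe_downward(p, n) before using the buffer would touch the pages in the order a guard-page design needs.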

Compilers do emit stack probes on Windows.

(MAP_GROWSDOWN might not even work at all, see @BeeOnRope's comment. It was never very safe to use for anything, because stack clash security vulnerabilities were possible if the mapping grows close to something else. So just don't use MAP_GROWSDOWN for anything ever. I'm leaving in the mention to describe the guard-page mechanism Windows uses, because it's interesting to know that Linux's primary-thread stack design isn't the only one possible.)
