Which segments are affected by a copy-on-write?


Problem description


My understanding of copy-on-write is that "Everyone has a single, shared copy of the same data until it's written, and then a copy is made".

  1. Is a shared copy of the same data comprised of a heap and bss segment or only heap?
  2. Which memory segments will be shared, and is this dependent on the OS?

Solution

The OS can set whatever "copy on write" policy it wishes, but generally, they all do the same thing (i.e. what makes the most sense).

Loosely, for a POSIX-like system (Linux, BSD, OSX), there are four areas (what you were calling segments) of interest: data (where int x = 1; goes), bss (where an uninitialized int y; goes), sbrk (this is heap/malloc), and stack.
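
As a rough illustration (a minimal sketch; the exact placement of each object is up to the toolchain), here is where typical C objects land:

    #include <stdlib.h>

    int x = 1;               /* data: initialized global */
    int y;                   /* bss: uninitialized global, starts out zero */

    int main(void)
    {
        int local = 2;                   /* stack: automatic variable */
        int *p = malloc(sizeof(*p));     /* heap: via malloc (sbrk or mmap underneath) */

        *p = x + y + local;
        free(p);
        return 0;
    }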

When a fork is done, the OS sets up a new page map for the child that shares all the pages of the parent. Then, in the page maps of the parent and the child, all the pages are marked readonly.

Each page map also has a reference count that indicates how many processes are sharing the page. Before the fork, the refcount will be 1 and, after, it will be 2.

Now, when either process tries to write to a R/O page, it will get a page fault. The OS will see that this is for "copy on write", will create a private page for the process, copy in the data from the shared, mark the page as writable for that process and resume it.

It will also bump down the refcount. If the refcount is now [again] 1, the OS will mark the page in the other process as writable and non-shared [this eliminates a second page fault in the other process--a speedup only because at this point the OS knows that the other process should be free to write unmolested again]. This speedup could be OS dependent.
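
The copying itself is invisible to the program; what you can observe (a minimal sketch) is that after fork, parent and child start with identical values, but a write in one is never seen by the other:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int counter = 100;                  /* shared copy-on-write page after fork */

    int main(void)
    {
        pid_t pid = fork();

        if (pid == 0) {                 /* child */
            counter = 200;              /* write fault: child gets its own private page */
            printf("child:  counter = %d\n", counter);
            return 0;
        }

        wait(NULL);                     /* let the child run first */
        printf("parent: counter = %d\n", counter);   /* still 100 */
        return 0;
    }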

Actually, the bss section gets even more special treatment. In the initial page mapping for it, all pages are mapped to a single page that contains all zeroes (aka the "zero page"). The mapping is marked R/O. So, the bss area could be gigabytes in size and it will only occupy a single physical page. This single, special, zero page is shared amongst all bss sections of all processes, regardless of whether they have any relationship to one another at all.

Thus, a process can read from any page in the area and gets what it expects: zero. It's only when the process tries to write to such a page that the same copy-on-write mechanism kicks in: the process gets a private page, the mapping is adjusted, and the process is resumed. It is now free to write to the page as it sees fit.
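
A hedged way to watch this happen (Linux reports ru_maxrss in kilobytes; other systems use different units): a large bss array costs essentially nothing until its pages are actually written.

    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>

    static char big[256 * 1024 * 1024];     /* 256 MiB of bss, all mapped to the zero page */

    static long rss_kb(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_maxrss;                /* peak resident set size */
    }

    int main(void)
    {
        printf("before: %ld kB, big[0] = %d\n", rss_kb(), big[0]);   /* reads are free: zero */
        memset(big, 1, sizeof(big));        /* every write faults in a private page */
        printf("after:  %ld kB\n", rss_kb());
        return 0;
    }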

Once again, an OS can choose its policy. For example, after the fork, it might be more efficient to share most of the stack pages, but start off with private copies of the "current" page, as determined by the value of the stack pointer register.

When an exec syscall is done [on the child], the kernel has to undo much of the mapping done during the fork [bumping down refcounts], releasing the child's mappings, etc., and restoring the parent's original page protections (i.e. it will no longer be sharing its pages unless it does another fork).
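
For reference, the classic fork/exec pattern that drives all of the above (a sketch with minimal error handling):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();             /* child shares the parent's pages, copy-on-write */

        if (pid == 0) {
            /* exec discards the child's COW mappings and maps in the new image */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");           /* only reached if the exec failed */
            return 1;
        }

        waitpid(pid, NULL, 0);          /* the parent keeps its original pages */
        return 0;
    }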


Although not part of your original question, there are related activities that may be of interest, such as on demand loading [of pages] and on demand linking [of symbols] after an exec syscall.

When a process does an exec, the kernel does the cleanup above, and reads a small portion of the executable file to determine its object format. The dominant format is ELF, but any format that a kernel understands can be used (e.g. OSX can use ELF [IIRC], but it also has others).

For ELF, the executable has a special section that gives a full FS path to what's known as the "ELF interpreter", which is a shared library, and is usually something like /lib64/ld-linux-x86-64.so.2.

The kernel, using an internal form of mmap, will map this into the application space, and set up a mapping for the executable file itself. Most things are marked as R/O pages and "not present".

Before we go further, we need to talk about the "backing store" for a page, that is, where a page comes from if a page fault occurs and we need to load it from disk. For heap/malloc, this is generally the swap disk [aka paging disk].

Under Linux, it's generally the partition of type "linux swap" that was added when the system was installed. When a dirty page has to be flushed to disk to free up some physical memory, it gets written there. Note that the page sharing algorithm in the first section still applies.

Anyway, when an executable is first mapped into memory, its backing store is the executable file in the filesystem.

So, the kernel sets the app's program counter to point to the starting location of the ELF interpreter, and transfers control to it.

The ELF interpreter goes about its business. Every time it tries to execute a portion of itself [a "code" page] that is mapped but not loaded, a page fault occurs, the kernel loads that page from the backing store (e.g. the ELF interpreter's file), and changes the mapping to R/O but present.

This occurs for the ELF interpreter, shared libraries, and the executable itself.

The ELF interpreter will now use mmap to map libc into the app space [again, subject to the demand loading]. If the ELF interpreter has to modify a code page to relocate a symbol [or tries to write to any page that has the file as its backing store, like a data page], a protection fault occurs, the kernel changes the backing store for the page from the on-disk file to a page on the swap disk, adjusts the protections, and resumes the app.

The kernel must also handle the case where the ELF interpreter (e.g.) is trying to write to [say] a data page that has never yet been loaded (i.e. it has to load it first and then change the backing store to the swap disk).
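
You can poke at the same mechanics from user space with mmap. A sketch (the path /etc/hostname is just an assumed readable, non-empty file): a MAP_PRIVATE file mapping is demand-loaded from the file on the first read, and the first write gives the process a private, swap-backed copy of the touched page.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/hostname", O_RDONLY);   /* assumed: a readable, non-empty file */
        struct stat st;

        if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
            return 1;

        /* Private file mapping: pages start out "not present", backed by the file. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        printf("first byte: %c\n", p[0]);   /* read fault: page demand-loaded from the file */
        p[0] = '#';                         /* write fault: private copy, now swap-backed */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }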

The ELF interpreter then uses portions of libc to help it complete initial linking activities. It relocates the minimum necessary to allow it to do its job.

However, the ELF interpreter does not relocate anywhere near all the symbols for most other shared libraries. It will look through the executable and, again using mmap, create a mapping for the shared libraries the executable needs (i.e. what you see when you do ldd executable).

These mappings to shared libraries and the executable can be thought of as "segments".
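
On glibc you can ask the ELF interpreter to enumerate these mappings (a sketch using dl_iterate_phdr; the object with an empty name is the main executable):

    #define _GNU_SOURCE
    #include <link.h>
    #include <stdio.h>

    static int show(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size; (void)data;
        printf("%14p  %s\n", (void *)info->dlpi_addr,
               info->dlpi_name[0] ? info->dlpi_name : "(main executable)");
        return 0;                       /* keep iterating */
    }

    int main(void)
    {
        dl_iterate_phdr(show, NULL);    /* one line per mapped ELF object */
        return 0;
    }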

Each shared library has a symbol jump table whose entries point back to the interpreter. But, the ELF interpreter makes minimal changes.

[Note: this is a loose explanation] Only when the application tries to call a given function's jump entry [this is that GOT et al. stuff you may have seen] does a relocation occur. The jump entry transfers control to the interpreter, which locates the real address of the symbol, adjusts the GOT so that it now points directly to the final address for the symbol, and redoes the call, which will now call the real function. On a subsequent call to the same given function, it goes direct.

This is called "on demand linking".
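
A loose sketch of what the application sees (build with something like cc demo.c -lm; note that many modern toolchains default to eager binding via -z now, and LD_BIND_NOW=1 forces everything to be resolved at startup):

    #include <math.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        (void)argv;
        double x = argc;                /* not a compile-time constant, so the calls stay */

        /* First call: goes through the PLT stub, which jumps into the ELF
           interpreter, resolves "sqrt" in libm, and patches the GOT entry. */
        printf("%f\n", sqrt(x));

        /* Second call: the GOT already holds sqrt's real address, so the
           PLT stub jumps straight there, with no interpreter involved. */
        printf("%f\n", sqrt(x + 1.0));
        return 0;
    }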

A by-product of all this mmap activity is that the classical sbrk syscall is of little to no use. It would soon collide with one of the shared library memory mappings.

So, modern libc doesn't use it. When malloc needs more memory from the OS, it requests more memory from an anonymous mmap and keeps track of which allocations belong to which mmap mapping. (i.e. if enough memory got freed to comprise an entire mapping, free could do an munmap).
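
A minimal sketch of what that looks like underneath (roughly what malloc does for large allocations; this is not the actual libc implementation):

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;           /* 1 MiB */

        /* Anonymous private mapping: zero-filled, swap-backed, no file behind it. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        ((char *)p)[0] = 42;            /* the first write faults in a real page */
        munmap(p, len);                 /* what free() can do once a whole mapping empties out */
        return 0;
    }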

So, to sum up, we have "copy on write", "on demand loading", and "on demand linking" all going on at the same time. It seems complex, but it makes fork and exec go quickly and smoothly. This adds some complexity, but the extra overhead is incurred only when needed ("on demand").

Thus, instead of a large lurch/delay when a program launches, the overhead activity gets spread out over the lifetime of the program, as needed.

