Executable Object Files and Virtual Memory

Problem Description

I'm a beginner in Linux and virtual memory, and I'm still struggling to understand the relationship between virtual memory and executable object files.

Let's say we have an executable object file, a.out, stored on a hard disk, and let's say that a.out originally has a .data section containing a global variable with the value 2018. When the loader runs, it allocates a contiguous chunk of virtual pages, marks them as invalid (i.e., not cached), and points their page table entries to the appropriate locations in a.out. The loader never actually copies any data from disk into memory. The data is paged in automatically and on demand by the virtual memory system the first time each page is referenced.
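The "map, don't copy" behaviour described above can be observed from user space with `mmap`. The following is a minimal Python sketch, not how a real loader is written: it assumes a hypothetical two-page stand-in file whose second page starts with a 4-byte little-endian integer playing the role of the initialized global in `.data`. Setting up the mapping reads nothing; the page is brought in when the mapping is first touched.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: page 0 stands in for headers/.text, and the first
# 4 bytes of page 1 stand in for the initialized global in .data.
VAR_OFFSET = mmap.PAGESIZE


def write_fake_executable(path: str, value: int) -> None:
    """Create a two-page file with `value` stored at VAR_OFFSET."""
    with open(path, "wb") as f:
        f.write(b"\0" * VAR_OFFSET)
        f.write(struct.pack("<i", value))
        f.write(b"\0" * (mmap.PAGESIZE - 4))


def read_global_via_mapping(path: str) -> int:
    """Map the file without copying it, then read the 'global' through the mapping."""
    with open(path, "rb") as f:
        # On Linux, mmap() only records the mapping; the kernel brings the
        # page in (from the page cache or the disk) on the first access below.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (value,) = struct.unpack_from("<i", m, VAR_OFFSET)
            return value


def demo(value: int = 2018) -> int:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.out")
        write_fake_executable(path, value)
        return read_global_via_mapping(path)


if __name__ == "__main__":
    print(demo())  # prints 2018
```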

My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean that the .data section now holds 2019 for that variable. So are we changing the executable object file, which is not supposed to be changed? If we were, wouldn't we get a different value each time we finish and run the program again?

Solution

In general (not specifically for Linux)...

When an executable file is started, the OS (kernel) creates a virtual address space and an (initially empty) process, and examines the executable file's header. The header describes "sections" (e.g. .text, .rodata, .data, .bss, etc.), where each section has different attributes: whether the contents of the section should be put in the virtual address space at all (they shouldn't be for, e.g., a symbol table or anything else that isn't used at run time), whether the contents are part of the file (they aren't for, e.g., .bss), and whether the area should be executable, read-only or read/write.
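As a rough illustration of how those per-section attributes decide what ends up in the address space, here is a small Python sketch. The type and flag values are the real ELF ones (`SHT_NOBITS`, `SHF_WRITE`, `SHF_ALLOC`, `SHF_EXECINSTR`), but the `Section` and `Placement` classes and `plan_sections()` are hypothetical. It is also a simplification: on Linux the ELF loader actually maps the segments described by the program headers (`PT_LOAD`), each of which groups sections with compatible attributes.

```python
from dataclasses import dataclass

# Values from the ELF specification (<elf.h>).
SHT_PROGBITS = 1
SHT_SYMTAB = 2
SHT_NOBITS = 8          # occupies memory but has no bytes in the file (e.g. .bss)
SHF_WRITE = 0x1
SHF_ALLOC = 0x2         # section occupies memory at run time
SHF_EXECINSTR = 0x4


@dataclass(frozen=True)
class Section:          # hypothetical, simplified section header
    name: str
    sh_type: int
    sh_flags: int


@dataclass(frozen=True)
class Placement:        # hypothetical result: how the area is set up
    name: str
    file_backed: bool   # False: zero-filled memory, nothing read from the file
    perms: str          # "r-x", "r--", "rw-", ...


def plan_sections(sections):
    """Return a Placement for every section that belongs in the address space."""
    plan = []
    for s in sections:
        if not s.sh_flags & SHF_ALLOC:
            continue  # e.g. .symtab: stays in the file, never mapped
        perms = "r"
        perms += "w" if s.sh_flags & SHF_WRITE else "-"
        perms += "x" if s.sh_flags & SHF_EXECINSTR else "-"
        plan.append(Placement(s.name, s.sh_type != SHT_NOBITS, perms))
    return plan


if __name__ == "__main__":
    for p in plan_sections([
        Section(".text", SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR),
        Section(".rodata", SHT_PROGBITS, SHF_ALLOC),
        Section(".data", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
        Section(".bss", SHT_NOBITS, SHF_ALLOC | SHF_WRITE),
        Section(".symtab", SHT_SYMTAB, 0),
    ]):
        print(p)
```

In this model .data is file-backed and writable, which is exactly why its pages need the copy-on-write treatment the rest of the answer describes.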

Typically, the (used parts of the) executable file are cached by the virtual file system (VFS), and pieces of the file that are already in the VFS cache can be mapped as "copy on write" into the new process's virtual address space. Pieces of the file that aren't in the VFS cache yet can be mapped as "need fetching" into the new process's virtual address space.

Then the process is started (given CPU time).

If the process reads data from a page that hasn't been loaded yet, the OS (kernel) pauses the process, fetches the page from the file on disk into the VFS cache, and also maps the page as "copy on write" into the process. It then allows the process to continue (the process retries the read from the page that wasn't loaded, which works now that the page is loaded).

If the process writes to a page that is still "copy on write", the OS (kernel) pauses the process, allocates a new page, copies the original page's data into it, and replaces the original page with the process's own copy. It then allows the process to continue (the process retries the write, which works now that the process has its own copy).

If the process writes data to a page that hasn't been loaded yet, the OS (kernel) combines both of the previous steps: it fetches the original page from disk into the VFS cache, creates a copy, and maps the process's copy into the process's virtual address space.
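The three fault cases above can be summarized in a toy model. This Python sketch is a simulation, not kernel code, and every name in it (`Disk`, `Kernel`, `Process`, the `"cow"`/`"private"` states) is hypothetical: a missing page table entry means "need fetching", a `"cow"` entry points at the shared VFS-cache buffer, and the first write replaces it with a private copy. Writing to a page that was never loaded simply runs the read-fault path and then the write-fault path.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """The executable file on disk: page number -> bytes."""
    pages: dict
    reads: int = 0

    def read(self, n: int) -> bytes:
        self.reads += 1
        return self.pages[n]


@dataclass
class Kernel:
    disk: Disk
    # VFS cache: page number -> buffer shared copy-on-write by all processes.
    vfs_cache: dict = field(default_factory=dict)

    def fetch(self, n: int) -> bytearray:
        """Bring a file page into the VFS cache (reading the disk only if needed)."""
        if n not in self.vfs_cache:
            self.vfs_cache[n] = bytearray(self.disk.read(n))
        return self.vfs_cache[n]


@dataclass
class Process:
    kernel: Kernel
    # page number -> ("cow", shared cache buffer) or ("private", own copy).
    # A missing entry means the page still "needs fetching".
    page_table: dict = field(default_factory=dict)

    def _read_fault(self, n: int) -> None:
        self.page_table[n] = ("cow", self.kernel.fetch(n))

    def _write_fault(self, n: int) -> None:
        _, shared = self.page_table[n]
        self.page_table[n] = ("private", bytearray(shared))

    def read(self, n: int, off: int) -> int:
        if n not in self.page_table:
            self._read_fault(n)          # pause, fetch into the cache, map CoW
        return self.page_table[n][1][off]    # retry the read

    def write(self, n: int, off: int, value: int) -> None:
        if n not in self.page_table:
            self._read_fault(n)          # not loaded yet: fetch first...
        if self.page_table[n][0] == "cow":
            self._write_fault(n)         # ...then give the process its own copy
        self.page_table[n][1][off] = value   # retry the write


def demo():
    disk = Disk({0: b"\x07\x00"})        # one page of the "executable"; byte 0 is 7
    kernel = Kernel(disk)
    p1, p2 = Process(kernel), Process(kernel)
    p1.write(0, 0, 9)                    # unloaded page: fetch, then copy on write
    return p1.read(0, 0), p2.read(0, 0), disk.reads, disk.pages[0]
```

Here `demo()` returns `(9, 7, 1, b"\x07\x00")`: the writer sees its own value, the second process still sees the file's value through the shared cache page, the disk was read only once, and the "file" is untouched.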

If the OS starts to run out of free RAM, then:

  • pages of file data that are in the VFS cache but aren't shared as "copy on write" with any process can be freed in the VFS without doing anything else. Next time the file is used those pages will be fetched from the file on disk into the VFS cache.

  • pages of file data that are in the VFS cache and are also shared as "copy on write" with one or more processes can be freed from the VFS cache, with the corresponding pages in any/all of those processes marked as "not fetched yet". The next time the file is used (including when a process accesses a "not fetched yet" page), those pages will be fetched from the file on disk into the VFS cache and then mapped as "copy on write" into the process(es).

  • pages of data that have been modified (either because they were originally "copy on write" but got copied, or because they weren't part of the executable file at all - e.g. .bss section, the executable's heap space, etc) can be saved to swap space and then freed. When the process accesses the page/s again they will be fetched from swap space.
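The three reclaim rules above can be written as a small decision function. This is a sketch of the policy as this answer describes it, not Linux's actual reclaim code; `Action` and `reclaim_action()` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    DROP = "free it; refetch from the file on disk when next needed"
    DROP_AND_UNMAP = "free it and mark sharing processes' pages 'not fetched yet'"
    SWAP_OUT = "save it to swap space, then free it"


def reclaim_action(is_unmodified_file_data: bool, cow_sharers: int) -> Action:
    """Choose how to free one page under memory pressure.

    is_unmodified_file_data: True for a page of the executable sitting in the
        VFS cache; False for modified data (a copied .data page, .bss, heap).
    cow_sharers: number of processes mapping the cached page copy-on-write.
    """
    if not is_unmodified_file_data:
        # The file can't recreate this content, so it has to go to swap.
        return Action.SWAP_OUT
    if cow_sharers == 0:
        return Action.DROP
    return Action.DROP_AND_UNMAP
```

For the question's example, the page holding 2018 falls under the first two rules until the process writes to it; after the write, the process's copy falls under the third.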

Note: If the executable file is stored on unreliable media (e.g. a potentially scratched CD), a "smarter than average" OS may load the entire executable file into the VFS cache and/or swap space up front. There's no sane way to handle a "read error from memory-mapped file" while the process is using the file, other than crashing the process (e.g. with SIGSEGV) and making it look as if the executable was buggy when it wasn't; loading up front also improves reliability, because you depend on more reliable swap space rather than on a less reliable scratched CD. Also, if the OS guards against file corruption or malware (e.g. has a CRC or digital signature built into executable files), then the OS may (and should) load everything into memory (the VFS cache) to check the CRC or digital signature before allowing the executable to run. On secure systems, in case the file on disk is modified while the executable is running, the OS may also store unmodified pages in "more trusted" swap space when freeing RAM (the same as it would if the pages had been modified), to avoid fetching the data again from the original "less trusted" file (partly because you don't want to redo the whole digital signature check every time a page is loaded from the file).
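The "verify first, then never go back to the file" idea from the note can be sketched in Python. This illustrates the idea under stated assumptions and is not any particular OS's mechanism: `expected_sha256` stands in for a hypothetical digest obtained from a trusted signature or manifest, `PAGE_SIZE` is assumed to be 4096, and keeping the verified bytes in RAM stands in for the VFS cache or trusted swap.

```python
import hashlib
import os
import tempfile

PAGE_SIZE = 4096  # assumption for this sketch


class VerifiedImage:
    """Read the whole executable up front, check it, then serve pages from memory."""

    def __init__(self, path: str, expected_sha256: str):
        with open(path, "rb") as f:
            data = f.read()  # load everything now instead of on demand
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise PermissionError(f"{path}: digest mismatch, refusing to execute")
        self._data = data    # later "page faults" never go back to the file

    def page(self, n: int) -> bytes:
        return self._data[n * PAGE_SIZE:(n + 1) * PAGE_SIZE]


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.out")
        original = bytes(range(256)) * 32        # two pages of known content
        digest = hashlib.sha256(original).hexdigest()
        with open(path, "wb") as f:
            f.write(original)
        image = VerifiedImage(path, digest)
        with open(path, "wb") as f:              # modify the file after the check
            f.write(b"\xff" * (2 * PAGE_SIZE))
        unaffected = image.page(1) == original[PAGE_SIZE:]
        try:
            VerifiedImage(path, digest)          # a fresh start re-verifies
            rejected = False
        except PermissionError:
            rejected = True
        return unaffected, rejected
```

`demo()` returns `(True, True)`: the already-verified image keeps serving the original bytes after the file changes, and a fresh attempt to start the modified file is refused.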

My question is: suppose the program changes the value of the global variable from 2018 to 2019 at run time. It seems that the virtual page containing the global variable will eventually be paged out to disk, which would mean that the .data section now holds 2019 for that variable. So are we changing the executable object file, which is not supposed to be changed?

The page containing 2018 will begin as "not fetched", then (when it's accessed) be loaded into the VFS cache and mapped into the process as "copy on write". At either of these points the OS may free the memory and, if the data is needed again, fetch it (unchanged) from the executable file on disk.

When the process modifies the global variable (changes it to contain 2019), the OS creates a private copy of the page for the process. From then on, if the OS wants to free that memory, it needs to save the page's data in swap space and load it back from swap space if the page is accessed again. The executable file is not modified, and (for that page, for that process) the executable file isn't used again.
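This conclusion can be checked from user space. In this Python sketch, `mmap.ACCESS_COPY` (a private copy-on-write mapping, `MAP_PRIVATE` on Unix) stands in for how the `.data` page is mapped, and the file layout (the "global" stored as a 4-byte little-endian int at offset 0) is a hypothetical stand-in for a.out. Each "run" changes 2018 to 2019 in memory, yet the file, and therefore the next run, still sees 2018.

```python
import mmap
import os
import struct
import tempfile


def run_program(path: str) -> tuple:
    """One 'run': change the global through a private mapping.

    Returns (value seen in memory after the write, value stored in the file).
    """
    with open(path, "rb") as f:  # the file is opened read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            (initial,) = struct.unpack_from("<i", m, 0)
            # First write to the page: the kernel gives this process its own copy.
            struct.pack_into("<i", m, 0, initial + 1)
            (in_memory,) = struct.unpack_from("<i", m, 0)
    with open(path, "rb") as f:
        (on_disk,) = struct.unpack("<i", f.read(4))
    return in_memory, on_disk


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.out")
        with open(path, "wb") as f:
            f.write(struct.pack("<i", 2018).ljust(mmap.PAGESIZE, b"\0"))
        return [run_program(path) for _ in range(2)]


if __name__ == "__main__":
    print(demo())  # [(2019, 2018), (2019, 2018)]
```

Opening the file read-only is deliberate: a private mapping needs no write access to the file, which makes it plain that the modification can only live in the process's own copy of the page.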
