Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code?


Problem Description


So I am referencing this paper:

Binary Stirring: Self-randomizing Instruction Addresses of Legacy x86 Binary Code

https://www.utdallas.edu/~hamlen/wartell12ccs.pdf

Code interleaved with data: Modern compilers aggressively interleave static data within code sections in both PE and ELF binaries for performance reasons. In the compiled binaries there is generally no means of distinguishing the data bytes from the code. Inadvertently randomizing the data along with the code breaks the binary, introducing difficulties for instruction-level randomizers. Viable solutions must somehow preserve the data whilst randomizing all the reachable code.

But I have some questions:

  1. How does this speed up the program?! I can only imagine this making the CPU's execution more complex.

  2. And how can the CPU distinguish between code and data? As far as I remember, the CPU executes each instruction one after the other in a linear way unless there is a jump-type instruction, so how can the CPU know which bytes inside the code section are code and which ones are data?

  3. Isn't this VERY bad for security, considering that the code section is executable and the CPU might mistakenly execute malicious data as code? (Maybe an attacker could redirect the program into that data?)

Solution

Yes, their proposed binary randomizer needs to handle this case, because obfuscated binaries can exist, and hand-written code might do arbitrary things because the author didn't know better or had some weird reason.

But no, normal compilers don't do this for x86. This answer addresses the SO question as written, not the paper containing those claims:

Modern compilers aggressively interleave static data within code sections in both PE and ELF binaries for performance reasons

Citation needed! This is just plain false for x86 in my experience with compilers like GCC and clang, and some experience looking at asm output from MSVC and ICC.

Normal compilers put static read-only data into section .rodata (ELF platforms), or section .rdata (Windows). The .rodata section (and the .text section) are linked as part of the text segment, but all the read-only data for the whole executable or library is grouped together, and all the code is separately grouped together. (See: What's the difference of section and segment in ELF file format. Or, more recently, .rodata may even go in a separate ELF segment so it can be mapped noexec.)
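As a concrete illustration, here is a minimal C sketch (the file and symbol names are made up; the placement noted in the comments is the typical behaviour of GCC/clang on Linux, with .rdata instead of .rodata on Windows toolchains):

    /* rodata_demo.c - where a typical GCC/clang build puts things.
     * After `gcc -c rodata_demo.c`, `objdump -t rodata_demo.o` will
     * normally list each symbol with the section it landed in. */

    static const int table[4] = {1, 2, 3, 4};  /* read-only data -> .rodata */
    static int counter = 42;                   /* writable data  -> .data   */

    int lookup(int i)                          /* code           -> .text   */
    {
        return table[i & 3] + counter;
    }

The constants end up grouped with all the other read-only data in the text segment, not interleaved between the functions in .text.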


Intel's optimization guide says not to mix code/data, especially read+write data:

Assembly/Compiler Coding Rule 50. (M impact, L generality) If (hopefully read-only) data must occur on the same page as code, avoid placing it immediately after an indirect jump. For example, follow an indirect jump with its mostly likely target, and place the data after an unconditional branch.

Assembly/Compiler Coding Rule 51. (H impact, L generality) Always put code and data on separate pages. Avoid self-modifying code wherever possible. If code is to be modified, try to do it all at once and make sure the code that performs the modifications and the code being modified are on separate 4-KByte pages or on separate aligned 1-KByte subpages.

(Fun fact: Skylake actually has cache-line granularity for self-modifying-code pipeline nukes; it's safe on that recent high-end uarch to put read/write data within 64 bytes of code.)


Mixing code and data in the same page has near-zero advantage on x86, and wastes data-TLB coverage on code bytes, and wastes instruction-TLB coverage on data bytes. And same within 64-byte cache lines for wasting space in L1i / L1d. The only advantage is code+data locality for unified caches (L2 and L3), but that's typically a minor benefit. (e.g. after code-fetch brings a line into L2, fetching data from the same line could hit in L2 vs. having to go to RAM for data from another cache line.)

But with split L1iTLB and L1dTLB, and the L2 TLB working as a unified victim cache (maybe, I think?), x86 CPUs are not optimized for this. An iTLB miss while fetching a "cold" function doesn't save you from a dTLB miss when you then read bytes from the same cache line as data, on modern Intel CPUs.

There is zero advantage for code-size on x86. x86-64's PC-relative addressing mode is [RIP + rel32], so it can address anything within +-2GiB of the current location. 32-bit x86 doesn't even have a PC-relative addressing mode.
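Floating-point constants are a common case where the compiler really does load from .rodata; here is a minimal sketch of that (the asm in the comment is illustrative of typical GCC/clang -O2 output for x86-64; exact labels and instruction selection vary):

    /* The constant below can't be an immediate operand of mulsd, so the
     * compiler keeps it in .rodata and references it RIP-relatively:
     *
     *   scale:
     *       mulsd   xmm0, QWORD PTR .LC0[rip]  ; .LC0 may be anywhere
     *       ret                                ; within +-2GiB of here
     */
    double scale(double x)
    {
        return x * 3.14159265358979;
    }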

Perhaps the author is thinking of ARM, where nearby static data allows PC-relative loads (with a small offset) to get 32-bit constants into registers? (This is called a "literal pool" on ARM, and you'll find them between functions.)
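For comparison, a hedged sketch of that ARM case (assuming a 32-bit ARM target without the movw/movt pair; the asm in the comment is illustrative):

    /* A 32-bit constant that can't be encoded as an immediate gets loaded
     * PC-relatively from a literal pool the compiler places right after
     * the function:
     *
     *   magic:
     *       ldr   r0, .Lpool     @ PC-relative load, small offset only
     *       bx    lr
     *   .Lpool:
     *       .word 0xDEADBEEF
     */
    unsigned int magic(void)
    {
        return 0xDEADBEEFu;
    }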

I assume they don't mean immediate data, like mov eax, 12345, where a 32-bit 12345 is part of the instruction encoding. That's not static data to be loaded with a load instruction; immediate data is a separate thing.

And obviously it's only for read-only data; writing near the instruction pointer will trigger a pipeline clear to handle the possibility of self-modifying code. And you generally want W^X (write or exec, not both) for your memory pages.

And how can the CPU distinguish between code and data?

Incrementally. The CPU fetches bytes at RIP, and decodes them as instructions. After starting at the program entry point, execution proceeds following taken branches, and falling through not-taken branches, etc.

Architecturally, it doesn't care about bytes other than the ones it's currently executing, or that are being loaded/stored as data by an instruction. Recently-executed bytes will stick around in the L1-I cache, in case they're needed again, and same for data in L1-D cache.

Having data instead of other code right after an unconditional branch or a ret is not important. Padding between functions can be anything. There might be rare corner cases where data could stall pre-decode or decode stages if it has a certain pattern (because modern CPUs fetch/decode in wide blocks of 16 or 32 bytes, for example), but any later stages of the CPU are only looking at actual decoded instructions from the correct path. (Or from mis-speculation of a branch...)

So if execution reaches a byte, that byte is (part of) an instruction. This is totally fine for the CPU, but unhelpful for a program that wants to look through an executable and classify each byte as either/or.
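A tiny example of why that classification problem is hard: the same bytes decode to different instructions depending on where decoding starts (hypothetical byte sequence):

    /* Five bytes of x86 machine code, viewed two ways: */
    const unsigned char blob[5] = {
        0xB8, 0x90, 0x90, 0x90, 0x90
        /* decoded from offset 0:  mov eax, 0x90909090   (one instruction)   */
        /* decoded from offset 1:  nop; nop; nop; nop    (four instructions) */
    };

Only by following execution from a known starting point can a tool tell which decoding is the real one, which is effectively what the CPU itself does.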

Code-fetch always checks permissions in the TLB, so it will fault if RIP points into a non-executable page. (NX bit in the page table entry).

But really as far as the CPU is concerned, there is no true distinction. x86 is a von Neumann architecture. An instruction can load its own code bytes if it wants.

e.g. movzx eax, byte ptr [rip - 1] sets EAX to 0x000000FF, loading the last byte of the rel32 = -1 = 0xffffffff displacement.
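Here is a minimal C sketch of the same idea, reading a function's own machine code as data (the function-pointer cast isn't strictly portable ISO C, but it is the usual behaviour on x86-64 Linux/Windows, where .text pages are mapped readable as well as executable):

    #include <stdio.h>

    int add(int a, int b) { return a + b; }

    int main(void)
    {
        /* Treat add()'s code bytes as plain data: on a von Neumann
         * machine with readable code pages this is just a normal load. */
        const unsigned char *p = (const unsigned char *)add;
        printf("first bytes of add(): %02x %02x %02x %02x\n",
               p[0], p[1], p[2], p[3]);
        return 0;
    }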


Isn't this VERY bad for security, considering that the code section is executable and the CPU might mistakenly execute malicious data as code? (Maybe an attacker could redirect the program into that data?)

Read-only data in executable pages can be used as a Spectre gadget, or a gadget for return-oriented-programming (ROP) attacks. But usually there's already enough such gadgets in real code that it's not a big deal, I think.

But yes, that's a minor objection to this which is actually valid, unlike your other points.

Recently (2019 or late 2018), GNU Binutils ld has started putting the .rodata section in a separate ELF segment from the .text section, so it can be mapped read-only without exec permission. This makes static read-only data non-executable on ISAs like x86-64, where exec permission is separate from read permission.

The more things you can make non-executable the better, and mixing code+constants would require them to be executable.
