About the memory layout of programs in Linux

Question

I have some questions about the memory layout of a program in Linux. I know from various sources (I'm reading "Programming from the Ground Up") that each section is loaded into its own region of memory. The text section loads first at virtual address 0x8048000, the data section is loaded immediately after that, next is the bss section, followed by the heap and the stack.

To experiment with the layout I wrote this program in assembly. First it prints the addresses of some labels and gets the system break. Then it enters an infinite loop that increments a pointer and tries to read the byte at that address; at some point a segmentation fault kills the program (I did this intentionally).

Here is the program:

.section .data

start_data:
str_mem_access:
.ascii "Accessing address: 0x%x\n\0"
str_data_start:
.ascii "Data section start at: 0x%x\n\0"
str_data_end:
.ascii "Data section ends at: 0x%x\n\0"
str_bss_start:
.ascii "bss section starts at: 0x%x\n\0"
str_bss_end:
.ascii "bss section ends at: 0x%x\n\0"
str_text_start:
.ascii "text section starts at: 0x%x\n\0"
str_text_end:
.ascii "text section ends at: 0x%x\n\0"
str_break:
.ascii "break at: 0x%x\n\0"
end_data:

.section .bss

start_bss:
.lcomm buffer, 500
.lcomm buffer2, 250
end_bss:

.section .text
start_text:

.globl _start
_start:

# print address of start_text label
pushl $start_text
pushl $str_text_start
call printf
addl $8, %esp
# print address of end_text label
pushl $end_text
pushl $str_text_end
call printf
addl $8, %esp
# print address of start_data label
pushl $start_data
pushl $str_data_start
call printf
addl $8, %esp
# print address of end_data label
pushl $end_data
pushl $str_data_end
call printf
addl $8, %esp
# print address of start_bss label
pushl $start_bss
pushl $str_bss_start
call printf
addl $8, %esp
# print address of end_bss label
pushl $end_bss
pushl $str_bss_end
call printf
addl $8, %esp
# get last usable virtual memory address
movl $45, %eax
movl $0, %ebx
int $0x80

incl %eax # system break address
# print system break
pushl %eax
pushl $str_break
call printf
addl $8, %esp

movl $start_text, %ebx

loop:
# print address
pushl %ebx
pushl $str_mem_access
call printf
addl $8, %esp

# access address
# segmentation fault here
movb (%ebx), %dl

incl %ebx

jmp loop

end_loop:
movl $1, %eax
movl $0, %ebx
int $0x80

end_text:

And these are the relevant parts of the output (this is 32-bit Debian):

text section starts at: 0x8048190
text section ends at: 0x804823b
Data section start at: 0x80492ec
Data section ends at: 0x80493c0
bss section starts at: 0x80493c0
bss section ends at: 0x80493c0
break at: 0x83b4001
Accessing address: 0x8048190
Accessing address: 0x8048191
Accessing address: 0x8048192
[...]
Accessing address: 0x8049fff
Accessing address: 0x804a000
Violación de segmento   <- "Segmentation fault" (Spanish locale)

My questions are:

1) Why is my program starting at address 0x8048190 instead of 0x8048000? With this I guess that the instruction at the "_start" label is not the first thing to load, so what's between the addresses 0x8048000 and 0x8048190?

2) Why is there a gap between the end of the text section and the start of the data section?

3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier, at 0x804a000?

Answer

I'm assuming you're building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don't need -m32 if you're actually using a 32-bit system, but most people that want to test this will have 64-bit systems.)

My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (Different kernel, or just ASLR, explains this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001)
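
To see how much the break moves between runs, here is a minimal sketch in Python. It is not part of the original program: it asks libc for the break of the Python interpreter itself through ctypes, and current_break is a name made up for this example. It assumes a Linux system whose libc exports sbrk.

```python
import ctypes


def current_break():
    """Return the current program break of this process as an integer.

    Assumes a libc that exports sbrk(); sbrk(0) only queries the break
    and does not move it.
    """
    libc = ctypes.CDLL(None)  # symbols already loaded into this process
    libc.sbrk.restype = ctypes.c_void_p
    libc.sbrk.argtypes = [ctypes.c_ssize_t]
    return libc.sbrk(0)


if __name__ == "__main__":
    print(f"break at: {current_break():#x}")
```

Running the script several times should print a different address each time when ASLR is enabled, which matches the variation described above.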

1) Why is my program starting at address 0x8048190 instead of 0x8048000?

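If you build a static binary, your _start will be much closer to 0x8048000: there are no dynamic-linking sections in front of .text, although the ELF header and program headers still occupy the first bytes of the mapping.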

readelf -a a.out shows that this address is simply the start of the .text section (0x8048190 in your build, 0x80481f0 in mine; see the section headers below). But .text isn't at the start of the text segment that gets mapped. Pages are 4096 bytes, and Linux requires file mappings to be aligned to 4096-byte boundaries of file offset, so with the file laid out this way execve cannot map _start to the start of a page. (The Off column is the section's offset within the file.)

Presumably the other sections in the text segment before the .text section are read-only data that's needed by the dynamic linker, so it makes sense to have it mapped into memory in the same page.

## part of readelf -a output
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            08048128 000128 000024 00   A  0   0  4
  [ 3] .gnu.hash         GNU_HASH        0804814c 00014c 000018 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048164 000164 000020 10   A  5   1  4
  [ 5] .dynstr           STRTAB          08048184 000184 00001c 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          080481a0 0001a0 000004 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         080481a4 0001a4 000020 00   A  5   1  4
  [ 8] .rel.plt          REL             080481c4 0001c4 000008 08  AI  4   9  4
  [ 9] .plt              PROGBITS        080481d0 0001d0 000020 04  AX  0   0 16
  [10] .text             PROGBITS        080481f0 0001f0 0000ad 00  AX  0   0  1         ########## The .text section
  [11] .eh_frame         PROGBITS        080482a0 0002a0 000000 00   A  0   0  4
  [12] .dynamic          DYNAMIC         08049f60 000f60 0000a0 08  WA  5   0  4
  [13] .got.plt          PROGBITS        0804a000 001000 000010 04  WA  0   0  4
  [14] .data             PROGBITS        0804a010 001010 0000d4 00  WA  0   0  1
  [15] .bss              NOBITS          0804a0e8 0010e4 0002f4 00  WA  0   0  8
  [16] .shstrtab         STRTAB          00000000 0010e4 0000a2 00      0   0  1
  [17] .symtab           SYMTAB          00000000 001188 0002b0 10     18  38  4
  [18] .strtab           STRTAB          00000000 001438 000123 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
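
The relationship between the Addr and Off columns can be checked with a little arithmetic. The sketch below is only an illustration: the (name, Addr, Off) values are copied from the section headers above, the 4096-byte page size is the one assumed in this answer, and the helper names are made up for the example.

```python
PAGE_SIZE = 4096  # page size assumed throughout this answer

# (name, Addr, Off) copied from the readelf section headers above
SECTIONS = [
    (".interp", 0x08048114, 0x000114),
    (".plt", 0x080481D0, 0x0001D0),
    (".text", 0x080481F0, 0x0001F0),
    (".dynamic", 0x08049F60, 0x000F60),
    (".data", 0x0804A010, 0x001010),
]


def same_page_offset(addr, file_off, page_size=PAGE_SIZE):
    """True if addr and file_off sit at the same position within a page."""
    return addr % page_size == file_off % page_size


def vaddr_of_file_start(addr, file_off):
    """Virtual address where file offset 0 lands if file_off maps to addr."""
    return addr - file_off


if __name__ == "__main__":
    for name, addr, off in SECTIONS:
        print(f"{name:9} addr={addr:#010x} off={off:#08x} "
              f"congruent={same_page_offset(addr, off)} "
              f"file offset 0 maps at {vaddr_of_file_start(addr, off):#x}")
```

For the sections in the text segment the last column is 0x8048000, so the bytes between 0x8048000 and the start of .text are the ELF header, the program headers and the sections listed before .text. For .dynamic and .data it is 0x8049000, which suggests the writable segment is mapped from the same file one page higher.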


2) Why is there a gap between the end of the text section and the start of the data section?

Why not? They have to be in different segments of the executable, so mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)

Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug).

3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

That's interesting. They're in the bss, but IDK why the current position isn't affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use .skip or .zero to reserve space, I get the results you expected:

.section .bss
start_bss:
#.lcomm buffer, 500
#.lcomm buffer2, 250
buffer:  .skip 500
buffer2: .skip 250
end_bss:

.lcomm puts things in the BSS even if you don't switch to that section, i.e. it doesn't care what the current section is, and maybe doesn't care about or affect the current position in the .bss section. TL;DR: when you switch to .bss manually, use .zero or .skip, not .comm or .lcomm.

4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier, at 0x804a000?

That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults on the first unmapped page after the text segment.) Besides the hole in virtual address space between text and data, there are probably also other holes beyond the data segment.
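
One way to look for such holes is to read /proc/<pid>/maps and list the gaps between consecutive mappings. The following is a rough sketch, not something from the question: parse_maps and find_holes are names invented here, and when run as a script it inspects the Python interpreter's own address space rather than the assembly program.

```python
def parse_maps(maps_text):
    """Return sorted (start, end, perms) tuples from /proc/<pid>/maps text."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        regions.append((int(start_s, 16), int(end_s, 16), fields[1]))
    return sorted(regions)


def find_holes(regions):
    """Return (hole_start, hole_end) for each unmapped gap between regions."""
    holes = []
    for (_, prev_end, _), (next_start, _, _) in zip(regions, regions[1:]):
        if next_start > prev_end:
            holes.append((prev_end, next_start))
    return holes


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            text = f.read()
    except OSError:
        text = ""  # no /proc here (not Linux): nothing to report
    for lo, hi in find_holes(parse_maps(text)):
        print(f"hole: {lo:#x}-{hi:#x} ({(hi - lo) // 4096} pages of 4096B)")
```

To inspect the program from the question instead, the same functions can be fed the contents of /proc/<pid>/maps for that process while it is still running, for example while it is stopped in a debugger.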

Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.
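
The page arithmetic behind this can be written out as a short sketch. The addresses are taken from the output in the question, the 4096-byte page size is assumed as above, and the function names are made up for the illustration.

```python
PAGE_SIZE = 4096  # assumed, as in the answer


def page_start(addr, page_size=PAGE_SIZE):
    """Address of the first byte of the page containing addr."""
    return addr & ~(page_size - 1)


def next_page_start(addr, page_size=PAGE_SIZE):
    """Address of the first byte of the page after the one containing addr."""
    return page_start(addr, page_size) + page_size


if __name__ == "__main__":
    last_ok = 0x8049FFF   # last address read successfully in the question
    end_data = 0x80493C0  # "Data section ends at" from the question's output
    print(f"{last_ok:#x} is in the page starting at {page_start(last_ok):#x}")
    print(f"{end_data:#x} is in the page starting at {page_start(end_data):#x}")
    print(f"the following page starts at {next_page_start(last_ok):#x}")
```

Both addresses fall in the page that starts at 0x8049000, and the next page starts at 0x804a000, which is exactly the address where the loop in the question faults.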
