Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?

Question

When executed, the program starts running from virtual address 0x80482c0. This address does not point to our main() procedure, but to a procedure named _start, which the linker pulls in from the C runtime startup code.
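
To make the entry point concrete, here is a minimal sketch of how it could be read out of an ELF header. It assumes a 32-bit little-endian ELF file, where the e_entry field follows e_ident, e_type, e_machine and e_version. The names read_elf32_entry and make_sample_header are hypothetical helpers invented for this illustration, and the header is synthetic rather than taken from a real binary.

```python
import struct

# First 28 bytes of a 32-bit little-endian ELF header:
# e_ident[16], e_type (u16), e_machine (u16), e_version (u32), e_entry (u32)
ELF32_HEADER_PREFIX = struct.Struct("<16sHHII")


def read_elf32_entry(header: bytes) -> int:
    """Return e_entry from the start of a 32-bit little-endian ELF file."""
    if len(header) < ELF32_HEADER_PREFIX.size:
        raise ValueError("header too short")
    ident, _e_type, _e_machine, _e_version, e_entry = (
        ELF32_HEADER_PREFIX.unpack_from(header)
    )
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # ident[4] is EI_CLASS (1 = 32-bit), ident[5] is EI_DATA (1 = little-endian)
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("only 32-bit little-endian ELF is handled here")
    return e_entry


def make_sample_header(entry: int) -> bytes:
    """Build a synthetic header prefix (hypothetical, for illustration only)."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)  # 32-bit, LSB, version 1
    # e_type = 2 (ET_EXEC), e_machine = 3 (EM_386), e_version = 1 (EV_CURRENT)
    return ELF32_HEADER_PREFIX.pack(ident, 2, 3, 1, entry)


sample = make_sample_header(0x80482C0)
print(hex(read_elf32_entry(sample)))  # prints 0x80482c0
```

On a real executable, the same function could be applied to the first 28 bytes read from the file in binary mode.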

My searching so far has only turned up some (vague) historical speculation like this:

There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.

Can anyone confirm/deny this?

Answer

As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to leave the page at address zero unmapped. Accesses to it therefore immediately trigger a CPU exception, which the process sees as a segfault. This is much better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt instruction).

The starting address is part of a set of conventions that describe how memory is laid out. The linker must know these conventions when it produces an executable binary, so they are not likely to change. For Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 1990s. A process must have access to several areas:

  • The code must be in a range which includes the starting point.
  • There must be a stack.
  • There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
  • There must be some room for mmap() system calls, including shared library loading.

Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems: its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever the convention was, it had to place code and stack at low addresses, and give all of the address space after a given point to the heap.

But there is also the stack, which is usually quite small but can grow dramatically on some occasions. The stack grows down, and when the stack is full, we really want the process to crash predictably rather than overwrite some data. So there had to be a wide area for the stack, with an unmapped page at the low end of that area. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.
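
The arithmetic behind that address can be checked with a short sketch. It assumes 4 KiB pages (the usual i386 page size) and takes the 128 MB boundary and the 0x08048000 and 0x080482c0 values from the text above. The classify function and its region names are hypothetical labels for the layout as described here, not something the kernel or linker exposes.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096            # assumption: 4 KiB pages, the usual i386 page size
LOW_REGION_END = 128 * MIB  # first address past the low 128 MB


def classify(addr: int) -> str:
    """Name the region an address falls in under the layout described above."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr < PAGE_SIZE:
        return "unmapped guard page"  # catches null dereferences and stack overflow
    if addr < LOW_REGION_END:
        return "stack area"
    return "code and beyond"


print(hex(LOW_REGION_END))               # prints 0x8000000
print(classify(0x0))                     # prints unmapped guard page
print(classify(0x080482C0))              # prints code and beyond
print(hex(0x08048000 - LOW_REGION_END))  # prints 0x48000
```

So 128 MB is exactly 0x08000000, and both addresses from the question sit just above that boundary.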

As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large compared to what could actually be used. At that time, the Linux kernel was limiting the address space for a single process to 1 GB, out of the 4 GB maximum allowed by the hardware, and that was not considered a big issue.
