Why are global variables in x86-64 accessed relative to the instruction pointer?


Question

I compiled C code to assembly using gcc -S -fasm foo.c. The C code declares a global variable and a local variable in the main function, as shown below:

int y=6;
int main()
{
        int x=4;
        x=x+y;
        return 0;
}

I then looked at the assembly code generated from this C code and saw that the global variable y is accessed using the value of the rip instruction pointer.

I thought that only const global variables were stored in the text segment, but this example suggests that regular global variables are stored there too, which seems very odd.

I guess that some assumption I made is wrong, so can someone please explain it to me?

The assembly code generated by the C compiler:

        .file   "foo.c"
        .text
        .globl  y
        .data
        .align 4
        .type   y, @object
        .size   y, 4
y:
        .long   6
        .text
        .globl  main
        .type   main, @function

main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $4, -4(%rbp)
        movl    y(%rip), %eax
        addl    %eax, -4(%rbp)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:

Recommended answer

The offsets between different sections of your executable are link-time constants, so RIP-relative addressing is usable for any section (including .data, where your non-const globals are). Note the .data directive in your asm output: y is placed in .data, not in .text.

This applies even in a PIE executable or shared library, where the absolute addresses are not known until runtime (ASLR).

Runtime ASLR for position-independent executables (PIE) randomizes one base address for the entire program, not the start addresses of individual segments relative to each other.
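As a rough illustration of the point above, the following C sketch computes the distance between a function in .text and a global in .data. The names code_anchor, data_minus_code and report_layout are hypothetical helpers introduced here, not part of the original question. Converting a function pointer to an integer is not strictly portable C, but it is assumed to work with gcc on x86-64 Linux.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int y = 6;                      /* non-const global, placed in .data */

/* Hypothetical helper: any function in .text can serve as a code anchor. */
void code_anchor(void) {}

/* Distance in bytes from code_anchor (in .text) to y (in .data).
 * This is the kind of quantity the linker fixes at link time. */
intptr_t data_minus_code(void)
{
    return (intptr_t)((uintptr_t)&y - (uintptr_t)&code_anchor);
}

/* Print both absolute addresses and their difference. */
void report_layout(void)
{
    printf("&y = %p, &code_anchor = %p, difference = %" PRIdPTR "\n",
           (void *)&y, (void *)(uintptr_t)&code_anchor, data_minus_code());
}
```

If report_layout() is called from main in a PIE build, the two absolute addresses would be expected to change from run to run when ASLR is enabled, while the printed difference should stay the same.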

All accesses to static variables use RIP-relative addressing because that's the most efficient option, even in a position-dependent executable where absolute addressing is also possible (because the absolute addresses of static code and data are link-time constants there, not relocated by dynamic linking).

Related and possible duplicates:

In 32-bit x86, there are 2 redundant ways to encode an addressing mode with no registers and a disp32 absolute address (with and without a SIB byte). x86-64 repurposed the shorter one as RIP+rel32, so mov foo, %eax is 1 byte longer than mov foo(%rip), %eax.
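The size difference and the address calculation can be sketched in C. The byte arrays below are the 64-bit-mode encodings of the two instructions with a zeroed placeholder displacement, and rip_relative_target is a hypothetical helper showing that the CPU adds rel32 to the address of the next instruction. The addresses in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* mov foo(%rip), %eax : opcode 8B, ModRM 05, then a 4-byte rel32 */
static const uint8_t mov_rip_rel[] = {0x8B, 0x05, 0, 0, 0, 0};

/* mov foo, %eax : opcode 8B, ModRM 04, SIB 25, then a 4-byte disp32 */
static const uint8_t mov_abs32[] = {0x8B, 0x04, 0x25, 0, 0, 0, 0};

size_t rip_relative_length(void) { return sizeof mov_rip_rel; }
size_t abs32_length(void)        { return sizeof mov_abs32; }

/* Effective address of a RIP-relative operand: RIP already points at the
 * next instruction, so the base is insn_addr + insn_len. rel32 is signed. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len, int32_t rel32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)rel32;
}
```

For example, a 6-byte instruction at the hypothetical address 0x401000 with rel32 = 0x2ff0 would reference 0x401006 + 0x2ff0 = 0x403ff6.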

64-bit absolute addressing would take even more space, and is only available for mov to/from RAX/EAX/AX/AL unless you use a separate instruction to get the address into a register first.

(In x86-64 Linux PIE/PIC, 64-bit absolute addressing is allowed, and is handled via load-time fixups that put the right address into the code, a jump table, or a statically-initialized function pointer. So code doesn't technically have to be position-independent, but normally it's more efficient if it is. 32-bit absolute addressing isn't allowed, because ASLR isn't limited to the low 31 bits of virtual address space.)

Note that in a non-PIE Linux executable, gcc will use 32-bit absolute addressing for putting the address of static data in a register. For example, puts("hello"); will typically compile as

mov   $.LC0, %edi     # mov r32, imm32
call  puts

In the default non-PIE code model, static code and data get linked into the low 2 GiB of virtual address space (the low 31 bits), so 32-bit absolute addresses work whether they're zero- or sign-extended to 64-bit. This is handy for indexing static arrays, too, like mov array(%rax), %edx ; add $4, %eax for example.
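The zero- versus sign-extension point can be checked with a small C sketch. The helper names and the sample address 0x404028 are assumptions made for illustration. The conversion of an out-of-range value to int32_t is implementation-defined in older C standards, and is assumed here to wrap as it does with gcc.

```c
#include <stdint.h>

/* What mov $imm32, %edi does: writing a 32-bit register zero-extends. */
uint64_t zero_extend32(uint32_t addr32)
{
    return (uint64_t)addr32;
}

/* What a disp32 in an addressing mode such as array(%rax) does:
 * the 32-bit displacement is sign-extended to 64 bits. */
uint64_t sign_extend32(uint32_t addr32)
{
    return (uint64_t)(int64_t)(int32_t)addr32;
}

/* An address gives the same result either way only if bit 31 is clear,
 * i.e. if it lies below 2 GiB. */
int fits_either_extension(uint64_t addr)
{
    return addr < (UINT64_C(1) << 31);
}
```

With a hypothetical static address such as 0x404028, both helpers return the same value. With 0x80000000, sign extension yields 0xFFFFFFFF80000000 instead, which is why static code and data need to stay below 2 GiB for both forms to be usable.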

See "32-bit absolute addresses no longer allowed in x86-64 Linux?" for more about PIE executables, which use position-independent code for everything, including RIP-relative LEA like the 7-byte lea .LC0(%rip), %rdi instead of the 5-byte mov $.LC0, %edi. See also "How to load address of function or label into register".
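The 5-byte versus 7-byte comparison can be made concrete with the instruction encodings, written here as C byte arrays with zeroed placeholder operands. lea_rel32 is a hypothetical helper showing the displacement a linker would have to store for a given target, and the addresses in the usage note are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* mov $imm32, %edi : opcode BF, then a 4-byte absolute address */
static const uint8_t mov_imm32_edi[] = {0xBF, 0, 0, 0, 0};

/* lea rel32(%rip), %rdi : REX.W 48, opcode 8D, ModRM 3D, then rel32 */
static const uint8_t lea_rip_rdi[] = {0x48, 0x8D, 0x3D, 0, 0, 0, 0};

size_t mov_imm32_length(void) { return sizeof mov_imm32_edi; }
size_t lea_rip_length(void)   { return sizeof lea_rip_rdi; }

/* rel32 is measured from the end of the LEA, i.e. the next instruction. */
int32_t lea_rel32(uint64_t lea_addr, uint64_t target)
{
    return (int32_t)(target - (lea_addr + sizeof lea_rip_rdi));
}
```

For a LEA at the hypothetical address 0x401000 and a string at 0x402004, the stored displacement would be 0x402004 - 0x401007 = 0xffd. The mov form instead stores the absolute address itself, which is why it only works when that address is a link-time constant that fits in 32 bits.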

I mention Linux because it looks from the .cfi directives like you're compiling for a non-Windows platform.
