Why NASM on Linux changes registers in x86_64 assembly
Question
I am new to x86_64 assembly programming. I was writing a simple "Hello World" program in x86_64 assembly. Below is my code, which runs perfectly fine.
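    global _start

    section .data
    msg:  db "Hello to the world of SLAE64", 0x0a
    mlen  equ $-msg             ; message length in bytes

    section .text
    _start:
        mov rax, 1              ; syscall number 1 = write
        mov rdi, 1              ; file descriptor 1 = stdout
        mov rsi, msg            ; address of the message
        mov rdx, mlen           ; number of bytes to write
        syscall
        mov rax, 60             ; syscall number 60 = exit
        mov rdi, 4              ; exit status
        syscall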
Now, when I disassemble it in gdb, I get the following output:
    (gdb) disas
    Dump of assembler code for function _start:
    => 0x00000000004000b0 <+0>:  mov eax,0x1
       0x00000000004000b5 <+5>:  mov edi,0x1
       0x00000000004000ba <+10>: movabs rsi,0x6000d8
       0x00000000004000c4 <+20>: mov edx,0x1d
       0x00000000004000c9 <+25>: syscall
       0x00000000004000cb <+27>: mov eax,0x3c
       0x00000000004000d0 <+32>: mov edi,0x4
       0x00000000004000d5 <+37>: syscall
    End of assembler dump.
My question is: why does NASM behave this way? I know it may change instructions based on the opcode, but I am not sure whether the same applies to registers.
Also, does this behaviour affect the functionality of the executable?
I am using Ubuntu 16.04 (64-bit) installed in VMware on an i5 processor.
Thanks in advance.
Recommended answer
In 64-bit mode, `mov eax, 1` clears the upper half of the `rax` register, because writing to a 32-bit register zero-extends the result into the full 64-bit register. Thus `mov eax, 1` is semantically equivalent to `mov rax, 1`.
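The zero-extension rule can be checked in a debugger with a fragment like the following. This is a minimal sketch, not part of the original program; the 16-bit write is included only for contrast, since partial writes narrower than 32 bits leave the upper bits untouched.

```nasm
bits 64

    mov rax, -1         ; rax = 0xFFFFFFFFFFFFFFFF
    mov eax, 1          ; 32-bit write zero-extends: rax = 0x0000000000000001

    mov rax, -1         ; rax = 0xFFFFFFFFFFFFFFFF again
    mov ax, 1           ; 16-bit write does not extend: rax = 0xFFFFFFFFFFFF0001
```

Single-stepping this in gdb and printing `$rax` after each instruction shows why the assembler is free to substitute the 32-bit form here.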
The former, however, saves a REX.W prefix (`48h` numerically), the byte needed to select a 64-bit operand size and the registers introduced with x86-64. The opcode is the same for both instructions (`0b8h`, followed by a DWORD or a QWORD).
So the assembler goes ahead and picks the shortest form.
This is typical NASM behavior. See Section 3.3 of the NASM manual, where `[eax*2]` is assembled as `[eax+eax]` to save the `disp32` field after the SIB byte [1] (`[eax*2]` is only encodable as `[eax*2+disp32]`, where the assembler sets `disp32` to 0).
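The effective-address case can be sketched as follows. The `nosplit` keyword is not mentioned in the answer; it comes from the same section of the NASM manual and asks the assembler to keep the scaled form literally. The destination register `ebx` is an arbitrary choice for illustration.

```nasm
bits 32

    mov ebx, [eax*2]            ; emitted as [eax+eax]:  8B 1C 00
    mov ebx, [nosplit eax*2]    ; kept as [eax*2+0]:     8B 1C 45 00 00 00 00
```

The split form is four bytes shorter because a SIB byte with no base register always requires a 32-bit displacement.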
I was unable to force NASM to emit a real `mov rax, 1` instruction (i.e. `48 B8 01 00 00 00 00 00 00 00`), even by prefixing the instruction with `o64`.
If a real `mov rax, 1` is needed (which is not your case), one must resort to assembling it manually with `db` and similar directives.
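Assembling the instruction by hand amounts to emitting the bytes quoted above. A minimal sketch, splitting them into the prefix and opcode followed by the 64-bit immediate:

```nasm
bits 64

    db 0x48, 0xB8       ; REX.W prefix + opcode B8 (mov rax, imm64)
    dq 1                ; 64-bit immediate: 01 00 00 00 00 00 00 00
```

A disassembler shows these ten bytes as a 64-bit immediate load into `rax` (gdb prints it as `movabs`).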
EDIT: Peter Cordes' answer shows that there is, in fact, a way to tell NASM not to optimize an instruction: the `strict` modifier.
`mov rax, STRICT 1` produces the 10-byte version of the instruction (`mov r64, imm64`), while `mov rax, STRICT DWORD 1` produces a 7-byte version (`mov r/m64, imm32`, where `imm32` is sign-extended before use).
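The three forms discussed so far can be placed side by side. This is a sketch only: the sizes in the comments are the ones reported in the answer, and they are worth confirming on your own NASM version by generating a listing file with the `-l` option.

```nasm
bits 64

    mov rax, 1                  ; optimized to mov eax, 1 (5 bytes)
    mov rax, strict dword 1     ; 7 bytes: 48 C7 C0 01 00 00 00
    mov rax, strict 1           ; 10 bytes according to the answer
```

All three leave the same value in `rax`; only the code size differs.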
Side note: it is better to use RIP-relative addressing. This avoids 64-bit immediate constants (thus reducing code size) and is mandatory on macOS (in case you care; but see the note below).
Change `mov rsi, msg` to `lea rsi, [REL msg]`. RIP-relative is an addressing mode, so it needs a memory operand (the square brackets); to avoid actually reading from that address, we use `lea`, which only computes the effective address and performs no memory access.
You can use the directive `DEFAULT REL` to avoid typing `REL` in each memory access.
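Applied to the program from the question, the change might look like this. It is a sketch under the assumption that the source file is named `hello.asm` (a hypothetical name) and is built with `nasm -f elf64 hello.asm -o hello.o` followed by `ld hello.o -o hello`.

```nasm
default rel                     ; make [label] operands RIP-relative

global _start

section .data
msg:  db "Hello to the world of SLAE64", 0x0a
mlen  equ $-msg

section .text
_start:
    mov rax, 1                  ; write
    mov rdi, 1                  ; stdout
    lea rsi, [msg]              ; address computed relative to RIP
    mov rdx, mlen
    syscall
    mov rax, 60                 ; exit
    mov rdi, 4                  ; exit status
    syscall
```

With `default rel` in effect, the `lea` no longer needs an explicit `REL`, and the 10-byte `movabs` seen in the disassembly is replaced by a 7-byte instruction.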
Note: I was under the impression that the Mach-O file format required position-independent code (PIC), but this may not be the case.
[1] The Scale-Index-Base byte, used to encode the new addressing modes that were introduced with 32-bit mode.