Why NASM on Linux changes registers in x86_64 assembly

Published: 2021/12/18 9:04:33. Tags: assembly, nasm, x86-64, micro-optimization, shellcode


Question

I am new to x86_64 assembly programming. I was writing a simple "Hello World" program in x86_64 assembly. Below is my code, which runs perfectly fine.

global _start

section .data

    msg: db "Hello to the world of SLAE64", 0x0a
    mlen equ $-msg

section .text
    _start:
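            mov rax, 1          ; syscall number: write
            mov rdi, 1          ; file descriptor 1 = stdout
            mov rsi, msg        ; address of the message
            mov rdx, mlen       ; message length in bytes
            syscall

            mov rax, 60         ; syscall number: exit
            mov rdi, 4          ; exit status
            syscall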

Now when I disassemble it in gdb, I get the following output:

(gdb) disas
Dump of assembler code for function _start:
=> 0x00000000004000b0 <+0>:     mov    eax,0x1
   0x00000000004000b5 <+5>:     mov    edi,0x1
   0x00000000004000ba <+10>:    movabs rsi,0x6000d8
   0x00000000004000c4 <+20>:    mov    edx,0x1d
   0x00000000004000c9 <+25>:    syscall
   0x00000000004000cb <+27>:    mov    eax,0x3c
   0x00000000004000d0 <+32>:    mov    edi,0x4
   0x00000000004000d5 <+37>:    syscall
End of assembler dump.

My question is: why does NASM behave this way? I know it changes instructions based on the opcode, but I am not sure whether it does the same with registers.

Also, does this behaviour affect the functionality of the executable?

I am using Ubuntu 16.04 (64-bit) installed in VMware on an i5 processor.

Thanks in advance.

Recommended answer

In 64-bit mode, mov eax, 1 clears the upper half of the rax register (a write to a 32-bit register is zero-extended into the full 64-bit register), so mov eax, 1 is semantically equivalent to mov rax, 1.
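
The zero-extension rule can be checked with a small test program. This is only a sketch for Linux x86-64; the file name zx.asm used below is a placeholder, and the exit status is just a convenient way to report the result.

```nasm
; zx.asm: sketch showing that a 32-bit register write clears bits 63..32.
global _start

section .text
_start:
        mov     rax, -1         ; rax = 0xFFFFFFFFFFFFFFFF
        mov     eax, 1          ; 32-bit write: upper half of rax is zeroed
        xor     edi, edi        ; exit status 0 means "rax is exactly 1"
        cmp     rax, 1
        je      .done
        mov     edi, 1          ; only reached if the upper bits had survived
.done:
        mov     eax, 60         ; syscall number: exit
        syscall
```

Built with nasm -f elf64 zx.asm and ld zx.o -o zx, the program should exit with status 0. Note that 8-bit and 16-bit writes (al, ax) do not zero-extend; only 32-bit writes do.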

The former, however, spares a REX.W prefix (48h), the byte that selects a 64-bit operand size (REX prefixes are also what give access to the registers introduced with x86-64). The opcode is the same for both instructions (0b8h, followed by a DWORD or a QWORD).
So the assembler goes ahead and picks the shortest form.
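
As a rough illustration of how the chosen form depends on the immediate, the fragment below lists the bytes NASM is expected to emit with its default optimization settings. The byte comments follow from the x86-64 encoding rules; a listing file (nasm -f elf64 -l out.lst file.asm, where the file names are placeholders) can be used to confirm them on a given NASM version.

```nasm
bits 64

        mov     eax, 1                  ; B8 01 00 00 00                  5 bytes
        mov     rax, 1                  ; same 5 bytes: shrunk to mov eax, 1
        mov     rax, -1                 ; 48 C7 C0 FF FF FF FF            7 bytes, sign-extended imm32
        mov     rax, 0x100000000        ; 48 B8 00 00 00 00 01 00 00 00   10 bytes, needs a full imm64
```

The 5-byte form is only possible when the value fits in an unsigned 32-bit immediate, which is why the small constants in the question were shrunk. The address of msg is not known until link time, so NASM keeps the 10-byte form there and gdb shows it as movabs.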

This is typical behavior of NASM; see Section 3.3 of the NASM manual, where the example [eax*2] is assembled as [eax+eax] to spare the disp32 field after the SIB byte¹ ([eax*2] is only encodable as [eax*2+disp32], with the assembler setting disp32 to 0).
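
A minimal sketch of that address-splitting behavior, written for 32-bit mode to keep the encodings short. The nosplit keyword is described in the same section of the NASM manual; the choice of lea and ebx is arbitrary.

```nasm
bits 32

        lea     ebx, [eax*2]            ; 8D 1C 00              encoded as [eax+eax]
        lea     ebx, [nosplit eax*2]    ; 8D 1C 45 00 00 00 00  literal [eax*2+disp32], disp32 = 0
```

Both instructions compute the same value; the second is simply four bytes longer because a scaled index without a base register requires a disp32.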

I was unable to force NASM to emit a real mov rax, 1 instruction (i.e. 48 B8 01 00 00 00 00 00 00 00), even by prefixing the instruction with o64.
If a real mov rax, 1 is needed (this is not your case), one must resort to assembling it manually with db and similar.
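
A hand-assembled version could look like the sketch below, which simply spells out the ten bytes quoted above.

```nasm
bits 64

        ; mov rax, 1 as a true mov r64, imm64
        db      0x48, 0xB8      ; REX.W prefix, then opcode B8 (B8 + register number, rax = 0)
        dq      1               ; 64-bit immediate, stored little-endian
```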

EDIT: Peter Cordes' answer shows that there is, in fact, a way to tell NASM not to optimize an instruction: the strict modifier.
mov rax, STRICT 1 produces the 10-byte version of the instruction (mov r64, imm64), while mov rax, STRICT DWORD 1 produces a 7-byte version (mov r64, imm32, where imm32 is sign-extended before use).
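
For illustration, the fragment below uses strict with an explicit operand size. The explicit qword spelling is a variation on the form quoted above, not the exact text of that answer, and the byte comments follow from the encoding rules.

```nasm
bits 64

        mov     rax, strict qword 1     ; 48 B8 01 00 00 00 00 00 00 00   mov r64, imm64
        mov     rax, strict dword 1     ; 48 C7 C0 01 00 00 00            mov r/m64, sign-extended imm32
        mov     rax, 1                  ; B8 01 00 00 00                  default: shrunk to mov eax, 1
```

Forcing a particular length is mostly useful when the instruction size itself matters, for example for alignment or for patching code in place.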

Side note: it is better to use RIP-relative addressing; this avoids 64-bit immediate constants (thus reducing code size) and is mandatory on macOS (in case you care).
Change mov rsi, msg to lea rsi, [REL msg] (RIP-relative is an addressing mode, so it needs the square brackets of a memory operand; to avoid actually reading from that address we use lea, which only computes the effective address and does not access memory).
You can use the directive DEFAULT REL to avoid typing REL in each memory access.
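
Applied to the program from the question, the change might look like the sketch below. It also writes the small constants to 32-bit registers explicitly, which matches what NASM emitted anyway.

```nasm
default rel                     ; memory operands are RIP-relative unless marked abs

global _start

section .data
msg:    db "Hello to the world of SLAE64", 0x0a
mlen    equ $-msg

section .text
_start:
        mov     eax, 1          ; syscall number: write
        mov     edi, 1          ; file descriptor 1 = stdout
        lea     rsi, [msg]      ; 48 8D 35 disp32: 7 bytes instead of the 10-byte movabs
        mov     edx, mlen       ; message length in bytes
        syscall

        mov     eax, 60         ; syscall number: exit
        mov     edi, 4          ; exit status
        syscall
```

It builds the same way as the original (nasm -f elf64, then ld), and the disassembly should show lea rsi,[rip+...] in place of movabs.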

I was under the impression that the Mach-O file format required PIC code, but this may not be the case.

¹ The Scale Index Base byte, used to encode the new addressing modes introduced with 32-bit mode.
