Can't understand assembly mov instruction between register and a variable
Question
I am using the NASM assembler on 64-bit Linux. There is something about variables and registers that I can't understand. I create a variable named "msg":
msg db "hello, world"
Now, when I want to write to stdout, I move msg into the rsi register. However, I don't understand the mov instruction at the bit level. The rsi register consists of 64 bits, while the msg variable has 12 symbols of 8 bits each, which means msg has a size of 12 * 8 = 96 bits, which is obviously greater than 64 bits.
So how is it even possible to write an instruction like mov rsi, msg without overflowing the memory allocated for rsi?
Or does the rsi register contain the memory location of the first symbol of the string, and after writing 1 symbol it changes to the memory location of the next symbol?
Sorry if I wrote complete nonsense. I'm new to assembly and I haven't been able to get the hang of it yet.
Recommended answer
In NASM syntax (unlike MASM syntax), mov rsi, symbol puts the address of the symbol into RSI. (It does so inefficiently, with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. See "How to load address of function or label into register in GNU Assembler".)
mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.
mov rsi, msg ; rsi = address of msg. Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1] ; rax = 'e' (upper 7 bytes zeroed)
mov edx, [msg+6] ; rdx = ' wor' (upper 4 bytes zeroed)
Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model for a position-dependent, non-PIE executable, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but it probably can't with link-time constants. See "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?"
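As a minimal sketch of how this fits the write-to-stdout case from the question, the program below hands the address of msg (8 bytes wide, not the 12 bytes of text) to the Linux write system call. The msg_len constant, the trailing newline, the _start entry point, and the build command in the first comment are assumptions added for illustration; they are not part of the original question.

```nasm
; Assumed build: nasm -felf64 hello.asm && ld -o hello hello.o

section .rodata
msg     db "hello, world", 10   ; the 12 characters from the question plus a newline
msg_len equ $ - msg             ; assemble-time constant: number of bytes in msg

section .text
global _start
_start:
    mov     eax, 1              ; system call number 1 = write
    mov     edi, 1              ; file descriptor 1 = stdout
    lea     rsi, [rel msg]      ; rsi = address of the first byte of msg
    mov     edx, msg_len        ; number of bytes the kernel should read from that address
    syscall

    mov     eax, 60             ; system call number 60 = exit
    xor     edi, edi            ; exit status 0
    syscall
```

The register only ever holds the pointer; the kernel reads the string bytes from memory using that pointer and the length in edx, so the size of the string never has to fit into rsi.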
"... and after writing 1 symbol it changes to the memory location of the next symbol?"
No. If you want that, you have to inc rsi yourself. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.
Accessing registers doesn't magically modify them.
There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
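To make the explicit pointer update concrete, here is a rough sketch that writes the string one byte at a time and advances the pointer by hand with inc. The choice of rbx and r12 as loop registers, the msg_len constant, the .next label, and the _start entry point are assumptions for illustration only; writing byte by byte is deliberately inefficient and is done here just to show the pointer changing.

```nasm
; Assumed build: nasm -felf64 bytes.asm && ld -o bytes bytes.o

section .rodata
msg     db "hello, world", 10   ; 12 characters plus a newline
msg_len equ $ - msg             ; assemble-time constant: number of bytes in msg

section .text
global _start
_start:
    lea     rbx, [rel msg]      ; rbx = pointer to the current byte
    mov     r12d, msg_len       ; r12 = number of bytes still to write
.next:
    mov     eax, 1              ; system call number 1 = write
    mov     edi, 1              ; file descriptor 1 = stdout
    mov     rsi, rbx            ; pass the current pointer; this does not change rbx
    mov     edx, 1              ; write exactly one byte
    syscall                     ; overwrites rax, rcx and r11; rbx and r12 are preserved
    inc     rbx                 ; nothing advances the pointer unless we do it here
    dec     r12d                ; one byte fewer remaining
    jnz     .next               ; loop until the count reaches zero

    mov     eax, 60             ; system call number 60 = exit
    xor     edi, edi            ; exit status 0
    syscall
```

If the inc rbx line were removed, the loop would write the first byte of msg repeatedly, because reading a register never changes it. A lodsb-based loop would fold the load and the increment of rsi into one instruction, but that is a property of lodsb itself, not of mov or of the addressing mode.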