Can't understand assembly mov instruction between register and a variable

Question

I am using the NASM assembler on 64-bit Linux. There is something about variables and registers that I can't understand. I create a variable named "msg":

 msg db "hello, world"  

Now when I want to write to stdout I move msg into the rsi register, but I don't understand the mov instruction at the bit level. The rsi register is 64 bits wide, while the msg variable has 12 characters of 8 bits each, so msg is 12 * 8 = 96 bits, which is obviously more than 64 bits.

So how is it even possible to write an instruction like:
mov rsi, msg
without overflowing the memory allocated for rsi?

Or does the rsi register contain the memory location of the first character of the string, and after writing 1 character it changes to the memory location of the next character?

Sorry if I wrote complete nonsense. I'm new to assembly and I haven't been able to get a grasp of it yet.

Recommended answer

In NASM syntax (unlike MASM syntax), mov rsi, symbol puts the address of the symbol into RSI. (It does so inefficiently, with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. See: How to load address of function or label into register in GNU Assembler.)

mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.

mov   rsi,  msg           ; rsi  = address of msg.  Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1]   ; rax  = 'e' (upper 7 bytes zeroed)
mov   edx, [msg+6]        ; rdx  = ' wor' (upper 4 bytes zeroed)
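
To make the address-versus-contents distinction concrete, here is a minimal sketch of a complete program that prints the string with the write system call. It assumes x86-64 Linux and a build along the lines of nasm -felf64 hello.asm && ld -o hello hello.o. The trailing newline byte and the msg_len name are additions for illustration and are not part of the original question.

```nasm
default rel                     ; make [msg] addressing RIP-relative by default

section .rodata
msg:     db "hello, world", 10  ; 12 characters plus an added newline, stored in memory
msg_len  equ $ - msg            ; assemble-time constant (hypothetical name): 13

section .text
global _start
_start:
    mov  eax, 1                 ; system call number for write on x86-64 Linux
    mov  edi, 1                 ; file descriptor 1 = stdout
    lea  rsi, [msg]             ; rsi = address of the first byte, not the bytes themselves
    mov  edx, msg_len           ; rdx = how many bytes the kernel should read from that address
    syscall

    mov  eax, 60                ; system call number for exit
    xor  edi, edi               ; exit status 0
    syscall
```

The 13 bytes never pass through rsi. The register only holds an 8-byte address, and the length in rdx tells the kernel how far to read from it.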

Note that you can use mov esi, msg because symbol addresses always fit in 32 bits in a position-dependent (non-PIE) executable built with the default "small" code model, where all static code/data goes in the low 2GB of virtual address space. NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but it probably can't with link-time constants. Writing a 32-bit register zero-extends into the full 64-bit register. See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
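
As a rough comparison, the sketch below puts the three ways of getting the address into rsi side by side, with the machine-code size of each in the comments. It assumes a non-PIE static executable built with nasm -felf64 and plain ld; mov esi, msg is not expected to link in a PIE build.

```nasm
section .data
msg: db "hello, world"

section .text
global _start
_start:
    mov  rsi, msg          ; 10 bytes: mov r64, imm64 with a 64-bit absolute address
    mov  esi, msg          ;  5 bytes: mov r32, imm32, zero-extended into rsi
    lea  rsi, [rel msg]    ;  7 bytes: RIP-relative, also works in position-independent code

    mov  eax, 60           ; system call number for exit
    xor  edi, edi          ; exit status 0
    syscall
```

Under those assumptions all three instructions leave the same address in rsi. Disassembling the result with objdump -d is one way to check the encodings.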

"and after writing 1 symbol it changes to the memory location of the next symbol?"

No. If you want that, you have to inc rsi yourself. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.

Accessing registers doesn't magically modify them.

There are instructions like lodsb and pop that load from memory and increment a pointer (rsi and rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
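
To illustrate advancing a pointer by hand, the following sketch writes the string one byte per write call and steps the pointer with inc. It assumes x86-64 Linux, where syscall overwrites only rax, rcx and r11, so rbx and r12 survive across the call. The added newline and the msg_len name are illustrative, not from the question.

```nasm
default rel

section .rodata
msg:     db "hello, world", 10  ; added newline for readable output
msg_len  equ $ - msg            ; hypothetical name for the byte count

section .text
global _start
_start:
    lea  rbx, [msg]             ; rbx = pointer to the current byte
    mov  r12d, msg_len          ; r12 = number of bytes still to write
.next:
    mov  eax, 1                 ; system call number for write
    mov  edi, 1                 ; file descriptor 1 = stdout
    mov  rsi, rbx               ; address of the single byte to write
    mov  edx, 1                 ; length: exactly one byte
    syscall
    inc  rbx                    ; nothing advances the pointer unless we do it here
    dec  r12d                   ; one byte fewer remaining
    jnz  .next                  ; loop until the count reaches zero

    mov  eax, 60                ; system call number for exit
    xor  edi, edi               ; exit status 0
    syscall
```

One system call per byte is slow and is used here only to make the pointer arithmetic visible; a single write with the full length is the normal approach.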
