x86 Assembly pointers
I hope this isn't a dumb question, but I am trying to wrap my mind around pointers in assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax], ebx
and when should dword ptr [eax] be used?
Also, when I try to do mov eax, [ebx] I get a compile error. Why is this?
As has already been stated, wrapping brackets around an operand means that that operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) that memory location, rather than reading that value directly.
So, this:
mov eax, ebx
simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.
Whereas this:
mov eax, [ebx]
dereferences the contents of ebx and stores the pointed-to value in eax. In a pseudo-C notation, this would be: eax = *ebx.
Finally, this:
mov [eax], ebx
stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
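The three forms above can be modelled in plain C. This is only a rough sketch, not real register access: the Machine struct, its 64-byte memory, and the helper names are all hypothetical, and addresses are assumed to stay within 0..60 so that a 4-byte access fits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny flat memory plus two 32-bit "registers". */
typedef struct {
    uint8_t  mem[64];   /* simulated memory, addressed by byte offset */
    uint32_t eax;
    uint32_t ebx;
} Machine;

/* Read a 32-bit value from simulated memory (addr must be <= 60). */
uint32_t load32(const Machine *m, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &m->mem[addr], sizeof v);
    return v;
}

/* Write a 32-bit value to simulated memory (addr must be <= 60). */
void store32(Machine *m, uint32_t addr, uint32_t value) {
    memcpy(&m->mem[addr], &value, sizeof value);
}

/* mov eax, ebx     ; eax = ebx   (no memory access at all) */
void mov_reg_reg(Machine *m) { m->eax = m->ebx; }

/* mov eax, [ebx]   ; eax = *ebx  (ebx is used as an address) */
void mov_reg_mem(Machine *m) { m->eax = load32(m, m->ebx); }

/* mov [eax], ebx   ; *eax = ebx  (eax is used as an address) */
void mov_mem_reg(Machine *m) { store32(m, m->eax, m->ebx); }
```

With ebx set to 16 and the value 1234 stored at offset 16, mov_reg_reg leaves 16 in eax, while mov_reg_mem leaves 1234 in eax.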
The registers here could also be replaced with memory operands, such as symbolic variable names. So this:
mov eax, [myVar]
dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.
By contrast, this:
mov eax, myVar
stores the address of the variable myVar into eax, like eax = &myVar.
At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around memory operands.
To get the address of a variable in MASM, you would use the OFFSET keyword:
mov eax, OFFSET myVar
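In C terms, the difference between loading a variable's contents and loading its address looks roughly like this. It is an illustrative sketch only: the initial value 42 and the function names are assumptions made for the example.

```c
/* Hypothetical variable standing in for myVar from the text. */
int myVar = 42;

/* mov eax, [myVar]        ; eax = myVar   (the contents) */
int load_value(void) { return myVar; }

/* mov eax, OFFSET myVar   ; eax = &myVar  (the address, MASM spelling) */
int *load_address(void) { return &myVar; }
```

Dereferencing the pointer returned by load_address gives the same value that load_value returns, which is the relationship between the two instructions.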
However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:
mov eax, DWORD PTR [myVar] ; eax = myVar
or
mov DWORD PTR [myVar], eax ; myVar = eax
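A size specifier is necessary in other assemblers like NASM, which are not strongly typed and don't remember that myVar is a DWORD-sized memory location, whenever the other operand does not imply the size (for example, when storing an immediate). Note that NASM spells it dword [myVar], without the PTR keyword.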
You don't need this at all when the other operand is a register, since the name of the register indicates the operand size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
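To see why the size matters when no register is involved, consider storing the immediate 0 through a pointer: nothing in that instruction says how many bytes to write. The C sketch below makes the width explicit through the type. The function names are hypothetical, and each one mirrors a single sized store.

```c
#include <stdint.h>
#include <string.h>

/* mov BYTE PTR [p], 0    ; writes exactly 1 byte */
void store_byte_zero(uint8_t *p)  { uint8_t  v = 0; memcpy(p, &v, sizeof v); }

/* mov WORD PTR [p], 0    ; writes exactly 2 bytes */
void store_word_zero(uint8_t *p)  { uint16_t v = 0; memcpy(p, &v, sizeof v); }

/* mov DWORD PTR [p], 0   ; writes exactly 4 bytes */
void store_dword_zero(uint8_t *p) { uint32_t v = 0; memcpy(p, &v, sizeof v); }

/* Helper for inspection: count how many of the first n bytes are zero. */
int count_zero_bytes(const uint8_t *buf, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (buf[i] == 0) {
            count++;
        }
    }
    return count;
}
```

Starting from a buffer filled with 0xFF, the three stores leave 1, 2, and 4 zero bytes respectively, even though the address and the stored value are the same each time.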
Also when I try to do
mov eax, [ebx]
I get a compile error, why is this?
Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:
mov eax, DWORD PTR [ebx]
and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.
Why can't I do
mov a, [eax]
Should that not make "a" a pointer to wherever eax is pointing?
No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):
mov register, register ; copy one register to another
mov register, memory ; load value from memory into register
mov memory, register ; store value from register into memory
mov register, immediate ; move immediate value (constant) into register
mov memory, immediate ; store immediate value (constant) in memory
Notice that there is no mov memory, memory, which is what you were trying.
However, you can make "a" point to what eax is pointing to by simply coding:
mov DWORD PTR [a], eax
Now "a" and eax have the same value. If eax was a pointer, then "a" is now a pointer to that same memory location.
If you want to set "a" to the value that eax is pointing to, then you will need to do:
mov eax, DWORD PTR [eax] ; eax = *eax
mov DWORD PTR [a], eax ; a = eax
Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:
mov edx, DWORD PTR [eax] ; edx = *eax
mov DWORD PTR [a], edx ; a = edx
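A single C assignment hides this two-step pattern, so it may help to write both cases out explicitly. The sketch below is only an approximation: because C gives pointers and integers different types, the variable "a" is split into two hypothetical globals, a_as_pointer and a_as_value, and the function names are likewise made up for illustration.

```c
#include <stdint.h>

/* Hypothetical globals standing in for the variable "a" in its two roles. */
const int32_t *a_as_pointer;  /* "a" holding a copy of the pointer in eax */
int32_t        a_as_value;    /* "a" holding the value eax points to      */

/* mov DWORD PTR [a], eax       ; a = eax (copy the pointer itself) */
void copy_pointer_to_a(const int32_t *eax) {
    a_as_pointer = eax;
}

/* mov edx, DWORD PTR [eax]     ; edx = *eax (load)
 * mov DWORD PTR [a], edx       ; a = edx    (store)
 * The pointer in eax is not modified; it is returned to show that. */
const int32_t *copy_pointee_to_a(const int32_t *eax) {
    int32_t edx = *eax;   /* scratch "register" holds the loaded value */
    a_as_value = edx;
    return eax;
}
```

After calling copy_pointee_to_a with the address of a variable holding 7, a_as_value is 7 and the returned pointer is still the original address.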
I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.