x86 Assembly pointers


Question

Hope this isn't a dumb question, but I am trying to wrap my mind around pointers in assembly.

What exactly is the difference between:

mov eax, ebx

and

mov [eax], ebx

and when should dword ptr [eax] be used?

Also when I try to do mov eax, [ebx] I get a compile error, why is this?

Solution

As has already been stated, wrapping brackets around an operand means that the operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) the memory location the operand points to, rather than using the operand's own value directly.

So, this:

mov  eax, ebx

simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.

Whereas this:

mov  eax, [ebx]

dereferences ebx: it treats the contents of ebx as an address and loads the value stored at that address into eax. In a pseudo-C notation, this would be: eax = *ebx.

Finally, this:

mov  [eax], ebx

stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
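
To make the three pseudo-C forms concrete, here is a minimal sketch in C. It assumes a hypothetical toy machine (the Machine struct, MEM_WORDS, and the function names are made up for illustration), where an "address" is just an index into a small array rather than a real x86 address.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Hypothetical toy machine: two "registers" and a small word-addressed
   memory. An address is an index into mem[], not a real x86 address. */
typedef struct {
    uint32_t eax;
    uint32_t ebx;
    uint32_t mem[MEM_WORDS];
} Machine;

/* mov eax, ebx    ->  eax = ebx   (no memory access at all) */
void mov_reg_reg(Machine *m) {
    m->eax = m->ebx;
}

/* mov eax, [ebx]  ->  eax = *ebx  (load from the address held in ebx) */
void mov_load(Machine *m) {
    assert(m->ebx < MEM_WORDS);
    m->eax = m->mem[m->ebx];
}

/* mov [eax], ebx  ->  *eax = ebx  (store to the address held in eax) */
void mov_store(Machine *m) {
    assert(m->eax < MEM_WORDS);
    m->mem[m->eax] = m->ebx;
}
```

With ebx = 3 and mem[3] = 99, mov_reg_reg leaves 3 in eax, while mov_load leaves 99 there. Only the bracketed forms ever touch mem[].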


The registers here could also be replaced with memory operands, such as symbolic variable names. So this:

mov  eax, [myVar]

dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.

By contrast, this:

mov  eax, myVar

stores the address of the variable myVar into eax, like eax = &myVar.

At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around a symbolic memory operand, so both load the value of myVar.

To get the address of a variable in MASM, you would use the OFFSET keyword:

mov  eax, OFFSET myVar

However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
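
The difference between a variable's value and its address can be sketched in plain C as follows. The function names are hypothetical and only there for illustration, and myVar stands in for the variable used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

int32_t myVar = 42;   /* stand-in for the myVar used in the text */

/* Like "mov eax, [myVar]": yields the contents of myVar. */
int32_t load_value(void) {
    return myVar;
}

/* Like "mov eax, OFFSET myVar" in MASM (or "mov eax, myVar" in
   assemblers that do not ignore the brackets): yields the address. */
int32_t *load_address(void) {
    return &myVar;
}

/* Like "mov DWORD PTR [eax], ebx" once eax holds an address:
   writes through the pointer into the variable it points to. */
void store_through(int32_t *address, int32_t value) {
    assert(address != NULL);
    *address = value;
}
```

Calling store_through(load_address(), 7) changes what load_value() returns, which is exactly why mixing up the address and the value is a bug worth avoiding with explicit brackets.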

Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:

mov  eax, DWORD PTR [myVar]    ; eax = myVar

or

mov  DWORD PTR [myVar], eax    ; myVar = eax

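In other assemblers like NASM, which are not strongly typed and don't remember that myVar is a DWORD-sized memory location, an explicit size is necessary whenever no register operand implies one, for example when storing an immediate into memory. (NASM spells it dword [myVar], without the PTR keyword.)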

You don't need this at all when the other operand is a register, since the name of the register indicates the operand size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
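
What the size specifier actually controls is how many bytes the instruction touches at the given address. A rough C sketch of that idea, assuming hypothetical helper names (store_byte and friends are not from any library):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* mov BYTE PTR [dst], al    ->  writes 1 byte */
void store_byte(void *dst, uint8_t al) {
    assert(dst != NULL);
    memcpy(dst, &al, sizeof al);
}

/* mov WORD PTR [dst], ax    ->  writes 2 bytes */
void store_word(void *dst, uint16_t ax) {
    assert(dst != NULL);
    memcpy(dst, &ax, sizeof ax);
}

/* mov DWORD PTR [dst], eax  ->  writes 4 bytes */
void store_dword(void *dst, uint32_t eax) {
    assert(dst != NULL);
    memcpy(dst, &eax, sizeof eax);
}

/* mov QWORD PTR [dst], rax  ->  writes 8 bytes */
void store_qword(void *dst, uint64_t rax) {
    assert(dst != NULL);
    memcpy(dst, &rax, sizeof rax);
}
```

The same destination address gives four different results depending on the size. In the register forms the parameter type plays the role of the register name, which is why the assembler can infer the size there but not for a bare immediate.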


Also when I try to do mov eax, [ebx] I get a compile error, why is this?

Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:

mov  eax, DWORD PTR [ebx]

and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.


Why can't I do mov a, [eax]? Should that not make "a" a pointer to wherever eax is pointing?

No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):

mov  register, register     ; copy one register to another
mov  register, memory       ; load value from memory into register
mov  memory,   register     ; store value from register into memory
mov  register, immediate    ; move immediate value (constant) into register
mov  memory,   immediate    ; store immediate value (constant) in memory

Notice that there is no mov memory, memory, which is what you were trying.
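
The five forms listed above boil down to a small rule, sketched here in C. OperandKind and mov_allowed are hypothetical names for illustration, and like the list itself this ignores segment registers and alternate encodings.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    OPERAND_REGISTER,
    OPERAND_MEMORY,
    OPERAND_IMMEDIATE
} OperandKind;

/* Returns true if "mov dst, src" is one of the five forms listed above. */
bool mov_allowed(OperandKind dst, OperandKind src) {
    if (dst == OPERAND_IMMEDIATE) {
        return false;   /* a constant cannot be a destination */
    }
    if (dst == OPERAND_MEMORY && src == OPERAND_MEMORY) {
        return false;   /* no memory-to-memory mov */
    }
    return true;
}

/* Usage sketch: the load is fine, the memory-to-memory move is not. */
void demo_mov_allowed(void) {
    assert(mov_allowed(OPERAND_REGISTER, OPERAND_MEMORY));
    assert(!mov_allowed(OPERAND_MEMORY, OPERAND_MEMORY));
}
```

mov a, [eax] has a memory destination and a memory source, so it falls into the one rejected combination that does not involve an immediate destination.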

However, you can make the variable a point to whatever eax is pointing to by simply coding:

mov  DWORD PTR [a], eax

Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.

If you want to set a to the value that eax is pointing to, then you will need to do:

mov  eax, DWORD PTR [eax]    ; eax = *eax
mov  DWORD PTR [a], eax      ; a   = eax

Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:

mov  edx, DWORD PTR [eax]    ; edx = *eax
mov  DWORD PTR [a], edx      ; a   = edx
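
The two cases (copying the pointer versus copying the pointed-to value) look like this in C. This is only a sketch: copy_pointer and copy_pointee are hypothetical names, and the local edx simply mirrors the scratch register used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov DWORD PTR [a], eax  ->  a = eax
   Copies the pointer itself, so *a_ptr and eax refer to the same location. */
void copy_pointer(int32_t **a_ptr, int32_t *eax) {
    assert(a_ptr != NULL);
    *a_ptr = eax;
}

/* mov edx, DWORD PTR [eax]
   mov DWORD PTR [a], edx   ->  a = *eax
   There is no memory-to-memory mov, so the value passes through the
   scratch "register" edx; the pointer in eax is left untouched. */
void copy_pointee(int32_t *a, const int32_t *eax) {
    assert(a != NULL && eax != NULL);
    int32_t edx = *eax;   /* load from memory into the scratch register */
    *a = edx;             /* store the scratch register into memory */
}
```

After copy_pointer, later changes to the pointed-to value are visible through a. After copy_pointee, a holds a snapshot and is unaffected by later changes.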


I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.
