When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?

Problem Description

I want to know if using the MOV instruction to copy a string into a register causes the string to be stored in reverse order. I learned that when MASM stores a string into a variable defined as a word or higher (dw and larger sizes) the string is stored in reverse order. Does the same thing happen when I copy a string to a register?

Based on these questions (one about the SCAS instruction and one about assigning strings and characters to variables in MASM 32), I assumed the following:

  1. When MASM loads a string into a variable, it loads it in reverse order, i.e. the last character in the string is stored in the lowest memory address (beginning) of the string variable. This means assigning a variable str like so: str dd "abc" causes MASM to store the string as "cba", meaning "c" is in the lowest memory address.
  2. When defining a variable as str db "abc" MASM treats str as an array of characters. Trying to match the array index with the memory address of str, MASM will store "a" at the lowest memory address of str.
  3. By default, the SCAS and MOVS instructions execute from the beginning (lowest) address of the destination string, i.e. the string stored in the EDI register. They do not "pop" or apply the "last in, first out" rule to the memory addresses they operate on before executing.
  4. MASM always treats character arrays and strings moved into memory registers the same way. Moving the character array 'a', 'b', 'c' to EAX is the same as moving "abc" to EAX.

When I transfer a byte array arLetters with the characters 'a', 'b', and 'c' to the double-word variable strLetters using MOVSD, I believe the letters are copied to strLetters in reverse, i.e. stored as "cba". When I use mov eax, "abc", are the letters also stored in reverse order?

The code below will set the zero flag before it exits.

.data?
strLetters dd ?,0

.data
arLetters db "abcd"

.code

start:
mov ecx, 4
lea esi, arLetters
lea edi, strLetters
movsd
;This stores the string "dcba" into strLetters.

mov ecx, 4
lea edi, strLetters
mov eax, "dcba" 
repnz scasd
jz close
jmp printer
;strLetters is not popped as "abcd" and is compared as "dcba".

printer:
print "No match.",13,10,0
jmp close

close:
push 0
call ExitProcess

end start

I expect the string "dcba" to be stored in EAX "as is", with 'd' in the lowest memory address of EAX, since MASM treats moving strings to registers differently from assigning strings to variables. MASM copied 'a', 'b', 'c', 'd' into strLetters as "dcba" to ensure that if strLetters was popped, the string is emitted/released in the correct order ("abcd"). If the REP MOVSB instruction were used in place of MOVSD, strLetters would have contained "abcd" and would be popped/emitted as "dcba". However, because MOVSD was used and SCAS or MOVS instructions do not pop strings before executing, the code above should set the zero flag, right?

Recommended Answer

Don't use strings in contexts where MASM expects a 16-bit or larger integer. MASM will convert them to integers in a way that reverses the order of characters when stored in memory. Since this is confusing, it's best to avoid it and only use strings with the DB directive, which works as expected. Don't use strings with more than one character as immediate values.

Registers don't have addresses, and it's meaningless to talk about the order of bytes within a register. On a 32-bit x86 CPU, the general purpose registers like EAX hold 32-bit integer values. You can divide a 32-bit value conceptually into 4 bytes, but while it lives in a register there is no meaningful order to the bytes.

It's only when a 32-bit value exists in memory that the 4 bytes that make it up have addresses, and so have an order. Since x86 CPUs use little-endian byte order, the least significant of the 4 bytes is the first byte (the one at the lowest address) and the most significant byte is the last. Whenever the x86 loads or stores a 16-bit or wider value to or from memory, it uses little-endian byte order. (An exception is the MOVBE instruction, which specifically uses big-endian byte order when loading and storing values.)

    .386
    .MODEL flat

    .DATA
db_str  DB  "abcd"
dd_str  DD  "abcd"
num DD  1684234849

    .CODE
_start: 
    mov eax, "abcd"
    mov ebx, DWORD PTR [db_str]
    mov ecx, DWORD PTR [dd_str]
    mov edx, 1684234849
    mov esi, [num]
    int 3

    END _start

After assembling and linking, it gets converted into a sequence of bytes something like this:

.text section:
  00401000: B8 64 63 62 61 8B 1D 00 30 40 00 8B 0D 04 30 40  .dcba...0@....0@
  00401010: 00 BA 61 62 63 64 8B 35 08 30 40 00 CC           ..abcd.5.0@..
  ...
.data section:
  00403000: 61 62 63 64 64 63 62 61 61 62 63 64              abcddcbaabcd

(On Windows the .data section normally gets placed after the .text section in memory.)

So we can see that the DB and DD directives, the ones labelled db_str and dd_str, generate two different sequences of bytes for the same string "abcd". In the first case, MASM generates the sequence of bytes we would expect: 61h, 62h, 63h, and 64h, the ASCII values for a, b, c, and d respectively. For dd_str, though, the sequence of bytes is reversed. This is because the DD directive uses 32-bit integers as operands, so the string has to be converted to a 32-bit value, and MASM ends up reversing the order of characters in the string when the result of the conversion is stored in memory.
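To make the two layouts concrete, here is a small Python sketch that reproduces the bytes in the listing above. The helper masm_string_to_int is hypothetical. It is only a model that matches the observed output (first character ends up in the most significant byte), not MASM's documented algorithm.

```python
def masm_string_to_int(s: str) -> int:
    """Hypothetical model: fold the characters into an integer with the
    FIRST character in the MOST significant byte (matches the listing)."""
    value = 0
    for ch in s:
        value = (value << 8) | ord(ch)
    return value


def dd_bytes(value: int) -> bytes:
    """Emit a 32-bit DD value the x86 way: least significant byte first."""
    return value.to_bytes(4, "little")


db_str = "abcd".encode("ascii")                # DB: one byte per character, in source order
dd_str = dd_bytes(masm_string_to_int("abcd"))  # DD: convert to an integer, then store little-endian

print(db_str.hex(" "))  # 61 62 63 64
print(dd_str.hex(" "))  # 64 63 62 61
```

Nothing in the model reverses characters explicitly. The "reversal" falls out of combining a most-significant-first conversion with a least-significant-first store.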

You'll also notice that the DD directive labelled num generated the same sequence of bytes as the DB directive. Indeed, without looking at the source there's no way to tell that the first four bytes are supposed to be a string while the last four bytes are supposed to be a number. They only become strings or numbers if the program uses them that way.

(Less obvious is how the decimal value 1684234849 was converted into the same sequence of bytes as generated by the DB directive. It's already a 32-bit value; it just needs to be converted into a sequence of bytes by MASM. Unsurprisingly, the assembler does so using the same little-endian byte order that the CPU uses. That means the first byte is the least significant part of 1684234849, which happens to have the same value as the ASCII letter a (1684234849 % 256 = 97 = 61h). The last byte is the most significant part of the number, which happens to be the ASCII value of d (1684234849 / 256 / 256 / 256 = 100 = 64h).)
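The same arithmetic can be checked in a few lines of Python. Python is used here purely as a calculator, and nothing in the snippet is MASM-specific:

```python
n = 1684234849

# Little-endian layout: byte i (counting from the lowest address) is bits 8*i .. 8*i+7.
layout = [(n >> (8 * i)) & 0xFF for i in range(4)]
print([hex(b) for b in layout])  # ['0x61', '0x62', '0x63', '0x64']

print(n % 256)                  # 97, ASCII 'a', first byte in memory
print(n // 256 // 256 // 256)   # 100, ASCII 'd', last byte in memory
print(n.to_bytes(4, "little"))  # b'abcd'
print(hex(n))                   # 0x64636261
```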

Looking at the values in the .text section more closely with a disassembler, we can see how the sequence of bytes stored there will be interpreted as instructions when executed by the CPU:

  00401000: B8 64 63 62 61     mov         eax,61626364h
  00401005: 8B 1D 00 30 40 00  mov         ebx,dword ptr ds:[00403000h]
  0040100B: 8B 0D 04 30 40 00  mov         ecx,dword ptr ds:[00403004h]
  00401011: BA 61 62 63 64     mov         edx,64636261h
  00401016: 8B 35 08 30 40 00  mov         esi,dword ptr ds:[00403008h]
  0040101C: CC                 int         3

What we can see here is that MASM stored the bytes that make up the immediate value in the instruction mov eax, "abcd" in the same order it did with the dd_str DD directive. The first byte of the immediate part of the instruction in memory is 64h, the ASCII value of d. This happens because, with a 32-bit destination register, this MOV instruction uses a 32-bit immediate. That means MASM needs to convert the string to a 32-bit integer, and it ends up reversing the order of bytes as it did with dd_str. MASM also handles the decimal number given as the immediate of mov edx, 1684234849 the same way it did with the DD directive that used the same number: the 32-bit value was converted to the same little-endian representation.

You'll also notice that the disassembler generated assembly instructions that use hexadecimal values for the immediates of these two instructions. Like the CPU, the disassembler has no way of knowing that the immediate values are supposed to be a string and a decimal number. They're just a sequence of bytes in the program. All it knows is that they're 32-bit immediate values (from the opcodes B8h and BAh), so it displays them as 32-bit hexadecimal values for lack of any better alternative.
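A sketch of how those two instruction encodings are built, assuming the standard x86 mov r32, imm32 form: opcode B8h plus the register number, followed by a 4-byte little-endian immediate. The function encode_mov_r32_imm32 and the REG table are hypothetical helpers written for this illustration only.

```python
# x86 register numbers used in the low 3 bits of the B8h+r opcode (mov r32, imm32).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Hypothetical helper: encode mov reg, imm32 as opcode B8h+r plus a little-endian imm32."""
    return bytes([0xB8 + REG[reg]]) + imm.to_bytes(4, "little")


ABCD = 0x61626364  # "abcd" after MASM's conversion: 'a' in the most significant byte

print(encode_mov_r32_imm32("eax", ABCD).hex(" "))        # b8 64 63 62 61
print(encode_mov_r32_imm32("edx", 1684234849).hex(" "))  # ba 61 62 63 64
```

Both outputs match the first and fourth lines of the disassembly. The only thing that differs between them is which integer MASM produced before the little-endian encoding step.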

By executing the program under a debugger and inspecting the registers after it reaches the breakpoint instruction (int 3) we can see what actually ended up in the registers:

eax=61626364 ebx=64636261 ecx=61626364 edx=64636261 esi=64636261 edi=00000000
eip=0040101c esp=0018ff8c ebp=0018ff94 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
image00000000_00400000+0x101c:
0040101c cc              int     3

Now we can see that the first and third instructions loaded a different value than the other instructions. Both of these instructions involve cases where MASM converted the string to a 32-bit value and ended up reversing the order of the characters in memory. The register dump confirms that the reversed order of bytes in memory results in different values being loaded into the registers.
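The register values can be reproduced by treating every load as "read 4 bytes and interpret them little-endian". This Python sketch hard-codes the bytes from the .data dump and the two immediates from the .text dump above. load_dword is a hypothetical helper, not part of any assembler or debugger.

```python
import struct

# .data section bytes from the dump above, starting at 00403000h.
data = bytes.fromhex("61 62 63 64 64 63 62 61 61 62 63 64")
DATA_BASE = 0x00403000


def load_dword(addr: int) -> int:
    """Hypothetical helper: read 4 bytes at addr as a little-endian uint32, like mov r32, [addr]."""
    (value,) = struct.unpack_from("<I", data, addr - DATA_BASE)
    return value


# Immediates taken from the .text bytes (B8 64 63 62 61, BA 61 62 63 64), also read little-endian.
eax = int.from_bytes(bytes.fromhex("64 63 62 61"), "little")
edx = int.from_bytes(bytes.fromhex("61 62 63 64"), "little")
ebx = load_dword(0x00403000)  # db_str
ecx = load_dword(0x00403004)  # dd_str
esi = load_dword(0x00403008)  # num

print(f"eax={eax:08x} ebx={ebx:08x} ecx={ecx:08x} edx={edx:08x} esi={esi:08x}")
# eax=61626364 ebx=64636261 ecx=61626364 edx=64636261 esi=64636261
```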

Now you might be looking at the register dump above and thinking that only EAX and ECX are in the correct order, with the ASCII value for a, 61h, first and the ASCII value for d, 64h, last. It might seem that MASM reversing the order of the strings in memory actually caused them to be loaded into registers in the correct order. But as I said before, there's no byte order in registers. The number 61626364 is just how the debugger represents the value when displaying it as a sequence of characters you can read. The characters 61 come first in the debugger's representation because our numbering system puts the most significant part of the number on the left, and we read left to right, so that makes it the first part. However, as I also said before, x86 CPUs are little-endian, which means the least significant part comes first in memory. So the first byte in memory becomes the least significant part of the value in the register. The debugger displays that byte as the rightmost two hexadecimal digits of the number, because that's where the least significant part of a number goes in our numbering system.

In other words, x86 CPUs are little-endian (least significant first) but our numbering system is big-endian (most significant first). As a result, hexadecimal numbers get displayed in byte-wise reverse order relative to how they're actually stored in memory.
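A short Python illustration of the difference between how a value is printed and how it is laid out in x86 memory. The value used is the EAX contents from the register dump.

```python
value = 0x61626364  # EAX as shown in the register dump

# Printed the usual way: most significant hex digits on the left.
print(f"{value:08x}")  # 61626364

# Laid out in x86 memory: least significant byte at the lowest address.
print(value.to_bytes(4, "little").hex(" "))  # 64 63 62 61
```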

It should also hopefully be clear by now that loading a string into a register is only something that happens conceptually. The assembler converts the string into a sequence of bytes, which, when loaded into a 32-bit register, is treated as a little-endian 32-bit integer. When the 32-bit value in the register is stored in memory, it is converted back into a sequence of bytes that represents the value in little-endian format. To the CPU, your string is just a 32-bit integer that it loads from and stores to memory.

So if the value loaded into EAX in the sample program is stored to memory with something like mov [mem], eax, then the 4 bytes stored at mem will be in the same order as they appeared in the bytes that made up the immediate of mov eax, "abcd". That is, they'll be in the same reversed order (64h, 63h, 62h, 61h) that MASM put them in when it built the immediate.
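That round trip can be sketched in Python by modelling memory as a bytearray. The sketch assumes, as above, that the CPU reads and writes 32-bit values little-endian. The mem buffer stands in for the hypothetical mem operand.

```python
# Immediate bytes of mov eax, "abcd" as they appear after the B8h opcode.
imm_bytes = bytes([0x64, 0x63, 0x62, 0x61])

# mov eax, "abcd": the CPU reads the 4 immediate bytes as a little-endian value.
eax = int.from_bytes(imm_bytes, "little")

# mov [mem], eax: the CPU writes EAX back out little-endian.
mem = bytearray(4)
mem[0:4] = eax.to_bytes(4, "little")

print(hex(eax))    # 0x61626364
print(bytes(mem))  # b'dcba' (same order as the immediate bytes)
```

The two little-endian conversions cancel out, so the bytes in memory are exactly the bytes MASM emitted for the immediate.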

Now, as to why MASM reverses the order of strings when converting them to 32-bit integers, I don't know. But the moral here is not to use strings as immediates or in any other context where they need to be converted to integers. Assemblers are inconsistent in how they convert string literals into integers. (A similar problem occurs in how C compilers convert character literals like 'abcd' into integers.)

Nothing special happens with the SCASD or MOVSD instructions. SCASD treats the four bytes pointed to by EDI as a 32-bit little-endian value, loads it into an unnamed temporary register, compares the temporary register to EAX, and then adds or subtracts 4 from EDI depending on the DF flag. MOVSD loads a 32-bit value from the memory pointed to by ESI into an unnamed temporary register, stores the temporary register to the 32-bit memory location pointed to by EDI, and then updates ESI and EDI according to the DF flag. (Byte order doesn't matter for MOVSD, as the bytes are never used as a 32-bit value, but the order isn't changed.)
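Here is a rough Python model of those two instructions, run against the question's program. It is a simplification, not an emulator. The Cpu class, its field names, and the memory layout (arLetters at offset 0, strLetters at offset 4) are assumptions made for illustration, and SCASD only updates ZF here, not the other flags. The string immediate "dcba" is modelled the way the listing above shows MASM encoding it, with the first character in the most significant byte.

```python
class Cpu:
    """Toy model of the state MOVSD/SCASD touch. Hypothetical, not a real emulator."""

    def __init__(self, memory: bytearray):
        self.mem = memory
        self.esi = 0
        self.edi = 0
        self.eax = 0
        self.df = False  # direction flag: False means addresses increase
        self.zf = False  # zero flag (the only flag modelled here)

    def _step(self) -> int:
        return -4 if self.df else 4

    def movsd(self) -> None:
        # Copy the 4 bytes at [ESI] to [EDI] unchanged, then advance both pointers.
        self.mem[self.edi:self.edi + 4] = self.mem[self.esi:self.esi + 4]
        self.esi += self._step()
        self.edi += self._step()

    def scasd(self) -> None:
        # Read [EDI] as a little-endian dword, compare it with EAX, advance EDI.
        tmp = int.from_bytes(self.mem[self.edi:self.edi + 4], "little")
        self.zf = ((self.eax - tmp) & 0xFFFFFFFF) == 0
        self.edi += self._step()


# arLetters db "abcd" at offset 0; strLetters dd ?,0 (8 bytes) at offset 4.
mem = bytearray(b"abcd" + bytes(8))
cpu = Cpu(mem)

# lea esi, arLetters / lea edi, strLetters / movsd
cpu.esi, cpu.edi = 0, 4
cpu.movsd()
print(bytes(mem[4:8]))  # b'abcd' (copied in the same order, not reversed)

# mov eax, "dcba": modelled with 'd' in the most significant byte, as in the listing.
cpu.eax = int.from_bytes(b"dcba", "big")
cpu.edi = 4
ecx = 4
while ecx:  # repnz scasd: repeat while ECX != 0 and ZF == 0
    ecx -= 1
    cpu.scasd()
    if cpu.zf:
        break
print(cpu.zf)  # True
```

In this model, MOVSD leaves strLetters holding "abcd" in memory order. The comparison still succeeds because "dcba" as an immediate becomes 64636261h, which is exactly the little-endian reading of the bytes "abcd". So the zero flag gets set even though nothing was stored reversed.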

I wouldn't try to think of SCASD or MOVSD as FIFO or LIFO, because ultimately that depends on how you use them. MOVSD can just as easily be used as part of an implementation of a FIFO queue as of a LIFO stack. (Compare this to PUSH and POP, which in theory could each independently be used as part of an implementation of either a FIFO or a LIFO data structure, but together can only be used to implement a LIFO stack.)
