How does the CPU decode variable-length instructions correctly?

Question

On most architectures, instructions are all fixed-length. This makes program loading and executing straightforward. On x86/x64, instructions are variable length, so a disassembled program might look like this:

File Type: EXECUTABLE IMAGE

  00401000: 8B 04 24           mov         eax,dword ptr [esp]
  00401003: 83 C4 04           add         esp,4
  00401006: FF 64 24 FC        jmp         dword ptr [esp-4]
  0040100A: 55                 push        ebp
  0040100B: E8 F0 FF FF FF     call        00401000
  00401010: 50                 push        eax
  00401011: 68 00 30 40 00     push        403000h
  00401016: E8 0D 00 00 00     call        00401028
  0040101B: 83 C4 08           add         esp,8
  0040101E: 33 C0              xor         eax,eax
  00401020: 5D                 pop         ebp
  00401021: 83 C4 04           add         esp,4
  00401024: FF 64 24 FC        jmp         dword ptr [esp-4]
  00401028: FF 25 00 20 40 00  jmp         dword ptr ds:[00402000h]

  Summary

        1000 .data
        1000 .rdata
        1000 .reloc
        1000 .text

It seems rather difficult to imagine how the CPU "knows" where one instruction ends and the next one begins. For example, if I insert the byte 0x90 (NOP) into the middle of the XOR EAX,EAX encoding (between 33 and C0), the program then disassembles as:

File Type: EXECUTABLE IMAGE

  00401000: 8B 04 24           mov         eax,dword ptr [esp]
  00401003: 83 C4 04           add         esp,4
  00401006: FF 64 24 FC        jmp         dword ptr [esp-4]
  0040100A: 55                 push        ebp
  0040100B: E8 F0 FF FF FF     call        00401000
  00401010: 50                 push        eax
  00401011: 68 00 30 40 00     push        403000h
  00401016: E8 0D 00 00 00     call        00401028
  0040101B: 83 C4 08           add         esp,8
  0040101E: 33 90 C0 5D 83 C4  xor         edx,dword ptr [eax+C4835DC0h]
  00401024: 04 FF              add         al,0FFh
  00401026: 64 24 FC           and         al,0FCh
  00401029: FF
  0040102A: 25
  0040102B: 00 20              add         byte ptr [eax],ah
  0040102D: 40                 inc         eax

  Summary

    1000 .data
    1000 .rdata
    1000 .reloc
    1000 .text

Which, predictably, crashes when run.

I'm curious exactly what the instruction decoder sees with that extra byte that makes it think the line at 0040101E is 6 bytes long, and that the line originally at 00401028 is four separate instructions.

Recommended answer

When fetching an instruction, the CPU first analyses its first byte (the opcode). Sometimes that byte alone is enough to determine the total length of the instruction. Sometimes it tells the CPU to analyse subsequent bytes to determine the length. But all in all, the encoding is not ambiguous: starting from a given byte, there is only one way to decode it.
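To make the "first byte decides, or tells you to keep looking" idea concrete, here is a minimal sketch. It assumes 32-bit mode, no prefixes, and covers only the opcodes that appear in the listing above; the table and the function name `first_byte_verdict` are made up for illustration and are nothing like a complete decoder.

```python
# Hypothetical, tiny subset of 32-bit x86 opcodes taken from the listing above.
# Value = total instruction length when the opcode alone decides it,
# or None when the decoder must also inspect the following ModR/M byte.
OPCODE_LENGTH = {
    0x50: 1,     # push eax
    0x55: 1,     # push ebp
    0x5D: 1,     # pop ebp
    0x68: 5,     # push imm32 (opcode + 4-byte immediate)
    0xE8: 5,     # call rel32 (opcode + 4-byte displacement)
    0x33: None,  # xor r32, r/m32
    0x8B: None,  # mov r32, r/m32
    0x83: None,  # arithmetic group, r/m32 with an 8-bit immediate
    0xFF: None,  # group containing jmp/call/push r/m32
}


def first_byte_verdict(opcode: int) -> str:
    """Say what the first byte alone tells us about the instruction length."""
    length = OPCODE_LENGTH[opcode]
    if length is None:
        return "need to look at the ModR/M byte"
    return f"{length} byte(s) total"


if __name__ == "__main__":
    for op in (0x55, 0xE8, 0x33):
        print(f"{op:02X}: {first_byte_verdict(op)}")
```

So E8 is always five bytes, while 33 on its own does not settle the length.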

Yes, the command stream gets screwed up if you insert random bytes in the middle willy-nilly. That's to be expected; not every byte sequence constitutes valid machine code.

Now, about your particular example. The original command was XOR EAX, EAX (33 C0). The encoding of XOR is one of those whose length depends on the second byte. The first byte - 33 - means XOR. The second byte is the ModR/M byte. It encodes the operands - whether it's a register pair, a register and a memory location, etc. The initial value C0 in 32-bit mode corresponds to operands EAX, EAX. The value 90 that you've inserted corresponds to operands EDX, [EAX+offset], and it means that the ModR/M byte is followed by 32 bits of offset. The next four bytes of the command stream are not interpreted as commands anymore - they're the offset in the mangled XOR command.
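The difference between C0 and 90 is easier to see once the ModR/M byte is split into its three bit fields (mod: top 2 bits, reg: middle 3, r/m: low 3). The following is a small sketch for 32-bit addressing that handles only the simple cases needed here; the helper names are invented for illustration, and SIB and mod=00 forms are deliberately left out.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def describe_modrm(byte: int) -> str:
    """Operands for an opcode of the form 'op r32, r/m32' such as 33 (xor)."""
    mod, reg, rm = split_modrm(byte)
    if mod == 3:                      # register-to-register, nothing follows
        return f"{REG32[reg]}, {REG32[rm]}"
    if rm == 4 or mod == 0:           # SIB byte / mod=00 forms: not covered
        raise NotImplementedError("case left out of this sketch")
    disp = "disp8" if mod == 1 else "disp32"   # mod=01 -> 1 byte, mod=10 -> 4
    return f"{REG32[reg]}, [{REG32[rm]}+{disp}]"


if __name__ == "__main__":
    for b in (0xC0, 0x90):
        print(f"{b:02X} = {b:08b} -> {describe_modrm(b)}")
```

C0 is 11 000 000 (register form, EAX and EAX), while 90 is 10 010 000 (memory form, EDX with [EAX+disp32]), which is why four more bytes get swallowed as the displacement.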

So by messing with the second byte, you've turned a 2-byte command into a 6-byte one.

Then the CPU (and the disassembler) resumes fetching after those four bytes. It's in the middle of the ADD ESP, 4 instruction, but the CPU has no way of knowing that. It starts with the 04 byte, the third one in the ADD encoding. The first few bytes at that point still make sense as commands, but since you've ended up in the middle of an instruction, the original instruction sequence is utterly lost.
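The loss of synchronisation can be reproduced with a toy length decoder that walks the bytes starting at 0040101E, before and after the 90 is inserted. This is only a sketch under stated assumptions: 32-bit mode, and opcode tables (all hypothetical names) that cover just the bytes which actually occur in these two short streams. It computes lengths only and does not execute or fully disassemble anything.

```python
PREFIXES = {0x64}                          # fs segment-override prefix
NO_MODRM = {0x04: 2, 0x24: 2, 0x5D: 1}     # opcode -> total length
WITH_MODRM = {0x33: 0, 0x83: 1, 0xFF: 0}   # opcode -> immediate bytes after ModR/M


def modrm_extra(code: bytes, pos: int) -> int:
    """Number of SIB and displacement bytes after the ModR/M byte at code[pos]."""
    mod, rm = code[pos] >> 6, code[pos] & 7
    if mod == 3:                           # register operand: nothing follows
        return 0
    extra = 0
    if rm == 4:                            # a SIB byte follows
        extra += 1
        if mod == 0 and (code[pos + 1] & 7) == 5:
            extra += 4                     # SIB with no base register: disp32
    if mod == 0 and rm == 5:
        extra += 4                         # absolute disp32
    elif mod == 1:
        extra += 1                         # disp8
    elif mod == 2:
        extra += 4                         # disp32
    return extra


def insn_length(code: bytes, pos: int) -> int:
    """Length in bytes of the instruction that starts at code[pos]."""
    start = pos
    while code[pos] in PREFIXES:
        pos += 1
    op = code[pos]
    if op in NO_MODRM:
        return (pos - start) + NO_MODRM[op]
    return (pos - start) + 2 + modrm_extra(code, pos + 1) + WITH_MODRM[op]


def sweep(code: bytes) -> list[int]:
    """Decode linearly from offset 0 and return each instruction's length."""
    pos, lengths = 0, []
    while pos < len(code):
        lengths.append(insn_length(code, pos))
        pos += lengths[-1]
    return lengths


original = bytes.fromhex("33C05D83C404FF6424FC")    # xor / pop / add / jmp
patched = bytes.fromhex("3390C05D83C404FF6424FC")   # same bytes with 90 inserted

if __name__ == "__main__":
    print(sweep(original))
    print(sweep(patched))
```

For the original bytes the sweep gives lengths 2, 1, 3, 4 (xor, pop, add, jmp); for the patched bytes it gives 6, 2, 3, matching the first three lines of the broken listing in the question. The same rules are applied in both cases; only the starting points of the instructions have shifted.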
