Assembly instruction length of a 32-bit program

Views: 105 Published: 2021/4/30 20:25:47 Tags: assembly x86 disassembly machine-code


Question


I built a simple program and used the file command to check that it is in 32-bit format. I then used objdump to disassemble it and found that some assembly instructions are longer than 4 bytes.

I expected that, because the program is 32-bit, no instruction would be longer than 4 bytes. Obviously, I am wrong. Could you please tell me why it has 6-byte or 7-byte instructions? Thanks.

$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=09aa196a671a6e169f09984360133ad9488f7e53, not stripped

$ objdump -d a.out
a.out:     file format elf32-i386

Disassembly of section .init:

 080482a8 <_init>:
 80482a8:       53                      push   %ebx
 80482a9:       83 ec 08                sub    $0x8,%esp
 80482ac:       e8 8f 00 00 00          call   8048340 <__x86.get_pc_thunk.bx>
 80482b1:       81 c3 4f 1d 00 00       add    $0x1d4f,%ebx
 80482b7:       8b 83 fc ff ff ff       mov    -0x4(%ebx),%eax
 80482bd:       85 c0                   test   %eax,%eax
 80482bf:       74 05                   je     80482c6 <_init+0x1e>
 80482c1:       e8 3a 00 00 00          call   8048300 <__libc_start_main@plt+0x10>
 80482c6:       83 c4 08                add    $0x8,%esp
 80482c9:       5b                      pop    %ebx
 80482ca:       c3                      ret

Solution

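Why? "32-bit" describes the width of registers and addresses, not the length of instructions. One obvious reason for longer instructions is so that a single instruction can include a 32-bit immediate, like mov $address, %register. Likewise, a call rel32 can reach any 32-bit address from the current address.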
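As a rough illustration of the call rel32 point, the sketch below recomputes the targets of the two call instructions in the objdump listing from the question. The helper name call_rel32_target is made up for this example; the addresses and bytes are copied from the listing.

```python
import struct

def call_rel32_target(insn_addr: int, encoding: bytes) -> int:
    """Return the target of a 5-byte `call rel32` (opcode 0xE8).

    Hypothetical helper for illustration only.
    """
    assert len(encoding) == 5 and encoding[0] == 0xE8
    # rel32 is a signed, little-endian 32-bit displacement.
    (rel32,) = struct.unpack("<i", encoding[1:])
    # The displacement is relative to the address of the *next* instruction,
    # and the result wraps around at 32 bits.
    return (insn_addr + len(encoding) + rel32) & 0xFFFFFFFF

# 80482ac: e8 8f 00 00 00   call 8048340 <__x86.get_pc_thunk.bx>
first = call_rel32_target(0x80482AC, bytes.fromhex("e88f000000"))
# 80482c1: e8 3a 00 00 00   call 8048300 <__libc_start_main@plt+0x10>
second = call_rel32_target(0x80482C1, bytes.fromhex("e83a000000"))

print(hex(first), hex(second))  # 0x8048340 0x8048300
```

One opcode byte plus a full 4-byte displacement is what makes this instruction 5 bytes long.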

These instructions need room for an opcode (1 byte) and sometimes a ModR/M byte to specify which register(s) / memory are operands.
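To make the opcode + ModR/M layout concrete, here is a small sketch that splits the ModR/M bytes of the two 6-byte instructions in the question's listing into their bit fields. The function name split_modrm is an assumption for this example; the field layout (2 bits mod, 3 bits reg, 3 bits r/m) is the standard x86 one.

```python
def split_modrm(modrm: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111

# 80482b1: 81 c3 4f 1d 00 00   add $0x1d4f,%ebx
# opcode 0x81, ModR/M 0xc3, then a 4-byte immediate.
add_fields = split_modrm(0xC3)
# mod=3 -> register operand, reg=0 -> the /0 (ADD) opcode extension, rm=3 -> %ebx

# 80482b7: 8b 83 fc ff ff ff   mov -0x4(%ebx),%eax
# opcode 0x8b, ModR/M 0x83, then a 4-byte displacement.
mov_fields = split_modrm(0x83)
# mod=2 -> memory operand with disp32, reg=0 -> %eax, rm=3 -> base register %ebx

print(add_fields, mov_fields)  # (3, 0, 3) (2, 0, 3)
```

In both cases the length is 1 (opcode) + 1 (ModR/M) + 4 (immediate or displacement) = 6 bytes.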

If instructions were limited to 4 bytes, it would take multiple instructions to put a static address into a register, and you couldn't use such an address directly as a memory operand (a memory-direct addressing mode). RISC ISAs typically need 2 instructions to construct an arbitrary 32-bit constant (including an address) in a register, like MIPS lui $t0, high_half / ori $t0, $t0, low_half
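The two-instruction RISC approach can be sketched as follows. This only models the arithmetic of lui and ori on a 32-bit register, it is not a MIPS emulator; the function names and the sample address are chosen for illustration.

```python
def split_halves(value: int) -> tuple:
    """Split a 32-bit constant into the 16-bit halves lui/ori would carry."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def simulate_lui_ori(high_half: int, low_half: int) -> int:
    """Model `lui $t0, high_half` followed by `ori $t0, $t0, low_half`."""
    # lui: place the 16-bit immediate in the upper half, zero the lower half.
    reg = (high_half & 0xFFFF) << 16
    # ori: OR in the zero-extended 16-bit immediate.
    reg |= low_half & 0xFFFF
    return reg

address = 0x080482A8  # address of _init in the question's listing
high, low = split_halves(address)
rebuilt = simulate_lui_ori(high, low)

print(hex(high), hex(low), hex(rebuilt))  # 0x804 0x82a8 0x80482a8
```

Each fixed-width 4-byte instruction only has room for 16 bits of immediate, so a full 32-bit value needs two of them.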

x86 is variable-length CISC; common instructions are short, but longer instructions are possible instead of forcing you to construct an address or constant in a register with a separate instruction.

e.g. you can do movl $123456, some_static_variable and get an instruction encoding with these components:

mov_opcode (1B)   ModR/M (1B)    disp32 absolute address (4B)    imm32=123456 (4B)

for a total of 10 bytes, including two 4-byte values. (In Intel's instruction-set reference manual (vol.2 of the x86 SDM), this is the mov r/m32, imm32 form of MOV, with a [disp32] addressing mode.)
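The 10-byte layout can be reproduced by hand. The sketch below assembles the bytes of movl $123456, some_static_variable using the C7 /0 encoding of mov r/m32, imm32; the address 0x0804A020 is a placeholder assumed for this example, since the real one would be assigned by the linker.

```python
import struct

def encode_mov_imm32_to_abs(addr: int, imm: int) -> bytes:
    """Encode `movl $imm, addr` (mov r/m32, imm32 with a [disp32] operand)."""
    opcode = b"\xC7"  # MOV r/m32, imm32 (C7 /0)
    modrm = b"\x05"   # mod=00, reg=000 (/0), rm=101 -> [disp32], no base register
    # Both 32-bit fields are stored little-endian.
    return opcode + modrm + struct.pack("<I", addr) + struct.pack("<I", imm)

SOME_STATIC_VARIABLE = 0x0804A020  # hypothetical address, for illustration only
encoding = encode_mov_imm32_to_abs(SOME_STATIC_VARIABLE, 123456)

print(encoding.hex(" "), len(encoding))
# c7 05 20 a0 04 08 40 e2 01 00 10
```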

You could make it longer with prefixes, for example an fs: segment override prefix for thread-local storage. And/or the addressing mode could include a scaled-index register, like movl $123456, array(,%ecx,4), so a SIB (scale/index/base) byte would be needed after the ModR/M byte to encode the addressing mode.

Instead of mov, we could have used add, and then we could also have used a lock prefix to make it an atomic read-modify-write.
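A sketch of how a SIB byte and a prefix lengthen the encoding is given below. It hand-assembles the scaled-index form with an optional one-byte prefix (0x64 for the fs: override, 0xF0 for lock). The helper name and the array address 0x0804A040 are assumptions for this example.

```python
import struct

def encode_imm32_to_scaled_index(opcode: int, disp: int, imm: int,
                                 prefixes: bytes = b"") -> bytes:
    """Encode `<op>l $imm, disp(,%ecx,4)` for an opcode using the /0 extension."""
    modrm = 0x04  # mod=00, reg=000 (/0), rm=100 -> a SIB byte follows
    # SIB: scale=10 (x4), index=001 (%ecx), base=101 (no base, disp32 follows)
    sib = (0b10 << 6) | (0b001 << 3) | 0b101
    return prefixes + bytes([opcode, modrm, sib]) + struct.pack("<II", disp, imm)

ARRAY = 0x0804A040  # hypothetical address, for illustration only

# movl $123456, array(,%ecx,4)
mov_sib = encode_imm32_to_scaled_index(0xC7, ARRAY, 123456)
# movl $123456, %fs:array(,%ecx,4)
mov_fs = encode_imm32_to_scaled_index(0xC7, ARRAY, 123456, prefixes=b"\x64")
# lock addl $123456, array(,%ecx,4)   (ADD r/m32, imm32 is 81 /0)
lock_add = encode_imm32_to_scaled_index(0x81, ARRAY, 123456, prefixes=b"\xF0")

print(len(mov_sib), len(mov_fs), len(lock_add))  # 11 12 12
```

The SIB byte adds one byte to the 10-byte form, and each prefix adds one more.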

The hard limit on instruction length is 15 bytes. If decoding doesn't find the end of an instruction by then, the CPU raises an exception instead of executing it. Intel's manuals document this as a general-protection exception (#GP), which a Linux kernel delivers to the offending process as SIGSEGV.

(Fun fact: the original 8086 had no limit, and would happily keep looping trying to decode a whole 64k segment full of rep prefixes.)
