ARM/Thumb interworking confusion regarding Thumb-2

Question

I've been going through ARM ISA-related documentation for a while, and so far I believe that I've got a good understanding of the basics of ARM/Thumb interworking. I'll quickly summarize it in the following:

  • Instructions can be either 4-byte aligned (ARM) or 2-byte aligned (Thumb).
  • Thumb and ARM instructions reside in separate regions, i.e., they are not intermixed without an explicit processor state change.
  • A state change can happen upon executing any of bx, blx, ldm, or ldr. Whether the target is ARM or Thumb depends on the least significant bit of the address: 0 selects ARM and 1 selects Thumb.
  • The current state of the processor can be either ARM or Thumb. That depends on bit 5 (the T bit) of the CPSR.
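
The state-change rules in the list above can be condensed into a few lines. This is a minimal Python sketch, not a reference model: it assumes ARMv7-A/R behaviour for interworking branches (BX/BLX with a register, and LDR/LDM/POP loading the PC), where bit 0 of the target picks the instruction set and an ARM target with bit 1 set is UNPREDICTABLE. It also assumes the CPSR J bit is 0 (no Jazelle/ThumbEE). The function names `bx_target` and `cpsr_state` are hypothetical.

```python
def bx_target(addr: int) -> tuple[str, int]:
    """Model an interworking branch (BX/BLX <Rm>, or LDR/LDM/POP into the PC).

    Bit 0 of the target selects the instruction set: 1 -> Thumb, 0 -> ARM.
    BLX <label> (immediate form) always switches state and is not modelled here.
    Returns the new state and the address the CPU actually fetches from.
    """
    addr &= 0xFFFFFFFF
    if addr & 1:
        # Thumb: bit 0 is only a state flag; it is cleared before fetching.
        return "Thumb", addr & 0xFFFFFFFE
    if addr & 2:
        # ARM instructions are word-aligned; bits[1:0] == 0b10 is UNPREDICTABLE.
        raise ValueError(f"UNPREDICTABLE ARM branch target {addr:#010x}")
    return "ARM", addr


def cpsr_state(cpsr: int) -> str:
    """Current instruction set from the T bit (CPSR bit 5), assuming J == 0."""
    return "Thumb" if (cpsr >> 5) & 1 else "ARM"
```

For example, `bx_target(0x8001)` yields `("Thumb", 0x8000)`: the Thumb bit selects the state but never reaches the fetched address.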

Rules for state change can be summarized in the following figure taken from this paper:

However, Thumb-2 instructions have confused me a bit. For instance, let's inspect the encoding of the ADC instruction, which can be found in section A8.8.2 of the ARMv7-A/R reference manual. Basically, the same instruction has three distinct encodings: 16-bit (Thumb), 32-bit (Thumb-2), and 32-bit (ARM).
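
To make the three ADC encodings concrete, here is a hedged Python sketch that assembles the register form without a shift in each encoding (T1, T2 and A1 in the manual's naming). It assumes the ARMv7-A/R bit layouts for ADC (register), ignores shifted operands and the special cases where Rd is the PC, and the helper names are hypothetical. The 32-bit Thumb-2 form is returned as two halfwords because that is how it is stored: first halfword at the lower address, each halfword little-endian.

```python
LOW_REGS = range(8)  # R0-R7


def adc_t1(rdn: int, rm: int) -> int:
    """16-bit Thumb T1: ADCS <Rdn>, <Rm> (plain ADC when inside an IT block)."""
    if rdn not in LOW_REGS or rm not in LOW_REGS:
        raise ValueError("T1 has 3-bit register fields: only R0-R7")
    return 0x4140 | (rm << 3) | rdn


def adc_t2(rd: int, rn: int, rm: int, setflags: bool = False) -> tuple[int, int]:
    """32-bit Thumb-2 T2: ADC{S}.W <Rd>, <Rn>, <Rm> (no shift) as two halfwords."""
    for r in (rd, rn, rm):
        if r not in range(16) or r in (13, 15):
            raise ValueError("T2 accepts R0-R12 and R14 (SP and PC are UNPREDICTABLE)")
    first = 0xEB40 | (int(setflags) << 4) | rn
    second = (rd << 8) | rm  # imm3, imm2 and type fields are 0: no shift
    return first, second


def adc_a1(rd: int, rn: int, rm: int, setflags: bool = False, cond: int = 0xE) -> int:
    """32-bit ARM A1: ADC{S}<c> <Rd>, <Rn>, <Rm> (no shift); cond 0xE means 'always'."""
    for r in (rd, rn, rm):
        if r not in range(16):
            raise ValueError("register number out of range")
    return (cond << 28) | 0x00A00000 | (int(setflags) << 20) | (rn << 16) | (rd << 12) | rm
```

With these helpers, `adc_t1(0, 1)` gives `0x4148` (`adcs r0, r1`), `adc_t2(0, 1, 2)` gives `(0xEB41, 0x0002)` (`adc.w r0, r1, r2`) and `adc_a1(0, 1, 2)` gives `0xE0A10002` (`adc r0, r1, r2`). The two 32-bit forms carry the same operands in different bit layouts, which is why a decoder must already know which state it is in.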

Here are my questions:

  • Do the 32-bit Thumb-2 instructions execute in ARM or Thumb mode of the processor? (I'm assuming it's the latter, but I'm not sure.)

  • Some resources mention that ARM/Thumb instructions can be "freely" intermixed in Thumb-2. Does that mean an explicit state change using bx, blx, ldm, or ldr doesn't need to happen?

Final note: this is the closest question to mine; however, I'm focusing on interworking.

Solution

Choosing a mode

> So far I believe that I've got a good understanding of the basics of ARM/Thumb interworking.

Well, that is useful, but it is really part of an older story. Originally, there were only 32-bit ARM instructions (1980s to mid-1990s). Then ARM added a mode that worked like a compression front-end, expanding strictly 16-bit opcodes into 32-bit ones. This was Thumb mode (mid-1990s to ~2005). Then ARM came out with Thumb-2 (a somewhat nebulous term), mainly typified by a mix of 16-bit and 32-bit instructions (~2005 to the present).

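The concept of interworking is mainly useful on a CPU that has the original (16-bit-only) Thumb alongside ARM, where Thumb code has to switch to ARM state for anything Thumb cannot express. If you have a Thumb-2 CPU, a good compiler, and normal memory (one or more wait states), then Thumb-2 is almost always the best choice, since its denser code needs fewer instruction fetches.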

Thumb2 intermixing

In a Thumb-2 capable processor, you do not need interworking to mix 16-bit and 32-bit instructions! I.e., you don't change modes: both widths are Thumb encodings and execute in Thumb state. You can use the 16-bit Thumb encodings, and if you ask for a mnemonic (or operand combination) that has no 16-bit encoding, the assembler emits a 32-bit version. The Cortex-M CPUs only have a Thumb-2 mode (really Thumb mode with instruction extensions).
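
The narrow-or-wide decision described here can be sketched for a single instruction. The Python function below is a hypothetical model, not any assembler's actual code: it assumes unified assembler syntax and the ARMv7 rules for ADC (register), where the 16-bit T1 form is two-operand, reaches only R0-R7, and sets the flags exactly when it is outside an IT block. Real assemblers apply such rules to every instruction and also accept `.N`/`.W` qualifiers to force a width, which this sketch ignores.

```python
def thumb_adc_width(rd: int, rn: int, rm: int, setflags: bool,
                    in_it_block: bool = False) -> int:
    """Hypothetical model of how an assembler sizes ADC <Rd>, <Rn>, <Rm> (no shift).

    Returns 16 when the narrow T1 encoding can express the request, else 32.
    Both results are Thumb encodings: neither one changes the processor state.
    """
    narrow_ok = (
        rd == rn                      # T1 is two-operand: Rdn is destination and first source
        and rd < 8 and rm < 8         # T1 register fields are 3 bits wide (R0-R7 only)
        and setflags != in_it_block   # T1 sets flags outside an IT block and never inside one
    )
    return 16 if narrow_ok else 32
```

So `adcs r0, r0, r1` fits in 16 bits, while `adcs r0, r1, r2`, `adcs r8, r8, r1`, or a flag-preserving `adc r0, r0, r1` outside an IT block all need the 32-bit form, and the CPU stays in Thumb state throughout.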

Disassembling

There are not really three types of opcodes, but two, one of which has an extension:

  • Original 32-bit ARM opcodes.
  • 16-bit-only Thumb encodings.
  • The Thumb-2 extension, with all Thumb opcodes plus more.

Because the Thumb opcodes are denser, they cannot express every type of operation. So the 16-bit Thumb ADC is limited compared to the ARM one: it only takes two low registers (R0-R7), and the destination must also be the first source. However, for most instructions, ARM Holdings extended Thumb-2 (the only mode in the CPU is Thumb; Thumb-2 is extra instructions/opcodes) to have nearly all the capabilities of the ARM-mode version, as it did with the 32-bit Thumb-2 ADC.

There are discussions elsewhere on recognizing the mode in a binary. Assuming the code is not trying to obfuscate and people made rational choices, you will only have two types of disassembly:

  1. ARM 32-bit
  2. Thumb-2

A Thumb-2 disassembler should also work with pure (original) Thumb code, since Thumb-2 is a superset of it. Most people do not use interworking. If they do, a large part of the binary will be in Thumb mode, with a small performance-critical section in ARM mode.

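A difficulty with Thumb-2 is that the mix of 16-bit and 32-bit instructions can lead a disassembler to misinterpret the instruction stream if it starts decoding in the middle of a 32-bit encoding: the second halfword is then decoded as if it were an instruction of its own.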
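
The length rule behind this hazard is simple: in ARMv7 Thumb code, a halfword whose top five bits are `0b11101`, `0b11110` or `0b11111` starts a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The Python sketch below uses hypothetical helper names and decodes only lengths, not mnemonics, applying that rule from a chosen starting point. In the sample stream, `0x4148` is `adcs r0, r1` and `0xF000 0xF800` is a `bl` to the following instruction. The second half of a `bl` itself looks like the start of a 32-bit instruction, so starting one halfword too late swallows a real instruction.

```python
def thumb_insn_size(first_halfword: int) -> int:
    """Size in bytes of a Thumb instruction, judged from its first halfword."""
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def split_thumb(halfwords: list[int], start: int = 0) -> list[tuple[int, ...]]:
    """Group a halfword stream into instructions, decoding from index `start`."""
    insns = []
    i = start
    while i < len(halfwords):
        count = thumb_insn_size(halfwords[i]) // 2
        insns.append(tuple(halfwords[i:i + count]))
        i += count
    return insns


stream = [0x4148, 0xF000, 0xF800, 0x4148, 0x4148]
aligned = split_thumb(stream)             # decoding from a real instruction boundary
misaligned = split_thumb(stream, start=2) # decoding from inside the 32-bit bl
```

Decoding from index 0 gives `[(0x4148,), (0xF000, 0xF800), (0x4148,), (0x4148,)]`, while decoding from index 2 gives `[(0xF800, 0x4148), (0x4148,)]`: one bogus 32-bit instruction hides a genuine `adcs`. This is why a disassembler should start from known entry points rather than arbitrary offsets.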

> Final note: this is the closest question to mine; however, I'm focusing on interworking.

Interworking makes no sense on a Thumb-2-only CPU such as Cortex-M, and on a CPU that also has ARM state, Thumb-2 removes most of the reasons for it. Since your question is tagged disassembling, I tried to answer with that focus, rather than repeat the other question, which is mainly about what the modes are. For ELF disassembly, the disassembler should have no trouble locating major function entry points and should be able to disassemble without many issues.
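
Entry points are easy to find in ELF files because the ARM ELF ABI (AAELF) records the state in the symbol table: a function symbol whose value has bit 0 set refers to Thumb code, and the mapping symbols `$a`, `$t` and `$d` (optionally followed by `.` and a suffix) mark the start of ARM code, Thumb code and data. The Python sketch below assumes you have already extracted each symbol's name, value and whether it is `STT_FUNC` with an ELF reader of your choice. `classify_symbol` is a hypothetical helper, and a stripped binary offers none of these hints.

```python
import re
from typing import Optional

# AAELF mapping symbols: $a, $t or $d, optionally followed by ".<anything>".
MAPPING_SYMBOL = re.compile(r"\$([atd])(?:\..*)?")
MAPPING_STATE = {"a": "ARM", "t": "Thumb", "d": "data"}


def classify_symbol(name: str, value: int, is_func: bool) -> Optional[str]:
    """Return 'ARM', 'Thumb' or 'data' when the symbol table says so, else None."""
    m = MAPPING_SYMBOL.fullmatch(name)
    if m:
        return MAPPING_STATE[m.group(1)]
    if is_func:
        # STT_FUNC symbols for Thumb code have bit 0 of their value set;
        # the first instruction is at value & ~1.
        return "Thumb" if value & 1 else "ARM"
    return None
```

A mapping symbol's classification applies from its address until the next mapping symbol in the same section, so a disassembler would sort them by address and switch between the ARM and Thumb decoders (or skip data) at each one.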
