ARM/Thumb interworking confusion regarding Thumb-2

Question

I've been going through ARM ISA related documentation for a while, and so far I believe that I've got a good understanding of the basics of ARM/Thumb interworking. I'll quickly summarize that in the following:

  • Instructions can be either 4-byte aligned (ARM) or 2-byte aligned (Thumb).
  • Thumb and ARM instructions reside in separate regions, i.e. they are not intermixed without an explicit processor state change.
  • A state change can happen upon executing any of bx, blx, ldm, ldr. The choice between ARM and Thumb depends on the least significant bit of the target address: 0 selects ARM, 1 selects Thumb.
  • The current state of the processor can be either ARM or Thumb. That depends on bit 5 of the CPSR.
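
The last two points can be illustrated with a small model. This is only a sketch in Python, not a description of real hardware: the names `T_BIT`, `bx`, and `is_thumb` are made up for illustration, and the model tracks nothing but the state bit and the program counter.

```python
from typing import Tuple

T_BIT = 1 << 5  # CPSR bit 5: set means Thumb state, clear means ARM state


def bx(cpsr: int, target: int) -> Tuple[int, int]:
    """Model the state selection of an interworking branch.

    Returns the new (cpsr, pc) pair. Simplification: an ARM-state
    target that is not 4-byte aligned is not treated specially here.
    """
    if target & 1:
        # Bit 0 set: enter Thumb state; bit 0 is not part of the PC.
        return cpsr | T_BIT, target & ~1
    # Bit 0 clear: enter ARM state; the target is used unchanged.
    return cpsr & ~T_BIT, target


def is_thumb(cpsr: int) -> bool:
    """Report whether the modelled CPSR value indicates Thumb state."""
    return bool(cpsr & T_BIT)


cpsr, pc = bx(0x10, 0x8001)   # odd address: Thumb code at 0x8000
print(is_thumb(cpsr), hex(pc))
cpsr, pc = bx(cpsr, 0x9000)   # even address: ARM code at 0x9000
print(is_thumb(cpsr), hex(pc))
```

The addresses `0x8001` and `0x9000` and the initial CPSR value `0x10` are arbitrary example values.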

Rules for state change can be summarized in the following figure taken from this paper:

However, Thumb-2 instructions have confused me a bit. For instance, let's inspect the encoding of the ADC instruction, which can be found in section A8.8.2 of the ARMv7-A/R reference manual. Basically, the same instruction has 3 distinct encodings: 16-bit (Thumb), 32-bit (Thumb-2), and 32-bit (ARM).

Here are my questions:

  • Do the 32-bit Thumb-2 instructions execute in the ARM or the Thumb mode of the processor? (I'm assuming it's the latter, but I'm not sure.)

  • Some resources mention that ARM/Thumb instructions can be "freely" intermixed in Thumb-2. Does that mean an explicit state change using bx, blx, ldm or ldr doesn't need to happen?

Final note, this is the closest question to mine, however, I'm focusing on interworking.

Solution

Choosing a mode

"so far I believe that I've got a good understanding of the basics of ARM/Thumb interworking."

Well, that is useful, but it is really part of an older story. Originally, there were only 32-bit ARM instructions (1980s to mid-1990s). Then ARM made a mode that was like a compression front-end, expanding strictly 16-bit opcodes to 32 bits. This was thumb mode (mid-1990s to ~2005). Then ARM came out with thumb2 (a somewhat nebulous term), mainly typified by a mix of 16-bit and 32-bit instructions (~2005 to the present).

The concept of interworking is only useful for a CPU that has both the old thumb and ARM modes. If you have a thumb2 CPU, a good compiler, and normal memory (1+ wait states), then thumb2 is almost always the best choice.

Thumb2 intermixing

In a thumb2-capable processor, you do not need interworking! I.e., you don't change modes. You can use the 16-bit thumb encodings, and if you ask for a mnemonic where a 16-bit encoding is not possible, the assembler emits a 32-bit version. The Cortex-M CPUs only have a thumb2 mode (really thumb mode with instruction extensions).
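
To make the "no mode change" point concrete: within thumb state, the first halfword alone says whether an instruction is 16 or 32 bits long. According to the ARMv7 reference manual, a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; anything else is a complete 16-bit instruction. A minimal sketch in Python follows (the function name `thumb_insn_size` is made up, and the sample halfwords are intended to be the encodings named in the comments):

```python
def thumb_insn_size(first_halfword: int) -> int:
    """Return the length in bytes (2 or 4) of a thumb-state instruction,
    judged only from bits [15:11] of its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


print(thumb_insn_size(0x4148))  # 16-bit ADC form (adcs r0, r1)
print(thumb_insn_size(0xEB40))  # first half of the 32-bit ADC form (adc.w)
print(thumb_insn_size(0xF000))  # first half of a 32-bit BL
```

Both ADC encodings are decoded in the same processor state; only the leading bits differ.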

Disassembling

There are not really three types of opcodes but two with one extension.

  • Original 32 bit ARM opcodes.
  • 16 bit only thumb encodings.
  • the thumb2 extension with all thumb opcodes plus more.

As the thumb opcodes are denser, they cannot express all types of operations. So the 16-bit thumb ADC is limited compared to the ARM one. However, for most instructions, ADC included, ARM Holdings used thumb2 to give them the capabilities of their ARM-mode versions (the only mode in the CPU is still thumb; thumb2 is extra instructions/opcodes).

There are discussions elsewhere on recognizing the mode in a binary. Assuming the code is not trying to obfuscate and people made rational choices, you will only have two types of disassembly.

  1. ARM 32 bit
  2. thumb2

A thumb2 disassembler should also work with pure thumb code. Most people do not use interworking. If they do, a large part of the binary will be in thumb mode, with a small performance-critical section in ARM mode.

A difficulty with thumb2 is that the mix of 16-bit and 32-bit encodings can lead a disassembler to misinterpret an instruction stream if it starts decoding in the middle of a 32-bit encoding.
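
Here is a hedged sketch of that failure, in Python. It only measures instruction lengths using the bits [15:11] rule from the ARMv7 reference manual; the helper names and the sample bytes are made up for illustration (the bytes are intended to be `bl` with a zero offset, `adcs r0, r1`, and `bx lr`, stored little-endian).

```python
import struct
from typing import List, Tuple


def thumb_insn_size(first_halfword: int) -> int:
    """2 or 4 bytes, judged from bits [15:11] of the first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_stream(code: bytes, start: int = 0) -> List[Tuple[int, int]]:
    """Walk the stream linearly and return (offset, size) per instruction."""
    result = []
    off = start
    while off + 2 <= len(code):
        (halfword,) = struct.unpack_from("<H", code, off)
        size = thumb_insn_size(halfword)
        result.append((off, size))
        off += size
    return result


# Halfwords: 0xF000 0xF800 (32-bit), 0x4148 (16-bit), 0x4770 (16-bit)
CODE = bytes.fromhex("00f000f848417047")

print(split_stream(CODE))      # [(0, 4), (4, 2), (6, 2)]
print(split_stream(CODE, 2))   # [(2, 4), (6, 2)]
```

Starting at offset 2, the second half of the 32-bit instruction is itself taken for the start of a 32-bit instruction, so the 16-bit instruction at offset 4 is swallowed. In this example the walk happens to resynchronize at offset 6.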

"Final note, this is the closest question to mine, however, I'm focusing on interworking."

Interworking makes no sense on a thumb2 CPU. Since your question is tagged disassembling, I tried to answer with that focus, unlike the other questions, which are mainly about what the modes are. For ELF disassembly, the disassembler should have no trouble locating the major function entry points and should be able to disassemble without many issues.
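
As a rough illustration of how entry points can carry the mode: the ARM ELF convention is that a function symbol whose value has bit 0 set refers to thumb code, the same rule that bx uses. A minimal sketch in Python, assuming the symbols have already been read from the symbol table as (name, value) pairs; the function name `classify_functions` and the sample symbols are hypothetical.

```python
from typing import Iterable, List, Tuple


def classify_functions(
    symbols: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int, str]]:
    """Turn (name, symbol value) pairs for function symbols into
    (name, code address, state) triples, using bit 0 of the value."""
    result = []
    for name, value in symbols:
        state = "thumb" if value & 1 else "arm"
        # Bit 0 is only a marker; the code itself starts at value & ~1.
        result.append((name, value & ~1, state))
    return result


# Hypothetical symbols, not taken from any real binary.
for entry in classify_functions([("main", 0x8001), ("fast_loop", 0x9000)]):
    print(entry)
```

A disassembler can start decoding at each code address in the indicated state, which avoids the mid-stream problem for those entry points.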
