How to generate the machine code of Thumb instructions?


Problem description

I searched Google for how to generate the machine code of ARM instructions and found questions such as "Converting very simple ARM instructions to binary/hex".

The answer there referenced the ARM7TDMI-S Data Sheet (ARM DDI 0084D). Its diagram of the data processing instructions is good enough. Unfortunately, it covers ARM instructions, not Thumb/Thumb-2 instructions.

Take the B instruction as an example. ARM Architecture Reference Manual - ARMv7-A and ARMv7-R edition section A8.8.18, Encoding T4:

For the assembly code:

B 0x50

How can I encode the immediate value 0x50 into the 4-byte machine code? Or, if I want to write a C function that takes the B instruction and the relative address as inputs and returns the encoded machine code, how can I implement such a function?

unsigned int gen_mach_code(int instruction, int relative_addr)
{
    /* the int instruction parameter is assumed to be B */
    /* encoding method is assumed to be T4 */
    unsigned int mach_code;
    /* construct the machine code of B<c>.W <label> */
    return mach_code;
}

I know how immediate values are encoded on ARM. There is a good tutorial here: http://alisdair.mcdiarmid.org/arm-immediate-value-encoding/

I just want to know where imm10 and imm11 come from, and how to construct the full machine code with them.

Solution

First and foremost, the ARM7TDMI does not support the thumb2 extensions; it basically defines the original thumb instruction set.

So why not just try it?

.thumb
@.syntax unified

b 0x50

run these commands

arm-whatever-whatever-as b.s -o b.o
arm-whatever-whatever-objdump -D b.o

get this output

0:  e7fe        b.n 50 <*ABS*0x50>

So that is a T2 encoding. As the newer docs show, this encoding of the instruction is supported by ARMv4T, ARMv5T*, ARMv6*, and ARMv7, and the ARM7TDMI is an ARMv4T.

We see that the top five bits of 0xE7FE match the 11100 start of that instruction definition, so imm11 is 0x7FE, which is basically an encoding of a branch to address 0x000, since this isn't linked with anything. How do I know that?

.thumb
b skip
nop
nop
nop
nop
nop
skip:

00000000 <skip-0xc>:
   0:   e004        b.n c <skip>
   2:   46c0        nop         ; (mov r8, r8)
   4:   46c0        nop         ; (mov r8, r8)
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)

0xe004 starts with 11100, so that is branch encoding T2. imm11 is 4.

We need to reach from 0 to 0xC. The pc is two INSTRUCTIONS ahead when the offset is applied. The docs say

Encoding T2: Even numbers in the range -2048 to 2046

and

PC, the program counter
- When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
- When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.

So that all makes sense. 0xC - 0x4 = 8. We can only encode even offsets, and it makes no sense to branch into the middle of an instruction anyway, so divide by 2 because thumb instructions are two bytes (the offset field counts halfwords, not bytes). That gives 4:

0xE004
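The arithmetic above can be sketched in C. This is a minimal illustration, not a reference implementation: the function name `encode_b_t2` and its error convention are made up for this example, and it assumes the instruction address and the target are both known absolute addresses.

```c
#include <stdint.h>

/* Sketch: encode a 16-bit Thumb unconditional branch (encoding T2).
 * Hypothetical helper: stores the halfword in *out and returns 0,
 * or returns -1 if the target is odd or out of range. */
int encode_b_t2(uint32_t instr_addr, uint32_t target, uint16_t *out)
{
    /* In Thumb state the PC reads as the instruction address plus 4. */
    int32_t offset = (int32_t)(target - (instr_addr + 4u));

    if (offset & 1)                       /* offsets must be even */
        return -1;
    if (offset < -2048 || offset > 2046)  /* T2 range quoted from the docs */
        return -1;

    /* 11100 in the top five bits, imm11 = offset / 2 in the low eleven. */
    *out = (uint16_t)(0xE000u | (((uint32_t)offset >> 1) & 0x7FFu));
    return 0;
}
```

Calling it with an instruction address of 0 and a target of 0xC should reproduce the 0xE004 from the objdump listing.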

Here is one way to generate the T4 encoding:

.thumb
.syntax unified

b skip
nop
nop
nop
nop
nop
skip:

00000000 <skip-0xe>:
   0:   f000 b805   b.w e <skip>
   4:   46c0        nop         ; (mov r8, r8)
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)
   c:   46c0        nop         ; (mov r8, r8)

The T4 encoding of the branch has 11110 in the top bits of the first halfword, indicating that this is either an undefined instruction (on anything that is not ARMv6T2 or ARMv7) or a thumb2 extension for ARMv6T2 and ARMv7.

The second halfword has the pattern 10x1 in its top bits, and we see a B there, so this looks good: it is a thumb2 extended branch.

S is 0, imm10 is 0, J1 is 1, J2 is 1, and imm11 is 5.

I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S); imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);

1 EOR 0 is 1, right? NOT that and you get 0. So I1 and I2 are both zero, S is zero, and imm10 is zero, so for this one we are basically only looking at imm11 as a positive number.

The pc is four ahead when executing, so 0xE - 0x4 = 0xA.

0xA / 2 = 0x5, and that is our branch offset: pc + (5*2).

.syntax unified
.thumb


b.w skip
nop
here:
nop
nop
nop
nop
skip:
b.w here

00000000 <here-0x6>:
   0:   f000 b805   b.w e <skip>
   4:   46c0        nop         ; (mov r8, r8)

00000006 <here>:
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)
   c:   46c0        nop         ; (mov r8, r8)

0000000e <skip>:
   e:   f7ff bffa   b.w 6 <here>

S is 1, imm10 is 0x3FF, J1 is 1, J2 is 1, and imm11 is 0x7FA.

1 EOR 1 is 0; NOT that and you get 1 for I1, and the same for I2.

imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);

S is 1, so this sign-extends with ones. All but the last few bits are ones, so S:I1:I2:imm10:imm11 sign-extended is 0xFFFFFFFA, or 6 instructions (halfwords) back, and with the trailing '0' appended imm32 is 0xFFFFFFF4, or 12 bytes back.

So the size of our offset is ((0xE + 4) - 6)/2 = 6 halfwords as well. Or look at it another way, from the instruction encoding: PC - (6*2) = (0xE + 4) - 12 = 6, a branch to 0x6.
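To check a hand decode like the two above, the I1/I2/SignExtend pseudocode can be written out in C. This is only a sketch: `decode_b_t4_offset` is a hypothetical name, and it assumes the two halfwords really are a T4 branch (it does not verify the fixed bits).

```c
#include <stdint.h>

/* Sketch: recover the byte offset (imm32) from the two halfwords of a
 * Thumb-2 B.W, encoding T4. Hypothetical helper for illustration. */
int32_t decode_b_t4_offset(uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = (hw1 >> 10) & 1u;   /* first halfword:  11110 S imm10 */
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;   /* second halfword: 10 J1 1 J2 imm11 */
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;

    uint32_t i1 = (~(j1 ^ s)) & 1u;      /* I1 = NOT(J1 EOR S) */
    uint32_t i2 = (~(j2 ^ s)) & 1u;      /* I2 = NOT(J2 EOR S) */

    /* S:I1:I2:imm10:imm11:'0' is a 25-bit value with S at bit 24. */
    uint32_t imm25 = (s << 24) | (i1 << 23) | (i2 << 22)
                   | (imm10 << 12) | (imm11 << 1);

    /* Sign-extend from bit 24 to 32 bits. */
    return (int32_t)(imm25 ^ 0x01000000u) - 0x01000000;
}
```

The branch target is then the instruction address plus 4 plus this offset, which matches how the two b.w instructions in the listing were decoded by hand.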

So if you wanted to branch to, say, 0x70 and the address of the instruction is 0x12, then your offset is 0x70 - (0x12 + 4) = 0x5A bytes, or 0x2D instructions. We know from the skip example that the trick is to make S 0 and J1 and J2 both 1.

0x12: 0xF000 0xB82D  branch to 0x70
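Going the other way, here is a sketch of the kind of C function the question asks for. It is an illustration under assumptions, not the only way to do it: `encode_b_t4` is a hypothetical name, it takes absolute addresses rather than the question's `relative_addr`, and the range limit is the T4 one from the ARM manual (even offsets from -16777216 to 16777214).

```c
#include <stdint.h>

/* Sketch: encode a Thumb-2 B.W (encoding T4) from the address of the
 * instruction and the branch target. Hypothetical helper: returns 0 and
 * fills *hw1 / *hw2, or returns -1 if the target cannot be encoded. */
int encode_b_t4(uint32_t instr_addr, uint32_t target,
                uint16_t *hw1, uint16_t *hw2)
{
    /* The PC reads as the instruction address plus 4 in Thumb state. */
    int32_t offset = (int32_t)(target - (instr_addr + 4u));

    if (offset & 1)
        return -1;
    if (offset < -16777216 || offset > 16777214)
        return -1;

    /* Split imm32 into S:I1:I2:imm10:imm11:'0'. */
    uint32_t u     = (uint32_t)offset;
    uint32_t s     = (u >> 24) & 1u;
    uint32_t i1    = (u >> 23) & 1u;
    uint32_t i2    = (u >> 22) & 1u;
    uint32_t imm10 = (u >> 12) & 0x3FFu;
    uint32_t imm11 = (u >> 1) & 0x7FFu;

    /* Invert I1 = NOT(J1 EOR S): J1 = NOT(I1 EOR S), same for J2. */
    uint32_t j1 = (~(i1 ^ s)) & 1u;
    uint32_t j2 = (~(i2 ^ s)) & 1u;

    *hw1 = (uint16_t)(0xF000u | (s << 10) | imm10);          /* 11110 S imm10 */
    *hw2 = (uint16_t)(0x9000u | (j1 << 13) | (j2 << 11) | imm11); /* 10 J1 1 J2 imm11 */
    return 0;
}
```

Fed the addresses from the two objdump listings (0x0 to 0xE, and 0xE to 0x6), this should give f000 b805 and f7ff bffa respectively. The two halfwords are emitted in that order, first halfword at the lower address.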

So now, knowing that, we can go back to this:

0:  e7fe        b.n 50 <*ABS*0x50>

The offset field is a sign-extended 0x7FE, or 0xFFFFFFFE. 0xFFFFFFFE*2 + 4 = 0xFFFFFFFC + 4 = 0x00000000. Branch to 0.
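The same check can be done mechanically. The following C sketch decodes a T2 halfword back into a target address; `decode_b_t2_target` is a hypothetical name, and the function assumes the halfword really is a T2 branch without checking the top five bits.

```c
#include <stdint.h>

/* Sketch: compute the target of a 16-bit Thumb B (encoding T2) from the
 * address it sits at and its halfword. Hypothetical helper. */
uint32_t decode_b_t2_target(uint32_t instr_addr, uint16_t hw)
{
    uint32_t imm11 = hw & 0x7FFu;

    /* imm11:'0' is a 12-bit signed value; sign-extend from bit 11. */
    int32_t offset = (int32_t)((imm11 << 1) ^ 0x800u) - 0x800;

    /* The offset is applied to the PC, which reads as address + 4. */
    return instr_addr + 4u + (uint32_t)offset;
}
```

For 0xE7FE the offset works out to -4, so the computed target is always the address of the branch itself, whatever address the instruction is placed at.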

add a nop

.thumb
nop
b 0x50

00000000 <.text>:
   0:   46c0        nop         ; (mov r8, r8)
   2:   e7fe        b.n 50 <*ABS*0x50>

same encoding

So the disassembly implies an absolute target of 0x50, but the instruction is not encoding it. Linking doesn't help; it just complains:

(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 against `*ABS*0x50'

this

.thumb
nop
b 0x51

gives the same encoding.

So basically there is something wrong with this syntax, and/or perhaps it is looking for a label named 0x50?

I hope your example meant that you want to know the encoding of a branch to some address, rather than that exact syntax.

ARM is not like some other instruction sets: the branches are always relative. So if you can reach the destination with the encoding, you get a branch; otherwise you have to use a bx or pop or one of the other ways to modify the pc (with an absolute value).

Knowing from the docs that the T2 encoding can only reach about 2048 bytes ahead, put more than 2048 nops between the branch and its destination and you get:

b.s: Assembler messages:
b.s:5: Error: branch out of range
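The assembler's decision here comes down to a range check on the pc-relative offset. A rough C sketch of that check follows; the enum and function names are invented for this example, the offset is assumed to already be target minus (instruction address + 4), and the T4 limits are the ones given in the ARM manual for that encoding.

```c
#include <stdint.h>

/* Hypothetical result type for this illustration. */
enum thumb_branch { BRANCH_T2, BRANCH_T4, BRANCH_OUT_OF_RANGE };

/* Sketch: pick the smallest unconditional-branch encoding that can hold
 * a given pc-relative byte offset. T4 needs the thumb2 extensions, so on
 * an ARMv4T core such as the ARM7TDMI only BRANCH_T2 is usable. */
enum thumb_branch pick_b_encoding(int32_t offset)
{
    if (offset & 1)                               /* odd offsets cannot be encoded */
        return BRANCH_OUT_OF_RANGE;
    if (offset >= -2048 && offset <= 2046)        /* 16-bit B, encoding T2 */
        return BRANCH_T2;
    if (offset >= -16777216 && offset <= 16777214) /* 32-bit B.W, encoding T4 */
        return BRANCH_T4;
    return BRANCH_OUT_OF_RANGE;
}
```

When the result is out of range for the core being targeted, the options are the ones mentioned above: load an absolute address into a register and use bx, or otherwise write the pc directly.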

Maybe this is what you are looking to do?

.thumb
mov r0,#0x51
bx r0

00000000 <.text>:
   0:   2051        movs    r0, #81 ; 0x51
   2:   4700        bx  r0

This branches to absolute address 0x50 (bit 0 of the value passed to bx is set, which keeps the processor in thumb state). For that specific address there is no need for thumb2 extensions.
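The two halfwords in that listing can also be built by hand. The sketch below uses hypothetical helper names and covers only the two 16-bit encodings that appear in the listing: MOVS Rd, #imm8 (00100 Rd imm8) and BX Rm (010001110 Rm 000).

```c
#include <stdint.h>

/* Sketch: 16-bit Thumb MOVS Rd, #imm8. Rd must be r0-r7 and the
 * immediate must fit in 8 bits; out-of-range inputs are simply masked. */
uint16_t encode_movs_imm8(unsigned rd, unsigned imm8)
{
    return (uint16_t)(0x2000u | ((rd & 0x7u) << 8) | (imm8 & 0xFFu));
}

/* Sketch: 16-bit Thumb BX Rm, where Rm may be any of r0-r15. */
uint16_t encode_bx(unsigned rm)
{
    return (uint16_t)(0x4700u | ((rm & 0xFu) << 3));
}
```

For a thumb-state destination of 0x50, the value loaded would be 0x50 | 1 = 0x51, so `encode_movs_imm8(0, 0x51)` and `encode_bx(0)` should give the 2051 and 4700 shown by objdump. This only works for targets below 0x100; larger addresses need something like the ldr form.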

.thumb
ldr r0,=0x12345679
bx r0
00000000 <.text>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <.text+0x4>)
   2:   4700        bx  r0
   4:   12345679    eorsne  r5, r4, #126877696  ; 0x7900000

This branches to address 0x12345678, or any other possible address.
