How to generate the machine code of Thumb instructions?


Problem description


I searched Google for how to generate the machine code of ARM instructions and found, for example, the question "Converting very simple ARM instructions to binary/hex".

The answer there referenced the ARM7TDMI-S Data Sheet (ARM DDI 0084D). Its diagram of the data processing instructions is good enough. Unfortunately, it covers ARM instructions, not Thumb/Thumb-2 instructions.

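Take the B instruction as an example. The ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, section A8.8.18, defines Encoding T4 as two halfwords:

first halfword:  1 1 1 1 0 | S | imm10
second halfword: 1 0 | J1 | 1 | J2 | imm11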

For the assembly code:

B 0x50

How can I encode the immediate value 0x50 into the 4-byte machine code? Or, if I want to write a C function that takes the B instruction and the relative address as inputs and returns the encoded machine code, how can I implement such a function?

unsigned int gen_mach_code(int instruction, int relative_addr)
{
    /* the int instruction parameter is assumed to be B */
    /* encoding method is assumed to be T4 */
    unsigned int mach_code;
    /* construct the machine code of B<c>.W <label> */
    return mach_code;
}

I know the immediate values encoding on ARM. Here http://alisdair.mcdiarmid.org/arm-immediate-value-encoding/ is a good tutorial.

I just want to know where imm10 and imm11 come from, and how to construct the full machine code with them.

Solution

First and foremost, the ARM7TDMI does not support the Thumb-2 extensions; it basically defines the original Thumb instruction set.

So why not just try it?

.thumb
@.syntax unified

b 0x50

run these commands

arm-whatever-whatever-as b.s -o b.o
arm-whatever-whatever-objdump -D b.o

get this output

0:  e7fe        b.n 50 <*ABS*0x50>

So that is a T2 encoding, and as the newer docs show for this instruction, T2 is supported by ARMv4T, ARMv5T*, ARMv6*, and ARMv7. The ARM7TDMI is an ARMv4T.

We see that E7 matches the 11100 start of that instruction definition, so imm11 is 0x7FE. That is basically an encoding of a branch to address 0x000, since this isn't linked with anything. How do I know that?

.thumb
b skip
nop
nop
nop
nop
nop
skip:

00000000 <skip-0xc>:
   0:   e004        b.n c <skip>
   2:   46c0        nop         ; (mov r8, r8)
   4:   46c0        nop         ; (mov r8, r8)
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)

0xe004 starts with 11100 so that is a branch encoding T2. imm11 is a 4

we need to reach from 0 to 0xC. the pc is two INSTRUCTIONS ahead when the offset is applied. The docs say

Encoding T2 Even numbers in the range –2048 to 2046

and

PC, the program counter
- When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
- When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.

So that all makes sense: 0xC - 0x4 = 8. We can only do even offsets, and it makes no sense to branch into the middle of an instruction anyway, so divide by 2 because Thumb instructions are two bytes (the offset is counted in halfwords, not bytes). That gives a 4:

0xE004
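The T2 arithmetic above can be sketched in C. This is only a minimal sketch: the function name `thumb_b_t2` and its parameters are hypothetical, not taken from any toolchain, and it assumes the caller passes the address of the branch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a 16-bit Thumb "B <label>" (encoding T2).
 * insn_addr is the address of the branch, target is the destination. */
uint16_t thumb_b_t2(uint32_t insn_addr, uint32_t target)
{
    /* In Thumb state the PC reads as the instruction address plus 4. */
    int32_t offset = (int32_t)(target - (insn_addr + 4u));

    /* T2 reaches even byte offsets from -2048 to 2046. */
    assert((offset & 1) == 0);
    assert(offset >= -2048 && offset <= 2046);

    /* imm11 holds the offset in halfwords, truncated to 11 bits. */
    uint16_t imm11 = (uint16_t)(((uint32_t)offset >> 1) & 0x7FFu);

    /* 11100 in the top five bits, imm11 below. */
    return (uint16_t)(0xE000u | imm11);
}
```

Calling `thumb_b_t2(0x0, 0xC)` should reproduce the 0xE004 from the disassembly.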

here is one way to generate the t4 encoding

.thumb
.syntax unified

b skip
nop
nop
nop
nop
nop
skip:

00000000 <skip-0xe>:
   0:   f000 b805   b.w e <skip>
   4:   46c0        nop         ; (mov r8, r8)
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)
   c:   46c0        nop         ; (mov r8, r8)

The T4 encoding of branch has 11110 at the top of the first halfword, indicating that this is either an undefined instruction (on anything that is not ARMv6T2 or ARMv7) or a Thumb-2 extension (on ARMv6T2 and ARMv7).

The second halfword matches 10x1, where we see a B, so this looks good: it is a Thumb-2 extended branch.

S is 0, imm10 is 0, J1 is 1, J2 is 1, and imm11 is 5.

I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S); imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);

1 EOR 0 is 1, right? NOT that and you get 0. So I1 and I2 are both zero, S is zero, and imm10 is zero, so for this one we are basically only looking at imm11 as a positive number.

The pc is four ahead when executing, so 0xE - 0x4 = 0xA.

0xA / 2 = 0x5, and that is our branch offset: pc + (5*2).
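To make the field extraction concrete, here is a hedged C sketch that pulls S, imm10, J1, J2 and imm11 out of the two halfwords and rebuilds imm32 with the formula quoted above. The name `thumb_b_t4_offset` is hypothetical; it is an illustration, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: recover the byte offset (imm32) from the two
 * halfwords of a 32-bit Thumb-2 "B.W <label>" (encoding T4). */
int32_t thumb_b_t4_offset(uint16_t hw1, uint16_t hw2)
{
    /* hw1 = 11110 S imm10, hw2 = 10 J1 1 J2 imm11 */
    assert((hw1 & 0xF800u) == 0xF000u);
    assert((hw2 & 0xD000u) == 0x9000u);

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;

    uint32_t i1 = (~(j1 ^ s)) & 1u;   /* I1 = NOT(J1 EOR S) */
    uint32_t i2 = (~(j2 ^ s)) & 1u;   /* I2 = NOT(J2 EOR S) */

    /* S:I1:I2:imm10:imm11:'0' is a 25-bit value. */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22)
                 | (imm10 << 12) | (imm11 << 1);

    /* Sign-extend from bit 24 to 32 bits. */
    if (s)
        imm |= 0xFE000000u;
    return (int32_t)imm;
}
```

For the pair f000 b805 this should return 10 bytes, and 0x0 + 4 + 10 = 0xE matches the `skip` label.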

.syntax unified
.thumb


b.w skip
nop
here:
nop
nop
nop
nop
skip:
b.w here

00000000 <here-0x6>:
   0:   f000 b805   b.w e <skip>
   4:   46c0        nop         ; (mov r8, r8)

00000006 <here>:
   6:   46c0        nop         ; (mov r8, r8)
   8:   46c0        nop         ; (mov r8, r8)
   a:   46c0        nop         ; (mov r8, r8)
   c:   46c0        nop         ; (mov r8, r8)

0000000e <skip>:
   e:   f7ff bffa   b.w 6 <here>

S is 1, imm10 is 0x3FF, J1 is 1, J2 is 1, and imm11 is 0x7FA.

1 EOR 1 is 0; NOT that and you get 1 for I1, and the same for I2.

imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);

S is 1, so this sign-extends with ones. Before the trailing '0' is appended the value is 0xFFFFFFFA, or -6 halfwords; with it, imm32 is 0xFFFFFFF4, or -12 bytes back.

Computed from the addresses, the offset is (0x6 - (0xE + 4))/2 = -6 as well. Or look at it the other way, starting from the instruction encoding: PC - (6*2) = (0xE + 4) - 12 = 6, a branch to 0x6.

So if you wanted to branch to, say, 0x70 and the address of the instruction is 0x12, then your offset is 0x70 - (0x12 + 4) = 0x5A bytes, or 0x2D halfwords. We know from the skip example that the trick is to make S 0 and J1 and J2 1:

0x12: 0xF000 0xB82D  branch to 0x70
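A function along the lines of the one asked for in the question could look like the sketch below. It is an assumption-laden illustration: the name `thumb_b_t4`, the choice to pass absolute addresses rather than a relative one, and the packing of both halfwords into one 32-bit return value are all hypothetical choices, not something the manual prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode "B.W <label>" (encoding T4).
 * Returns hw1 in the upper 16 bits and hw2 in the lower 16 bits. */
uint32_t thumb_b_t4(uint32_t insn_addr, uint32_t target)
{
    /* In Thumb state the PC reads as the instruction address plus 4. */
    int32_t offset = (int32_t)(target - (insn_addr + 4u));

    /* T4 reaches even byte offsets from -16777216 to 16777214. */
    assert((offset & 1) == 0);
    assert(offset >= -16777216 && offset <= 16777214);

    /* Split the offset into S:I1:I2:imm10:imm11:'0'. */
    uint32_t u     = (uint32_t)offset;
    uint32_t s     = (u >> 24) & 1u;
    uint32_t i1    = (u >> 23) & 1u;
    uint32_t i2    = (u >> 22) & 1u;
    uint32_t imm10 = (u >> 12) & 0x3FFu;
    uint32_t imm11 = (u >> 1)  & 0x7FFu;

    /* Invert I1 = NOT(J1 EOR S): J1 = NOT(I1) EOR S, same for J2. */
    uint32_t j1 = ((~i1) ^ s) & 1u;
    uint32_t j2 = ((~i2) ^ s) & 1u;

    uint32_t hw1 = 0xF000u | (s << 10) | imm10;               /* 11110 S imm10 */
    uint32_t hw2 = 0x9000u | (j1 << 13) | (j2 << 11) | imm11; /* 10 J1 1 J2 imm11 */
    return (hw1 << 16) | hw2;
}
```

Checked against the disassembly earlier in this answer, `thumb_b_t4(0x0, 0xE)` should give 0xF000B805 and `thumb_b_t4(0xE, 0x6)` should give 0xF7FFBFFA. When writing the result to memory on a little-endian target, hw1 is stored first and each halfword is stored low byte first.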

so now knowing that we can go back to this:

0:  e7fe        b.n 50 <*ABS*0x50>

The offset is a sign-extended 0x7FE, or 0xFFFFFFFE. 0xFFFFFFFE*2 + 4 = 0xFFFFFFFC + 4 = 0x00000000: a branch to 0.
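The same calculation can be written as a small decoder. This is a sketch under the assumption that the halfword really is a T2 branch; `thumb_b_t2_target` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: destination of a 16-bit Thumb "B" (encoding T2)
 * located at insn_addr. */
uint32_t thumb_b_t2_target(uint32_t insn_addr, uint16_t insn)
{
    assert((insn & 0xF800u) == 0xE000u);   /* 11100 imm11 */

    uint32_t imm11 = insn & 0x7FFu;
    uint32_t imm32 = imm11 << 1;           /* imm11:'0' */
    if (imm11 & 0x400u)                    /* top bit of imm11 is the sign */
        imm32 |= 0xFFFFF000u;

    /* PC reads as insn_addr + 4; unsigned addition wraps modulo 2^32. */
    return insn_addr + 4u + imm32;
}
```

With 0xE7FE at address 0 this yields 0, and with 0xE004 at address 0 it yields 0xC, in line with the two disassemblies.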

add a nop

.thumb
nop
b 0x50

00000000 <.text>:
   0:   46c0        nop         ; (mov r8, r8)
   2:   e7fe        b.n 50 <*ABS*0x50>

same encoding

So the disassembly implies an absolute value of 0x50, but the instruction is not encoding it. Linking doesn't help; it just complains:

(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 against `*ABS*0x50'

this

.thumb
nop
b 0x51

gives the same encoding.

So basically there is something wrong with this syntax and/or it is looking for a label named 0x50 perhaps?

I hope your example was you wanting to know the encoding of a branch to some address instead of that exact syntax.

ARM is not like some other instruction sets: the branches are always relative. So if you can reach the destination with the encoding, you get a branch; otherwise you have to use a bx or pop or one of the other ways to modify the pc (with an absolute value).

Knowing from the docs that the T2 encoding can only reach about 2048 bytes ahead, put more than 2048 nops between the branch and its destination and the assembler reports:

b.s: Assembler messages:
b.s:5: Error: branch out of range
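The range limits behind that error can be expressed as two predicates. This is a sketch only: the function names are hypothetical, the T2 range is the one quoted from the docs above, and the T4 range of -16777216 to 16777214 follows from its 25-bit signed offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* offset = target - (insn_addr + 4), in bytes. */

/* Hypothetical helper: does the offset fit the 16-bit T2 encoding? */
bool thumb_b_fits_t2(int32_t offset)
{
    return (offset % 2 == 0) && offset >= -2048 && offset <= 2046;
}

/* Hypothetical helper: does the offset fit the 32-bit T4 encoding? */
bool thumb_b_fits_t4(int32_t offset)
{
    return (offset % 2 == 0) && offset >= -16777216 && offset <= 16777214;
}
```

For example, with 2048 two-byte nops after a 16-bit branch at address 0, the destination is 0x1002 and the offset is 0x1002 - 4 = 4094 bytes, which fits T4 but not T2.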

Maybe this is what you are looking to do?

.thumb
mov r0,#0x51
bx r0

00000000 <.text>:
   0:   2051        movs    r0, #81 ; 0x51
   2:   4700        bx  r0

This branches to absolute address 0x50; bit 0 of 0x51 is set so that bx stays in Thumb state. For that specific address there is no need for Thumb-2 extensions.
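The two halfwords in that listing can also be generated by hand. The sketch below assumes the 16-bit encodings `MOVS Rd, #imm8` (00100 Rd imm8) and `BX Rm` (010001110 Rm 000); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit Thumb "MOVS Rd, #imm8" (00100 Rd imm8). */
uint16_t thumb_movs_imm8(unsigned rd, unsigned imm8)
{
    assert(rd <= 7 && imm8 <= 0xFF);
    return (uint16_t)(0x2000u | (rd << 8) | imm8);
}

/* Hypothetical helper: 16-bit Thumb "BX Rm" (010001110 Rm 000). */
uint16_t thumb_bx(unsigned rm)
{
    assert(rm <= 15);
    return (uint16_t)(0x4700u | (rm << 3));
}

/* Hypothetical helper: BX uses bit 0 of the register to select the
 * instruction set, so a Thumb destination needs that bit set. */
uint32_t thumb_bx_address(uint32_t target)
{
    return target | 1u;
}
```

`thumb_movs_imm8(0, thumb_bx_address(0x50))` should give 0x2051 and `thumb_bx(0)` should give 0x4700, the same halfwords objdump printed.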

.thumb
ldr r0,=0x12345679
bx r0
00000000 <.text>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <.text+0x4>)
   2:   4700        bx  r0
   4:   12345679    eorsne  r5, r4, #126877696  ; 0x7900000

This branches to address 0x12345678, or any other possible address.
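The `ldr r0,=...` form turns into a PC-relative literal load, and its halfword can be sketched the same way. This assumes the 16-bit `LDR Rt, [PC, #imm8*4]` encoding (01001 Rt imm8), where the base is the PC value rounded down to a multiple of 4; `thumb_ldr_literal` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit Thumb "LDR Rt, [PC, #imm]" (01001 Rt imm8)
 * for a literal word stored at literal_addr. */
uint16_t thumb_ldr_literal(unsigned rt, uint32_t insn_addr, uint32_t literal_addr)
{
    /* Base is Align(PC, 4), with PC = insn_addr + 4. */
    uint32_t base = (insn_addr + 4u) & ~3u;

    assert(rt <= 7);
    assert(literal_addr >= base);

    /* The literal must be word aligned and at most 1020 bytes ahead. */
    uint32_t delta = literal_addr - base;
    assert((delta & 3u) == 0 && delta <= 1020u);

    return (uint16_t)(0x4800u | (rt << 8) | (delta >> 2));
}
```

For the listing above, `thumb_ldr_literal(0, 0x0, 0x4)` should return 0x4800, and the word at address 4 holds 0x12345679, the destination with bit 0 set for Thumb state.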
