Differentiate data from instructions in ARM

Question

In (32-bit) ARM Linux kernels, how can data embedded in the code section be differentiated from instructions?

A lightweight approach that is easy to implement, such as bit masks, would be best. It is not wise to embed a disassembler into the kernel.

Recommended answer

In general, what you're asking for is impossible.

Consider this function which happens to use a data value too big to encode as an immediate:

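@ void patch_nop(void *code_addr);
patch_nop:
    ldr r1, =0xe1a00000     @ encoding of "mov r0, r0"; too wide for an immediate, so it goes in a literal pool
    str r1, [r0]            @ overwrite the instruction at code_addr
    bx lr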

which, by the time it's been through an assembler and back, looks like this:

$ arm-none-eabi-objdump -d a.out

a.out:     file format elf32-littlearm


Disassembly of section .text:

00000000 <patch_nop>:
   0:   e59f1004        ldr     r1, [pc, #4]    ; c <patch_nop+0xc>
   4:   e5801000        str     r1, [r0]
   8:   e12fff1e        bx      lr
   c:   e1a00000        .word   0xe1a00000

Thanks to the ELF data, we can still ascertain where the function ends and the literal pool begins, but the work objdump is doing to dig through the sections and symbols is hardly 'lightweight', and who says you have those anyway? What if you have just the code?

$ arm-none-eabi-objcopy -Obinary a.out bin
$ arm-none-eabi-objdump -D -marm -bbinary bin

bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   e59f1004        ldr     r1, [pc, #4]    ; 0xc
   4:   e5801000        str     r1, [r0]
   8:   e12fff1e        bx      lr
   c:   e1a00000        nop                     ; (mov r0, r0)

There. Embedded in your instruction stream, you have data, which is an instruction. Not even data which accidentally happens to look like an instruction. There is literally nothing you can take from those 32 bits alone to infer that they are not going to be executed (well, not from that location at least).

There are a few heuristics which might help make an educated guess, particularly if any additional prior knowledge can be assumed to narrow it down:

  • Anything which can be encoded as an immediate is almost certainly an instruction, because a compiler/assembler wouldn't have emitted it as a literal in the first place. However, you'd ideally want to know at least whether the preceding code is ARM or Thumb in order to know what the appropriate immediate range is*.

  • Anything which is an undefined instruction is usually going to be data, unless it so happens that it's code which wants to intentionally raise an undef exception. And you essentially have to have most of a disassembler to check that something doesn't match any defined encoding, on top of the ARM/Thumb problem.

  • Anything immediately following an unconditional branch might be literal data, particularly if you have symbols and can tell it's very close to the start of the following function, or if you have some knowledge of the data you're looking for and it looks like data. The latter point is certainly relevant if you're just eyeballing disassembly: in practice literal data tends to be stuff like addresses, which generally stand out like a sore thumb† once you look at the code as a whole.
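The first heuristic can be sketched in C. This is a minimal illustration rather than a complete classifier: it assumes ARM (A32) state only, where a data-processing immediate is an 8-bit value rotated right by an even amount, and it also counts a value as encodable if its bitwise inverse is (a single `mvn`). The function names are made up for this sketch. Thumb has different immediate rules, and ARMv6T2 and later can also use `movw` for any 16-bit value; neither case is covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by r bits (0 <= r < 32), avoiding an undefined shift by 32. */
uint32_t rotl32(uint32_t v, unsigned r)
{
    return r ? (v << r) | (v >> (32u - r)) : v;
}

/* True if v is an 8-bit value rotated right by an even amount, i.e. it
 * fits the immediate field of an A32 data-processing instruction. */
bool arm_is_modified_immediate(uint32_t v)
{
    for (unsigned r = 0; r < 32; r += 2)
        if (rotl32(v, r) <= 0xffu)
            return true;
    return false;
}

/* Hypothetical helper: a value that mov or mvn can load directly is
 * unlikely to have been emitted as a literal-pool entry. */
bool arm_probably_not_literal(uint32_t v)
{
    return arm_is_modified_immediate(v) || arm_is_modified_immediate(~v);
}
```

For the `patch_nop` example, `0xe1a00000` fails both tests, which is why the assembler had to place it in a literal pool.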

The most reliable way to check if something is a literal is to look through the preceding code (up to 1025 instructions away) checking for a PC-relative load targeting that address. You'd only need to check against literal load encodings (there's your simple bitmasking operation), then decode the relative offset if you find one. Ideally you'd want to solve the ARM/Thumb thing to avoid false positives from checking against inappropriate encodings, and in the most absolutely pathological case you could still run into some data in a preceding literal pool which happens to look like a literal load targeting your address; never say never.
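The literal-load check could look something like the sketch below, for ARM (A32) state and word loads only. It assumes the code is available as an array of 32-bit words whose first element sits at address `base`; the function names and that calling convention are hypothetical. The mask matches `ldr Rt, [pc, #+/-imm12]` with any condition outside the 0b1111 space, and the target is the instruction address plus 8, plus or minus the 12-bit offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ldr Rt, [pc, #+/-imm12]: bits 27:24 = 0101, B = 0, W = 0, L = 1, Rn = pc.
 * Bit 23 (U, add/subtract) and the condition field are left unmasked. */
#define ARM_LDR_LIT_MASK   0x0f7f0000u
#define ARM_LDR_LIT_VALUE  0x051f0000u

/* How far back to look, in words; the figure is taken from the text above. */
#define ARM_LITERAL_SCAN_WORDS 1025u

bool arm_is_ldr_literal(uint32_t insn)
{
    if ((insn >> 28) == 0xfu)   /* the 0b1111 condition space holds other instructions */
        return false;
    return (insn & ARM_LDR_LIT_MASK) == ARM_LDR_LIT_VALUE;
}

/* In ARM state the PC reads as the instruction's own address plus 8. */
uint32_t arm_ldr_literal_target(uint32_t insn, uint32_t insn_addr)
{
    uint32_t pc = insn_addr + 8u;
    uint32_t imm12 = insn & 0xfffu;
    return (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
}

/* True if some preceding word within the scan window decodes as a literal
 * load whose target is the word at code[index]. */
bool arm_word_is_literal_target(const uint32_t *code, uint32_t base, size_t index)
{
    uint32_t target = base + 4u * (uint32_t)index;
    size_t first = index > ARM_LITERAL_SCAN_WORDS ? index - ARM_LITERAL_SCAN_WORDS : 0;

    for (size_t i = first; i < index; i++) {
        uint32_t addr = base + 4u * (uint32_t)i;
        if (arm_is_ldr_literal(code[i]) &&
            arm_ldr_literal_target(code[i], addr) == target)
            return true;
    }
    return false;
}
```

For the `patch_nop` binary dump above, `arm_word_is_literal_target(code, 0, 3)` finds the `ldr` at offset 0 pointing at offset 0xc, so the word there is flagged as a literal even though it decodes as a `nop`. A hit only says the word is loaded as data; false positives from earlier literal pools remain possible, and Thumb code would need its own encodings.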

And of course, that's still all assuming literal pools automatically emitted by a compiler/assembler; when it comes to entirely handwritten assembly code, all bets are off:

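patch_nop2:
    ldr r1, [pc, #-4]       @ pc reads as this address + 8, so this loads the next word
    mov r0, r0              @ executed as a nop, and also loaded above as the value to store
    str r1, [r0]
    bx lr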

Is it code? Yes. Is it data? Yes.

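* Incidentally, telling ARM code from Thumb code boils down to essentially the same problem as this one ("what does this bit pattern mean?") and is just as non-trivial without outside help.

† No pun intended.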
