Jump into the middle of a basic block
Question
A basic block is defined as a sequence of non-jump instructions ending with a (direct or indirect) jump instruction. The target address of a jump should be the start of another basic block. Consider the following assembly code:
106ac: ba00000f blt 106f0 <main+0xb8>
106b0: e3099410 movw r9, #37904 ; 0x9410
106b4: e3409001 movt r9, #1
106b8: e79f9009 ldr r9, [pc, r9]
106bc: e3a06000 mov r6, #0
106c0: e1a0a008 mov sl, r8
106c4: e30993fc movw r9, #37884 ; 0x93fc
106c8: e3409001 movt r9, #1
106cc: e79f9009 ldr r9, [pc, r9]
106d0: e5894000 str r4, [r9]
106d4: e7941105 ldr r1, [r4, r5, lsl #2]
106d8: e1a00007 mov r0, r7
106dc: e12fff31 blx r1
106e0: e0806006 add r6, r0, r6
106e4: e25aa001 subs sl, sl, #1
106e8: e287700d add r7, r7, #13
106ec: 1afffff4 bne 106c4 <main+0x8c>
106f0: e30993d0 movw r9, #37840 ; 0x93d0
106f4: e3409001 movt r9, #1
bb1
106a4: ...
106ac: ba00000f blt 106f0 <main+0xb8>
The first basic block, bb1, ends with a branch whose target address is the start of bb4.
bb2
106b0: e3099410 movw r9, #37904 ; 0x9410
.... All other instructions
106c4: e30993fc movw r9, #37884 ; 0x93fc
.... All other instructions
106d8: e1a00007 mov r0, r7
106dc: e12fff31 blx r1
The second basic block, bb2, ends with an indirect branch, so its target address is known only at runtime.
bb3
106e0: e0806006 add r6, r0, r6
106e4: e25aa001 subs sl, sl, #1
106e8: e287700d add r7, r7, #13
106ec: 1afffff4 bne 106c4 <main+0x8c>
The third basic block, bb3, ends with a branch whose target address (106c4) is not the start of another basic block; it lies in the middle of bb2. According to the definition of a basic block, this should not be possible. In practice, however, I see this behavior (jumps into the middle of basic blocks) in multiple places. How can it be explained? Is it possible to force a compiler (LLVM) to generate code that never jumps anywhere except to the beginning of a basic block?
bb4
106f0: e30993d0 movw r9, #37840 ; 0x93d0
106f4: e3409001 movt r9, #1
....
Ends with a branch (direct or indirect)
I am generating the basic blocks with a tool (SPEDI). The compiler is LLVM (Clang front end), and the target architecture is ARMv7 (Cortex-A9).
Recommended answer
Basic blocks are the nodes of the control flow graph (CFG): once control enters a block, it can do nothing but run through the whole block and exit it. That does not mean a block has to start or end with a jump instruction. For a better understanding, see this excerpt from Wikipedia:
Because of its construction procedure, in a CFG, every edge A→B has the property that:
outdegree(A) > 1 or indegree(B) > 1 (or both).
The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an edge contraction for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry.
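The edge-contraction procedure in the excerpt can be tried on the listing from the question. The following is only a minimal sketch under stated assumptions: the names `contract`, `succs`, and `preds` are made up for illustration, the call at 106dc is modelled as falling through to 106e0 (an intra-procedural view), and only fall-through edges are considered for contraction, which is sufficient here because the listing contains no unconditional jumps. The listing is truncated before 106ac, so the first block is partial.

```python
# Instruction-level flow graph for the listing in the question.
# Assumption: every instruction is 4 bytes (ARM mode) and the call at
# 0x106dc (blx r1) simply falls through to 0x106e0.
FIRST, LAST = 0x106AC, 0x106F4
addrs = list(range(FIRST, LAST + 4, 4))

# Conditional branches in the listing: address -> taken target.
cond_branches = {0x106AC: 0x106F0, 0x106EC: 0x106C4}

succs = {a: set() for a in addrs}
for a in addrs:
    if a + 4 <= LAST:
        succs[a].add(a + 4)             # fall-through edge
    if a in cond_branches:
        succs[a].add(cond_branches[a])  # taken edge

preds = {a: set() for a in addrs}
for a, targets in succs.items():
    for t in targets:
        preds[t].add(a)

def contract(addrs, succs, preds):
    """Merge prev -> nxt whenever prev has a single exit and nxt a single entry."""
    blocks, current = [], [addrs[0]]
    for prev, nxt in zip(addrs, addrs[1:]):
        if succs[prev] == {nxt} and preds[nxt] == {prev}:
            current.append(nxt)         # edge falsifies the predicate: contract it
        else:
            blocks.append(current)      # edge survives: block boundary
            current = [nxt]
    blocks.append(current)
    return blocks

block_ranges = [(b[0], b[-1]) for b in contract(addrs, succs, preds)]
for lo, hi in block_ranges:
    print(f"{lo:x}-{hi:x}")
# prints:
# 106ac-106ac
# 106b0-106c0
# 106c4-106ec
# 106f0-106f4
```

The edge 106c0 → 106c4 is not contracted because 106c4 has two predecessors (106c0 and the bne at 106ec), so a block boundary falls exactly at the branch target.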
By this definition, I would analyze the code between 106b0 and 106ec differently: one block B1 from 106b0 to 106c0, and one block B2 from 106c4 to 106ec. This code is a loop: B1 is the setup part and B2 is the body.
In ARM, a bl instruction (or its blx variant, such as the blx r1 at 106dc) is a function call: control passes to the called function and then returns to the instruction right after the call. So if we are only constructing the CFG for the calling function, I wouldn't consider this instruction a block boundary. If we are building the CFG for the whole program, there should be an edge from here to the called function and another edge back from the called function to the next instruction.
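Both views of the call can be compared with the classic "leader" approach to block splitting: the first instruction, every branch target, and every instruction following a block-ending transfer starts a new block. The sketch below is illustrative only. `Insn`, `LISTING`, `find_leaders`, and the `calls_end_blocks` flag are hypothetical names, not part of SPEDI or LLVM, and recognising calls by the exact mnemonics `bl`/`blx` is a simplification.

```python
from typing import NamedTuple, Optional

class Insn(NamedTuple):
    addr: int
    mnemonic: str
    target: Optional[int] = None  # static branch target, if known

# Control-transfer instructions from the listing; everything else is
# straight-line code (assumption: 4-byte ARM instructions throughout).
LISTING = {
    0x106AC: Insn(0x106AC, "blt", 0x106F0),
    0x106DC: Insn(0x106DC, "blx"),           # indirect call through r1
    0x106EC: Insn(0x106EC, "bne", 0x106C4),
}

def find_leaders(first, last, transfers, calls_end_blocks):
    """Return the sorted addresses that start a basic block."""
    leaders = {first}
    for addr, insn in transfers.items():
        is_call = insn.mnemonic in ("bl", "blx")  # exact match, so "blt" is not a call
        if is_call and not calls_end_blocks:
            continue                              # call treated as straight-line code
        if insn.target is not None and first <= insn.target <= last:
            leaders.add(insn.target)              # a branch target starts a block
        if addr + 4 <= last:
            leaders.add(addr + 4)                 # so does the instruction after a transfer
    return sorted(leaders)

intra = find_leaders(0x106AC, 0x106F4, LISTING, calls_end_blocks=False)
whole = find_leaders(0x106AC, 0x106F4, LISTING, calls_end_blocks=True)
print([hex(a) for a in intra])  # ['0x106ac', '0x106b0', '0x106c4', '0x106f0']
print([hex(a) for a in whole])  # ['0x106ac', '0x106b0', '0x106c4', '0x106e0', '0x106f0']
```

In both variants 106c4 is a leader because it is the target of the bne at 106ec, so the range 106b0 to 106dc that the question calls bb2 is really two blocks. Treating the call as a block end only adds one more boundary at 106e0.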