LLVM Backend: Replacing indirect jmps for the x86 backend

Question

I want to replace indirect jmp *(eax) instructions in the code with mov *(eax),ebx; jmp *ebx for x86 executables.

Before implementing this, I would like to make the LLVM compiler log a message every time it generates a jmp *(eax) instruction, by adding some print statements.

Then I want to move on to replacing the indirect jump with the new sequence.

From what I have seen in Google searches and articles, I can probably achieve this by modifying the X86AsmPrinter in the LLVM backend, but I am not sure how to go about it. Any help or reading material would be appreciated.

Note: my actual requirement deals with indirect jumps and pop, but I want to start with this to understand the backend a bit better before I dive into anything more.

Recommended answer

I am done with my project. Posting my approach for the benefit of others.

The main job of the LLVM backend is to convert the Intermediate Representation (IR) into the final executable, depending on the target architecture and other specifications. The backend itself consists of several phases that perform target-specific optimization, instruction selection, scheduling, and instruction emission. These phases are required because the IR is a very generic representation and needs many transformations before it becomes target-specific machine code.

1) Logging every time the compiler generates jmp *(eax)

We can achieve this by adding print statements to the instruction emitting/printing phase. After most of the conversion from IR is done, there is an AsmPrinter pass that walks every machine instruction in every basic block of each function. The main loop is AsmPrinter::EmitFunctionBody() in lib/CodeGen/AsmPrinter/AsmPrinter.cpp. There are other related functions such as EmitFunctionEpilogue and EmitFunctionPrologue. These functions eventually call the target-specific EmitInstruction, for example the one in lib/Target/X86/X86AsmPrinter.cpp. If you tinker around a bit, you can call MI->getOpcode() there and compare the result with the opcode enums defined for the architecture to print a log message.

For example, a jump through a register on x86-64 has the opcode X86::JMP64r. The memory-indirect form has its own opcodes (X86::JMP32m and X86::JMP64m). You can get the associated register with MI->getOperand(0), and so on.

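// Inside X86AsmPrinter::EmitInstruction(const MachineInstr *MI);
// dbgs() is declared in llvm/Support/Debug.h.
if (MI->getOpcode() == X86::JMP64r)
  dbgs() << "Found jmp *x instruction\n";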
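To make the shape of that check concrete without needing an LLVM build, here is a minimal self-contained sketch in plain C++. ToyInstr, logIfIndirectJump, and scanFunctionBody are hypothetical stand-ins invented for this illustration; they are not LLVM classes or functions. Only the opcode names JMP64r and JMP64m are borrowed from the X86 backend's naming. The loop plays the role of the instruction walk in EmitFunctionBody, and the per-instruction function plays the role of the check added to EmitInstruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the backend's opcode enum and MachineInstr (NOT the LLVM API).
enum Opcode { MOV64rm, JMP64r, JMP64m, RET };

struct ToyInstr {
    Opcode opcode;
    std::vector<std::string> operands;  // simplified: register names only

    Opcode getOpcode() const { return opcode; }
    const std::string &getOperand(std::size_t i) const {
        assert(i < operands.size() && "operand index out of range");
        return operands[i];
    }
};

// Role of the check inside EmitInstruction: record a message for indirect
// jumps and report whether this instruction was one.
bool logIfIndirectJump(const ToyInstr &MI, std::vector<std::string> &log) {
    if (MI.getOpcode() == JMP64r) {          // jump through a register
        log.push_back("Found jmp *" + MI.getOperand(0));
        return true;
    }
    if (MI.getOpcode() == JMP64m) {          // jump through memory
        log.push_back("Found jmp *(" + MI.getOperand(0) + ")");
        return true;
    }
    return false;
}

// Role of the main loop: visit every instruction of a function body and
// return how many indirect jumps were seen.
int scanFunctionBody(const std::vector<ToyInstr> &body,
                     std::vector<std::string> &log) {
    int hits = 0;
    for (const ToyInstr &MI : body)
        if (logIfIndirectJump(MI, log))
            ++hits;
    return hits;
}
```

In the real backend the operand is a MachineOperand rather than a string, so printing a register name takes a little more work; the sketch only shows the control flow.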

2) Replacing the instruction

The required changes depend on the kind of replacement you need. If you need more context about registers or earlier instructions, the change has to be implemented higher up in the pass chain. There is a representation of instructions called the Selection DAG (directed acyclic graph), which records the dependencies of each instruction on earlier instructions. For example, in the sequence

mov myvalue,%rax
jmp *%rax

The DAG would have the jmp node pointing to the mov node (and possibly other nodes before it), since the value of rax depends on the mov instruction. You can replace a node here with the nodes you need; if done correctly, this changes the final emitted instructions. The SelectionDAG code is in lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp. It is always best to poke around first to find the ideal place for the change. Each IR statement goes through multiple transformations before the DAG is topologically sorted to put the instructions into a linear sequence. The graphs can be viewed with the -view-*-dags options (such as -view-isel-dags) listed by llc --help-hidden. In my case, I just added a specific check in EmitInstruction and added code to emit the two instructions I wanted.
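Emitting two instructions in place of one from inside an EmitInstruction-style hook can be sketched as follows. This is a toy model under stated assumptions, not the actual LLVM change: ToyInstr, ToyStreamer, and emitInstruction are hypothetical names invented for the illustration, and the scratch register %rbx is simply assumed to be free at the jump, which a real implementation would have to guarantee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins (NOT LLVM's MachineInstr or its output streamer).
enum Opcode { JMP64r, JMP64m, RET };

struct ToyInstr {
    Opcode opcode;
    std::vector<std::string> operands;  // simplified: register names only
};

// Collects the emitted assembly lines, like an output stream would.
struct ToyStreamer {
    std::vector<std::string> lines;
    void emit(const std::string &text) { lines.push_back(text); }
};

// ASSUMPTION: %rbx is dead at every indirect jump in this toy program.
static const char *const kScratchReg = "%rbx";

// Emits one instruction, expanding a memory-indirect jump into
// "load the target into a scratch register, then jump through it".
void emitInstruction(const ToyInstr &MI, ToyStreamer &out) {
    switch (MI.opcode) {
    case JMP64m:
        assert(MI.operands.size() == 1 && "expected a single base register");
        out.emit("mov (" + MI.operands[0] + "), " + kScratchReg);
        out.emit(std::string("jmp *") + kScratchReg);
        break;
    case JMP64r:
        assert(MI.operands.size() == 1 && "expected a single register");
        out.emit("jmp *" + MI.operands[0]);
        break;
    case RET:
        out.emit("ret");
        break;
    }
}
```

A real X86 memory operand is made of several parts (base, scale, index, displacement, segment), so the real replacement has to carry all of them over to the mov, not just the base register as the toy does.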

The LLVM documentation is always there, but I found two articles by Eli Bendersky more helpful than any other resource: "Life of an instruction in LLVM" and "A deeper look into the LLVM code generator". The articles also discuss the very complex TableGen descriptions and the instruction matching process, which is kind of cool if you are interested.
