编译AST到程序集 [英] Compiling an AST to Assembly

查看:233
本文介绍了编译AST到程序集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个抽象语法树,我需要将其转换为虚拟机的程序集.我不知道如何最好地做到这一点,所以我开始使用一串字符串模板.我的意思的伪代码示例,即需要编译具有一个条件的简单if语句:

I have an abstract syntax tree that I need to convert to assembly for a virtual machine. I don't know how best to do this so I started using a chain of string templates. Pseudo-code example of what I mean, say a simplistic if-statement with one condition needs to be compiled:

std::string compile_if(Node* n) {
    std::string str = "";

    curLabel = nLabels++;

    str += compile_comparison(n->getChild(0));

    str += ".true"+curLabel+":";
    str += compile_block(n->getChild(1));

    str += ".false"+curLabel+":";

    return str;
}

每个compile_ *会根据当前/下一个AST节点生成一个汇编字符串.然后,最后的字符串将通过汇编器运行.这似乎草率且难以维护,当然这不是大多数编译器所做的.这是一个坏主意,我应该改变它吗?其他大多数编译器如何生成虚拟汇编代码/机器代码?

Where each compile_* generates an assembly string based on the current/next AST nodes. Then the final string is run through an assembler. This seems sloppy and hard to maintain, surely this isn't what most compilers do. Is this a bad idea, should I change it? How do most other compilers generate virtual assembly code / machine code?

推荐答案

免责声明:我只有X86机器代码的经验.例如,其他指令集可能具有不同的寻址功能,因此我的建议中的某些部分可能不适用.很抱歉,我目前没有时间研究指令集.

Disclaimer: I only have experience with X86 machine code. Other instruction sets may have, for example, different addressing capabilities, so parts of my advice might not apply. I'm sorry that I don't have time to research instruction sets at the moment.

首先,大多数编译器不会将汇编生成为文本,因为您可能已经意识到,将代码序列化为汇编只是让汇编器直接对其进行解析有点效率不高.分别具有编译组装两个阶段 是合理的,但不是必不可少的.

Well firstly, most compilers don't generate assembly as text, because it's kinda inefficient to serialise the code into assembly only to have it parsed straight away by the assembler, as you have probably realised. It is reasonable to have separate compilation and assembly phases, but not essential.

在编译阶段,我将考虑两种策略:

In the compilation phase, the two strategies I would consider are:

  • (a)将程序集生成为指令对象的树/数组,它们可以象征性地相互引用.在组装阶段,需要将这些序列化为字节码/机器码.我建议使用此方法,即使它会使编译器的体系结构更加复杂.

  • (a) generate the assembly as a tree / array of instruction objects which can symbolically refer to one another. In the assembly phase these need to be serialised into bytecode/machinecode. I'd recommend this method, even if it makes your compiler's architecture a little more complex.

(b)将程序集以机器代码/字节码的形式生成到缓冲区中,并带有一些辅助函数;在这种情况下,您实际上没有单独的组装阶段. 我已经亲自尝试过这种方法,并且在单个函数的范围内还不错,但是由于在组装之前不知道函数的大小可能会引起一些额外的困难.

(b) generate the assembly as machinecode/bytecode into a buffer, with some helper functions; in this case you don't really have a separate assembly phase. I've personally tried this method, and within the scope of a single function it's not bad, but may cause some extra difficulties by not knowing how large a function is going to be before it's assembled.

我猜想(a)是优化GCC等编译器使用的方法,而(b)是诸如TCC之类的高速编译器使用的方法.

I would guess that (a) is the approach used by optimising compilers like GCC, while (b) is the approach used by high-speed compilers like TCC.

让我们通过检查现有编译器为简单的if/else分支生成的代码来再次考虑if示例:

Let's again consider the if example by examining the code that an existing compiler generates for a simple if/else branch:

请注意反汇编中的重叠跳转-一个跳过"taken"块,另一个跳过"not-taken"块.

Note the overlapping jumps in the disassembly - one that skips the 'taken' block and one that skips the 'not-taken' block.

这些是相对跳转,因此要组装它们,我们需要知道跳转指令和目标之间有多少字节指令.

These are relative jumps, so in order to assemble them we need to know how many bytes of instructions are between the jump instruction and the destination.

以下是使用策略(a):

Instruction[] compile_if(IfNode n) {
    Instruction[] code;

    code ~= compile_condition(n.condition);

    Instruction skip_taken = new JumpInstruction(`jz`);
    code ~= skip_taken;

    code ~= compile_block(n.taken_block);

    Instruction skip_nottaken = new JumpInstruction(`jmp`);
    code ~= skip_nottaken;

    Instruction[] nottaken_code = compile_block(n.nottaken_block);
    skip_taken.destination = nottaken_code[0];
    code ~= nottaken_code;

    Instruction end = new NopInstruction();
    skip_nottaken.destination = end;
    code ~= end;

    return code;
};

这应该很不言自明.

请注意,指令如何在符号上相互引用(skip_taken.destination = nottaken_code[0]),而不是像串行机器码中那样通过字节偏移来引用.我们将这些偏移量计算留给汇编程序.

Note how instructions refer to one another symbolically (skip_taken.destination = nottaken_code[0]), rather than by byte-offsets like in serialised machinecode. We leave those offset calculations for the assembler.

还要注意,只有在JumpInstruction可用时,我们才设置它们的目的地.

Also note how we set the destinations of the JumpInstructions only once they become available.

最后的NopInstruction只是为了给skip_nottaken跳转提供一些参考.

The NopInstruction at the end is just to give the skip_nottaken jump something to refer to.

现在,我们如何实际将这些跳转组合成真实的机器码/字节码?这是一种可能性(一个非常基本的示例):

Now, how do we actually assemble these jumps into real machinecode/bytecode? here's one possibility (a very basic example):

byte[2] assemble_jz(Instruction[] code, int idx) {
    // assemble the jz instruction at code[idx]

    JumpInstruction jump = code[idx];
    ++idx;

    byte jump_offset = 0;
    while (code[idx] != jump.destination) {
        jump_offset += size_of_instruction(code[idx]);
        ++idx;
    };

    byte[2] machinecode = [
        0x74, // jz short
        jump_offset
    ];
    return machinecode;
};

由于汇编器可以访问所有指令对象,因此可以通过向前扫描直到找到目标指令来计算相对跳转的实际偏移量.

Because the assembler has access to all the instruction objects, it can calculate the actual offsets for relative jumps by scanning ahead until it finds the destination instruction.

我希望这个简短的介绍可以帮助您开始设计自己的编译器后端.显然,我不建议您像我的示例一样编写编译器,但是它应该为您提供一些有关如何解决编译和组装非线性指令块的一般问题的想法.

I hope this brief introduction helps you get started with designing your own compiler backend. Obviously I'm not suggesting that you write your compiler exactly like my example, but it should give you some ideas of how to approach the general problem of compiling and assembling non-linear instruction blocks.

您可能还想看看一些现有的汇编器API,例如 https://github.com/asmjit/asmjit .

You might also want to take a look at some existing assembler APIs such as https://github.com/asmjit/asmjit .

祝你好运.

这篇关于编译AST到程序集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆