I'm writing my own JIT-interpreter. How do I execute generated instructions?


Problem description

I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).

Actually I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code. Store the instructions somewhere in memory. Do an unconditional jump to the first instruction. Voila!

So, with that in mind, I have the following small C program:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int *m = malloc(sizeof(int));
    *m = 0x90; // NOP instruction code

    asm("jmp *%0"
               : /* outputs:  */ /* none */
               : /* inputs:   */ "d" (m)
               : /* clobbers: */ "eax");

    return 42;

}

Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return back to main).

Question: Am I on the right path?

Question: Could you show me a modified program that manages to find its way back to somewhere inside main?

Question: Other issues I should beware of?

PS: My goal is to gain understanding, not necessarily do everything the right way.


Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

unsigned char *m;

int main() {
        unsigned int pagesize = getpagesize();
        printf("pagesize: %u\n", pagesize);

        m = malloc(1023+pagesize+1);
        if(m==NULL) return(1);

        printf("%p\n", m);
        m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
        printf("%p\n", m);

        if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
                printf("mprotect fail...\n");
                return 0;
        }

        m[0] = 0xc9; //leave
        m[1] = 0xc3; //ret
        m[2] = 0x90; //nop

        printf("%p\n", m);


        asm("jmp *%0"
                   : /* outputs:  */ /* none */
                   : /* inputs:   */ "d" (m)
                   : /* clobbers: */ "ebx");

        return 21;
}

Solution

Question: Am I on the right path?

I would say yes.
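
To make the "compile that to x86 instructions" step from the question concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the three-opcode stack IR (`IR_PUSH`, `IR_ADD`, `IR_RET`) and the names `ir_insn`, `emit8`, `emit32` and `compile_ir` are hypothetical and not taken from any real VM. It only fills a byte buffer with machine code; running those bytes additionally needs executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack-machine IR, invented for this sketch. */
enum ir_op { IR_PUSH, IR_ADD, IR_RET };

struct ir_insn {
    enum ir_op op;
    int32_t    imm;   /* only used by IR_PUSH */
};

/* Append one byte to the code buffer, checking capacity. */
static void emit8(unsigned char *buf, size_t cap, size_t *len, unsigned char b) {
    assert(*len < cap);
    buf[(*len)++] = b;
}

/* Append a 32-bit immediate in little-endian order, as x86 expects. */
static void emit32(unsigned char *buf, size_t cap, size_t *len, int32_t v) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++)
        emit8(buf, cap, len, (unsigned char)((u >> (8 * i)) & 0xFF));
}

/* Translate the IR into x86 machine code; returns the number of bytes emitted.
 * The encodings used here are valid in both 32-bit and 64-bit mode. */
size_t compile_ir(const struct ir_insn *prog, size_t n,
                  unsigned char *buf, size_t cap) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case IR_PUSH:
            emit8(buf, cap, &len, 0x68);          /* push imm32 */
            emit32(buf, cap, &len, prog[i].imm);
            break;
        case IR_ADD:
            emit8(buf, cap, &len, 0x59);          /* pop ecx/rcx */
            emit8(buf, cap, &len, 0x58);          /* pop eax/rax */
            emit8(buf, cap, &len, 0x01);          /* add eax, ecx */
            emit8(buf, cap, &len, 0xC8);
            emit8(buf, cap, &len, 0x50);          /* push eax/rax */
            break;
        case IR_RET:
            emit8(buf, cap, &len, 0x58);          /* pop eax/rax: result */
            emit8(buf, cap, &len, 0xC3);          /* ret */
            break;
        }
    }
    return len;
}
```

Compiling the program `PUSH 2; PUSH 3; ADD; RET` with this sketch emits 17 bytes. Because the generated code ends in `ret`, it expects to be entered with a `call` so that a return address sits on the stack below the values it pushes.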

Question: Could you show me a modified program that manages to find its way back to somewhere inside main?

I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
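
As a rough illustration of the call/ret idea, one common way to get a `call` without inline assembly is to cast the buffer to a function pointer and call it from C; the compiler then emits the `call`, and the generated code only has to end in `ret`. This is a sketch under assumptions, not the one true way: it assumes x86 or x86-64 Linux with GCC or Clang, and the names `jit_fn`, `emit_return_const` and `run_return_const` are made up for this example.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

/* Emit "mov eax, imm32; ret" into buf; returns the number of bytes written. */
static size_t emit_return_const(unsigned char *buf, size_t cap, int32_t value) {
    uint32_t u = (uint32_t)value;
    assert(cap >= 6);
    buf[0] = 0xB8;                                /* mov eax, imm32 */
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (unsigned char)((u >> (8 * i)) & 0xFF);
    buf[5] = 0xC3;                                /* ret */
    return 6;
}

/* Generate a tiny function that returns `value`, call it, and store the
 * result in *out. Returns 0 on success, -1 on failure. */
int run_return_const(int32_t value, int *out) {
#if !defined(__x86_64__) && !defined(__i386__)
    (void)value; (void)out;
    return -1;                    /* the bytes above are x86 machine code */
#else
    size_t size = 4096;
    unsigned char *code = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    emit_return_const(code, size, value);

    /* Switch the page from writable to executable before running it. */
    if (mprotect(code, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, size);
        return -1;
    }

    /* Casting an object pointer to a function pointer is not strictly
     * ISO C, but GCC and Clang accept it on POSIX systems. */
    jit_fn fn = (jit_fn)code;
    *out = fn();                  /* compiler emits call; generated ret returns here */

    munmap(code, size);
    return 0;
#endif
}
```

Calling `run_return_const(42, &out)` on x86 or x86-64 Linux should return 0 and leave 42 in `out`. The point of the design is that the return address is pushed by `call` and consumed by `ret`, so the generated code never needs to know where in main it came from.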

Question: Other issues I should beware of?

Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.

On Linux this is done using mprotect() with PROT_EXEC.
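
A minimal sketch of that arrangement follows, assuming Linux and using `mmap()` rather than `malloc()` so the memory is already page-aligned and no manual pointer rounding is needed. The helper names `round_up_to_page` and `load_code` are hypothetical. The memory is writable while the code is copied in and is then switched to read+execute, so it is never writable and executable at the same time.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a multiple of page; page must be a power of two. */
size_t round_up_to_page(size_t n, size_t page) {
    assert(page != 0 && (page & (page - 1)) == 0);
    return (n + page - 1) & ~(page - 1);
}

/* Copy `len` bytes of machine code into fresh page-aligned memory, then
 * change the protection from read+write to read+execute.
 * Returns the mapping (NULL on failure); *mapped_size receives the length
 * to pass to munmap() later. */
void *load_code(const unsigned char *code, size_t len, size_t *mapped_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = round_up_to_page(len, page);

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;              /* also the result for len == 0 */

    memcpy(mem, code, len);

    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

    *mapped_size = size;
    return mem;
}
```

Note that `mprotect()` works on whole pages, so both the address and the effective length are page-granular; that is why the program in the question has to round its `malloc()` pointer up to a page boundary first.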
