How does an interpreter interpret the code?


Question



For simplicity, imagine this scenario: we have a 2-bit computer, which has a pair of 2-bit registers called r1 and r2 and only works with immediate addressing.

Let's say the bit sequence 00 means "add" to our CPU. Also, 01 means "move data to r1" and 10 means "move data to r2".

So there is an assembly language for this computer and an assembler, where a sample piece of code would be written like:

mov r1,1
mov r2,2
add r1,r2

Simply put, when I assemble this code to native code, the file will be something like:

0101 1010 0001

The 12 bits above are the native code for:

Put decimal 1 into R1, put decimal 2 into R2, add the data and store the result in R1.

So this is basically how compiled code works, right?
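To make my own mental model concrete, here is how I would emulate this little 2-bit CPU in C. Real hardware does the same decoding in silicon; the way I pack the bits and the names I use are just my own invention for the example.

#include <stdio.h>

int main(void) {
    /* the 12 "native" bits, packed as three 4-bit instructions:
       high 2 bits = opcode, low 2 bits = immediate operand */
    unsigned char program[3] = { 0x5 /* 0101 */, 0xA /* 1010 */, 0x1 /* 0001 */ };

    unsigned r1 = 0, r2 = 0;

    for (int pc = 0; pc < 3; pc++) {
        unsigned opcode  = (program[pc] >> 2) & 0x3;  /* top two bits    */
        unsigned operand =  program[pc]       & 0x3;  /* bottom two bits */

        switch (opcode) {
        case 0x1: r1 = operand; break;   /* 01: mov r1, imm            */
        case 0x2: r2 = operand; break;   /* 10: mov r2, imm            */
        case 0x0: r1 = r1 + r2; break;   /* 00: add, result kept in r1 */
        }
    }

    printf("r1 = %u, r2 = %u\n", r1, r2);  /* prints r1 = 3, r2 = 2 */
    return 0;
}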

Let's say someone implements a JVM for this architecture. In Java, I would write code like:

int x = 1 + 2;

How exactly will the JVM interpret this code? I mean, eventually the same bit pattern must be passed to the CPU, mustn't it? All CPUs have a set of instructions that they can understand and execute, and those are, after all, just some bits. Let's say the compiled Java bytecode looks something like this:

1111 1100 1001

or whatever. Does that mean the interpreter changes this code to 0101 1010 0001 when executing? If so, it is already native code, so why is it said that the JIT only kicks in after a number of runs? If it does not convert it exactly to 0101 1010 0001, then what does it do? How does it make the CPU do the addition?

Maybe there are some mistakes in my assumptions.

I know interpreting is slow and compiled code is faster but not portable, and a virtual machine "interprets" code, but how? I am looking for how exactly/technically interpreting is done. Pointers (such as books or web pages) are welcome as well as answers.

Solution

The CPU architecture you describe is unfortunately too restricted to make this really clear with all the intermediate steps. Instead, I will write pseudo-C and pseudo-x86 assembler, hopefully in a way that is clear even without being terribly familiar with C or x86.

The compiled JVM bytecode might look something like this:

ldc 0 # push the first constant (== 1)
ldc 1 # push the second constant (== 2)
iadd # pop two integers and push their sum
istore_0 # pop result and store in local variable

The interpreter has (a binary encoding of) these instructions in an array, and an index referring to the current instruction. It also has an array of constants, a memory region used as a stack, and one for local variables. Then the interpreter loop looks like this:

while (true) {
    switch(instructions[pc]) {
    case LDC:
        sp += 1; // make space for constant
        stack[sp] = constants[instructions[pc+1]];
        pc += 2; // two-byte instruction
        break;
    case IADD:
        stack[sp-1] += stack[sp]; // add to first operand
        sp -= 1; // pop other operand
        pc += 1; // one-byte instruction
        break;
    case ISTORE_0:
        locals[0] = stack[sp];
        sp -= 1; // pop
        pc += 1; // one-byte instruction
        break;
    // ... other cases ...
    }
}

This C code is compiled into machine code and run. As you can see, it's highly dynamic: it inspects each bytecode instruction each time that instruction is executed, and all values go through the stack (i.e. RAM).
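If you want to run the sketch above, here is a minimal self-contained version with the missing declarations filled in and a termination condition instead of the infinite loop. The opcode numbers are arbitrary values picked for this example, not the real JVM ones.

#include <stdio.h>

/* arbitrary opcode numbers, chosen just for this example */
enum { LDC = 1, IADD = 2, ISTORE_0 = 3 };

int main(void) {
    /* bytecode for: ldc 0; ldc 1; iadd; istore_0 */
    unsigned char instructions[] = { LDC, 0, LDC, 1, IADD, ISTORE_0 };
    int constants[] = { 1, 2 };   /* the constant pool */
    int stack[16];
    int locals[4];
    int pc = 0, sp = -1;

    while (pc < (int)sizeof instructions) {
        switch (instructions[pc]) {
        case LDC:
            sp += 1;                                   /* make space for constant */
            stack[sp] = constants[instructions[pc + 1]];
            pc += 2;                                   /* two-byte instruction    */
            break;
        case IADD:
            stack[sp - 1] += stack[sp];                /* add to first operand    */
            sp -= 1;                                   /* pop other operand       */
            pc += 1;                                   /* one-byte instruction    */
            break;
        case ISTORE_0:
            locals[0] = stack[sp];
            sp -= 1;                                   /* pop                     */
            pc += 1;                                   /* one-byte instruction    */
            break;
        }
    }

    printf("local variable 0 = %d\n", locals[0]);      /* prints 3 */
    return 0;
}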

While the actual addition itself probably happens in a register, the code surrounding the addition is rather different from what a Java-to-machine code compiler would emit. Here's an excerpt from what a C compiler might turn the above into (pseudo-x86):

.ldc:
incl %esi # increment the variable pc, first half of pc += 2;
movb %ecx, program(%esi) # load the operand byte that follows the opcode
movl %eax, constants(,%ecx,4) # load constant from pool, indexed by that byte
incl %edi # increment sp
movl %eax, stack(,%edi,4) # write constant onto stack
incl %esi # other half of pc += 2
jmp .EndOfSwitch

.iadd:
movl %eax, stack(,%edi,4) # load first operand
decl %edi # sp -= 1;
addl stack(,%edi,4), %eax # add
incl %esi # pc += 1;
jmp .EndOfSwitch

You can see that the operands for the addition come from memory instead of being hardcoded, even though for the purposes of the Java program they are constant. That's because for the interpreter, they are not constant. The interpreter is compiled once and then must be able to execute all sorts of programs, without generating specialized code.
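To make that concrete: the very same compiled interpreter can be handed a different constant pool (or a different bytecode array) and will happily execute it, with no new machine code being generated anywhere. Here is a condensed sketch of that idea, with an invented interpret() helper wrapping the loop from above (opcode numbers again arbitrary).

#include <stdio.h>

enum { LDC = 1, IADD = 2, ISTORE_0 = 3 };   /* same arbitrary opcodes as before */

/* a condensed version of the interpreter loop, wrapped in a function;
   it returns whatever ends up in local variable 0 */
static int interpret(const unsigned char *code, int len, const int *constants) {
    int stack[16], locals[4] = {0};
    int pc = 0, sp = -1;
    while (pc < len) {
        switch (code[pc]) {
        case LDC:      stack[++sp] = constants[code[pc + 1]]; pc += 2; break;
        case IADD:     stack[sp - 1] += stack[sp]; sp -= 1;   pc += 1; break;
        case ISTORE_0: locals[0] = stack[sp--];               pc += 1; break;
        }
    }
    return locals[0];
}

int main(void) {
    unsigned char prog[] = { LDC, 0, LDC, 1, IADD, ISTORE_0 };

    int consts_a[] = { 1, 2 };   /* the program from the question: 1 + 2 */
    int consts_b[] = { 40, 2 };  /* a "different" program: 40 + 2        */

    /* one compiled interpreter, two different inputs, no new machine code */
    printf("%d\n", interpret(prog, sizeof prog, consts_a));   /* 3  */
    printf("%d\n", interpret(prog, sizeof prog, consts_b));   /* 42 */
    return 0;
}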

The purpose of the JIT compiler is to do just that: Generate specialized code. A JIT can analyze the ways the stack is used to transfer data, the actual values of various constants in the program, and the sequence of calculations performed, to generate code that more efficiently does the same thing. In our example program, it would allocate the local variable 0 to a register, replace the access to the constant table with moving constants into registers (movl %eax, $1), and redirect the stack accesses to the right machine registers. Ignoring a few more optimizations (copy propagation, constant folding and dead code elimination) that would normally be done, it might end up with code like this:

movl %ebx, $1 # ldc 0
movl %ecx, $2 # ldc 1
movl %eax, %ebx # (1/2) iadd
addl %eax, %ecx # (2/2) iadd
# no istore_0, local variable 0 == %eax, so we're done
