在字节码编译器中计算跳转地址的智能解决方案? [英] Intelligent solution to computing jump addresses in a bytecode compiler?
问题描述
假设我正在实现一个字节码编译器,类似于Lua / Python的...等等。
Let's say I'm implementing a bytecode compiler, similar to Lua's/Python's... so on.
我遍历AST,生成字节码指令,我在里面的
if-else
里面碰到 break
code> loop:
I'm traversing the AST, generating bytecode instructions, and I run into a break
inside of an if-else
inside of a while
loop:
while (cond1)
if (cond2)
...
else
break
(我试着写出等效的字节码,看起来太有用了。)
(I tried writing out equivalent bytecode but it didn't look too helpful.)
关键是,在这个例子中至少有4个跳转指令,而I
想找到一个优雅的解决方案在编译AST时填充跳转地址...
我不知道的跳转地址,而
循环或
The point is, there are at least 4 "jump" instructions in that example, and I
want to find an elegant solution to filling in the jump addresses as I compile the AST...
I don't know the jump address for the while
loop or the break
until after I've fully
"compiled" the inner statements.
- 一个伪代码
- 一个解决方案不应该取决于我是否实现了一个基于寄存器或基于堆栈的字节码编译器(我在玩两者)
我还没读过这本龙书。
如果我递归编译AST,当我在某些任意数目的循环和 if-else
块中达到 break
语句时,编译器知道要跳到哪个空标签?我猜在某种类型的标签名栈外部的递归AST走功能。
If I'm recursively compiling the AST, when I reach a break
statement inside of some arbitrary number of loops and if-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
推荐答案
backpatching:为正向跳转填充一个虚拟值,为语句的主体生成代码,然后返回并使用结尾处的真实值替换虚拟值。
The principle you need is called "backpatching": fill in a dummy value for the forward jump, generate the code for the body of the statement, and then go back and replace the dummy value with the real one at the end.
例如
# start of codegen for 'while'
L1:
[evaluate cond1]
jne L2 # forward reference: use a dummy value for now
# start of codegen for 'if ... else'
L3:
[evaluate cond2]
jne L4 # another forward reference: use a dummy value for now
[code in 'if' branch]
j L5 # another forward reference: use a dummy value for now
L4:
[code in 'else' branch before 'break']
j L2
[code in 'else' branch after 'break']
L5: # end of 'if ... else'
# now go back and fill in references to L4, L5 inside the 'if ... else' block
# end of codegen for 'if ... else'
# codegen for 'while' continues...
j L1 # loop
L2: # end of 'while' loop
# now go back and fill in references to L2 inside the 'while' block
# end of codegen for 'while'
关于您的编辑:
Regarding your edit:
如果我递归编译AST, c $ c> break 语句里面的某些任意数量的循环和
if-else
块,编译器应该如何知道要跳转到哪个空标签?我猜一些类型的标签名栈外部的递归AST走功能。
If I'm recursively compiling the AST, when I reach a
break
statement inside of some arbitrary number of loops andif-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
实现 break
语句的跳转目标是最内圈的循环;
The target of the jump which implements the break
statement is the end of the innermost enclosing loop; yes, you could track that with an explicit external stack.
但是,如果你有一个递归AST行走函数,你已经有一个隐式 stack - 调用的递归函数调用的框架 - 所以你可能不需要一个显式的。
But, if you have a recursive AST walking function, you already have an implicit stack - the call frames of the recursive function invocations - so you probably don't need an explicit one as well.
例如
...
while (outer)
...
if (outer_done)
break
...
while (inner)
...
if (inner_done)
break
...
[break from inner 'while' ends up here]
...
if (outer_done_2)
break
...
[break from outer 'while' ends up here]
...
整个内部 while
循环的代码生成发生在外部的正文的AST的递归遍历中,而循环。在这个递归调用中,你只需要关心内部循环,而不是外部循环。
the entirety of the code generation for the inner while
loop happens within a recursive walk of the AST for the body of the outer while
loop. Inside this recursive call, you only need to care about the inner loop, not the outer one.
所以,你需要做的是:
- 在开始处理
时保存任何当前的反馈状态,而
- 开始一个新的空列表,用于跟踪出现在
正文中的
break
,而 - 处理正文(可能导致递归调用)
- 应用背面
- ,然后在退出之前恢复先前的状态。
- save any current backpatching state at the start of processing a
while
- start a new empty list for tracking any
break
that appears within the body of thiswhile
- handle the body (which may result in a recursive call)
- apply the backpatches
- then restore the previous state before exiting.
例如有点像这样:
codegen for while:
save previous break backpatch list
initialise break backpatch list as empty
perform codegen for evaluating condition
perform codegen for body statements
apply backpatches
restore previous break backpatch list
当前 break
backpatch列表必须是传递给所有codegen函数的一些状态的一部分 break
需要附加到它)。但是保存的列表可以作为和
codegen函数的局部变量来跟踪。
The current break
backpatch list must be part of some state which is passed to all the codegen functions (the codegen for break
needs to append to it). But the saved list could just be tracked as a local variable of the while
codegen function.
这篇关于在字节码编译器中计算跳转地址的智能解决方案?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!