如何手动构造AST? [英] How to manually construct an AST?
问题描述
我目前正在学习解析,但是对于如何生成AST感到有些困惑.我编写了一个解析器,可以正确地验证表达式是否符合语法(当表达式符合时它是静默的,而当表达式不符合时它会引发异常).我要从哪里去建立AST?我发现了很多有关构建LL(1)解析器的信息,但随后构建AST的信息很少.
I'm currently learning about parsing but i'm a bit confused as how to generate an AST. I have written a parser that correctly verifies whether an expressions conforms to a grammar (it is silent when the expression conforms and raises an exception when it is not). Where do i go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST.
我的当前代码(使用非常简单的Ruby编写,包括一个词法分析器和解析器)可以在github上找到: https://gist.github.com/e9d4081b7d3409e30a57
My current code (written in very simple Ruby, and including a lexer and a parser) is found here on github: https://gist.github.com/e9d4081b7d3409e30a57
有人可以解释我如何从目前的状态转到AST吗?
Can someone explain how i go from what i have currently to an AST?
或者,如果您不熟悉Ruby,但了解C,您能告诉我我如何在递归下降解析维基百科文章.
Alternatively, if you are unfamiliar with Ruby, but know C, could you tell me how i build an AST for the C code in the recursive descent parsing wikipedia article.
请注意,我不想使用像yacc或antlr这样的解析器生成器为我完成工作,我想从头开始做所有事情.
Please note, i do not want to use a parser generator like yacc or antlr to do the work for me, i want to do everything from scratch.
谢谢!
推荐答案
您需要将匹配的每个符号与构造树的那一部分的回调相关联.例如,让我们采用一个相当普通的结构:嵌套函数调用.
You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.
a(b())
您的终端令牌类似于:
- L_PAREN ='('
- R_PAREN =')'
- IDENTIFIER = [a-z] +
您的非终结符如下:
- FUNCTION_CALL = IDENTIFIER,L_PAREN,R_PAREN
- 或;
- FUNCTION_CALL = IDENTIFIER,L_PAREN,FUNCTION_CALL,R_PAREN
显然,规则FUNCTION_CALL
的上述第二种选择是递归的.
Obviously the second alternative above for the rule FUNCTION_CALL
is recursive.
您已经拥有一个解析器,该解析器知道已找到有效的符号.您缺少的一点是将回调附加到规则,该规则将其组件作为输入接收并返回一个值(通常),该值表示AST中的该节点.
You already have a parser that knows it has found a valid symbol. The bit you're missing is to attach a callback to the rule, which receives its components as inputs and returns a value (usually) representing that node in the AST.
想象一下,我们上面的FUNCTION_CALL
规则中的第一个替代方案是否有回调:
Imagine if the first alternative from our FUNCTION_CALL
rule above had a callback:
Proc.new do |id_tok, l_paren_tok, r_paren_tok|
{ item: :function_call, name: id_tok, args: [] }
end
这意味着AST是由匹配产生的:
That would mean that the AST resulting from matching:
a()
将是:
{
item: :function_call,
name: "a",
args: []
}
现在将其推断为更复杂的a(b())
.因为解析器是递归的,所以它将首先识别b()
,该回调从中返回上面的内容,但用"b"代替"a".
Now to extrapolate that to the more complex a(b())
. Because the parser is recursive, it will recognize the b()
first, the callback from which returns what we have above, but with "b" instead of "a".
现在,让我们定义与第二种选择匹配的规则所附加的回调.这非常相似,除了它还处理它所传递的参数:
Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:
Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
{ item: :function_call, name: id_tok, args: [ func_call_item ] }
end
因为解析器已经识别出b()
,并且AST的一部分已从您的回调返回,所以现在的结果树为:
Because the parser has already recognized b()
and that part of the AST was returned from your callback, the resulting tree is now:
{
item: :function_call,
name: "a",
args: [
{
item: :function_call,
name: "b",
args: []
}
]
}
希望这能给您带来一些思考.将您匹配的所有令牌传递到一个构造AST很小部分的例程中.
Hopefully this gives you some food for thought. Pass all the tokens you match into a routine that constructs very small parts of your AST.
这篇关于如何手动构造AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!