如何手动构建 AST? [英] How to manually construct an AST?
问题描述
我目前正在学习解析,但我对如何生成 AST 有点困惑.我编写了一个解析器,可以正确验证表达式是否符合语法(当表达式符合时它是静默的,如果不符合则引发异常).我从哪里开始构建 AST?我找到了大量有关构建 LL(1) 解析器的信息,但关于构建 AST 的信息却很少.
I'm currently learning about parsing but i'm a bit confused as how to generate an AST. I have written a parser that correctly verifies whether an expressions conforms to a grammar (it is silent when the expression conforms and raises an exception when it is not). Where do i go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST.
我当前的代码(用非常简单的 Ruby 编写,包括词法分析器和解析器)可在 github 上找到:https://gist.github.com/e9d4081b7d3409e30a57
My current code (written in very simple Ruby, and including a lexer and a parser) is found here on github: https://gist.github.com/e9d4081b7d3409e30a57
有人能解释一下我是如何从目前拥有的东西变成 AST 的吗?
Can someone explain how i go from what i have currently to an AST?
或者,如果您不熟悉 Ruby,但了解 C,您能告诉我我如何为 递归下降解析 维基百科文章.
Alternatively, if you are unfamiliar with Ruby, but know C, could you tell me how i build an AST for the C code in the recursive descent parsing wikipedia article.
请注意,我不想使用像 yacc 或 antlr 这样的解析器生成器来为我完成工作,我想从头开始做所有事情.
Please note, i do not want to use a parser generator like yacc or antlr to do the work for me, i want to do everything from scratch.
谢谢!
推荐答案
您需要将匹配的每个符号与构造树的那一小部分的回调相关联.例如,让我们采用一个相当常见的结构:嵌套函数调用.
You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.
a(b())
您的终端令牌类似于:
- L_PAREN = '('
- R_PAREN = ')'
- 标识符 = [a-z]+
你的非终结符类似于:
- FUNCTION_CALL = IDENTIFIER、L_PAREN、R_PAREN
- 或;
- FUNCTION_CALL = IDENTIFIER、L_PAREN、FUNCTION_CALL、R_PAREN
显然,上述规则 FUNCTION_CALL
的第二种选择是递归的.
Obviously the second alternative above for the rule FUNCTION_CALL
is recursive.
您已经有一个解析器,它知道它找到了一个有效的符号.您缺少的一点是将回调附加到规则,该规则接收其组件作为输入并返回一个值(通常)表示 AST 中的该节点.
You already have a parser that knows it has found a valid symbol. The bit you're missing is to attach a callback to the rule, which receives its components as inputs and returns a value (usually) representing that node in the AST.
想象一下,如果我们上面的 FUNCTION_CALL
规则中的第一个选择有一个回调:
Imagine if the first alternative from our FUNCTION_CALL
rule above had a callback:
Proc.new do |id_tok, l_paren_tok, r_paren_tok|
{ item: :function_call, name: id_tok, args: [] }
end
这意味着匹配产生的 AST:
That would mean that the AST resulting from matching:
a()
应该是:
{
item: :function_call,
name: "a",
args: []
}
现在将其推断为更复杂的 a(b())
.因为解析器是递归的,它会首先识别 b()
,回调从中返回我们上面的内容,但使用b"而不是a".
Now to extrapolate that to the more complex a(b())
. Because the parser is recursive, it will recognize the b()
first, the callback from which returns what we have above, but with "b" instead of "a".
现在让我们定义附加到与第二个选项匹配的规则的回调.非常相似,除了它还处理传递的参数:
Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:
Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
{ item: :function_call, name: id_tok, args: [ func_call_item ] }
end
因为解析器已经识别了 b()
并且 AST 的那部分是从你的回调中返回的,所以结果树现在是:
Because the parser has already recognized b()
and that part of the AST was returned from your callback, the resulting tree is now:
{
item: :function_call,
name: "a",
args: [
{
item: :function_call,
name: "b",
args: []
}
]
}
希望这能让您深思熟虑.将您匹配的所有标记传递到构建 AST 非常小的部分的例程中.
Hopefully this gives you some food for thought. Pass all the tokens you match into a routine that constructs very small parts of your AST.
这篇关于如何手动构建 AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!