如何手动构建 AST? [英] How to manually construct an AST?

查看:98
本文介绍了如何手动构建 AST?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在学习解析,但我对如何生成 AST 有点困惑.我编写了一个解析器,可以正确验证表达式是否符合语法(当表达式符合时它是静默的,如果不符合则引发异常).我从哪里开始构建 AST?我找到了大量有关构建 LL(1) 解析器的信息,但关于构建 AST 的信息却很少.

I'm currently learning about parsing but i'm a bit confused as how to generate an AST. I have written a parser that correctly verifies whether an expressions conforms to a grammar (it is silent when the expression conforms and raises an exception when it is not). Where do i go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST.

我当前的代码(用非常简单的 Ruby 编写,包括词法分析器和解析器)可在 github 上找到:https://gist.github.com/e9d4081b7d3409e30a57

My current code (written in very simple Ruby, and including a lexer and a parser) is found here on github: https://gist.github.com/e9d4081b7d3409e30a57

有人能解释一下我是如何从目前拥有的东西变成 AST 的吗?

Can someone explain how i go from what i have currently to an AST?

或者,如果您不熟悉 Ruby,但了解 C,您能告诉我我如何为 递归下降解析 维基百科文章.

Alternatively, if you are unfamiliar with Ruby, but know C, could you tell me how i build an AST for the C code in the recursive descent parsing wikipedia article.

请注意,我不想使用像 yacc 或 antlr 这样的解析器生成器来为我完成工作,我想从头开始做所有事情.

Please note, i do not want to use a parser generator like yacc or antlr to do the work for me, i want to do everything from scratch.

谢谢!

推荐答案

您需要将匹配的每个符号与构造树的那一小部分的回调相关联.例如,让我们采用一个相当常见的结构:嵌套函数调用.

You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.

a(b())

您的终端令牌类似于:

  • L_PAREN = '('
  • R_PAREN = ')'
  • 标识符 = [a-z]+

你的非终结符类似于:

  • FUNCTION_CALL = IDENTIFIER、L_PAREN、R_PAREN
  • 或;
  • FUNCTION_CALL = IDENTIFIER、L_PAREN、FUNCTION_CALL、R_PAREN

显然,上述规则 FUNCTION_CALL 的第二种选择是递归的.

Obviously the second alternative above for the rule FUNCTION_CALL is recursive.

您已经有一个解析器,它知道它找到了一个有效的符号.您缺少的一点是将回调附加到规则,该规则接收其组件作为输入并返回一个值(通常)表示 AST 中的该节点.

You already have a parser that knows it has found a valid symbol. The bit you're missing is to attach a callback to the rule, which receives its components as inputs and returns a value (usually) representing that node in the AST.

想象一下,如果我们上面的 FUNCTION_CALL 规则中的第一个选择有一个回调:

Imagine if the first alternative from our FUNCTION_CALL rule above had a callback:

Proc.new do |id_tok, l_paren_tok, r_paren_tok|
  { item: :function_call, name: id_tok, args: [] }
end

这意味着匹配产生的 AST:

That would mean that the AST resulting from matching:

a()

应该是:

{
  item: :function_call,
  name: "a",
  args: []
}

现在将其推断为更复杂的 a(b()).因为解析器是递归的,它会首先识别 b(),回调从中返回我们上面的内容,但使用b"而不是a".

Now to extrapolate that to the more complex a(b()). Because the parser is recursive, it will recognize the b() first, the callback from which returns what we have above, but with "b" instead of "a".

现在让我们定义附加到与第二个选项匹配的规则的回调.非常相似,除了它还处理传递的参数:

Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:

Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
  { item: :function_call, name: id_tok, args: [ func_call_item ] }
end

因为解析器已经识别了 b() 并且 AST 的那部分是从你的回调中返回的,所以结果树现在是:

Because the parser has already recognized b() and that part of the AST was returned from your callback, the resulting tree is now:

{
  item: :function_call,
  name: "a",
  args: [
    {
      item: :function_call,
      name: "b",
      args: []
    }
  ]
}

希望这能让您深思熟虑.将您匹配的所有标记传递到构建 AST 非常小的部分的例程中.

Hopefully this gives you some food for thought. Pass all the tokens you match into a routine that constructs very small parts of your AST.

这篇关于如何手动构建 AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆