如何手动构造AST? [英] How to manually construct an AST?

查看:122
本文介绍了如何手动构造AST?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在学习解析,但是对于如何生成AST感到有些困惑.我编写了一个解析器,可以正确地验证表达式是否符合语法(当表达式符合时它是静默的,而当表达式不符合时它会引发异常).我要从哪里去建立AST?我发现了很多有关构建LL(1)解析器的信息,但随后构建AST的信息很少.

I'm currently learning about parsing but i'm a bit confused as how to generate an AST. I have written a parser that correctly verifies whether an expressions conforms to a grammar (it is silent when the expression conforms and raises an exception when it is not). Where do i go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST.

我的当前代码(使用非常简单的Ruby编写,包括一个词法分析器和解析器)可以在github上找到: https://gist.github.com/e9d4081b7d3409e30a57

My current code (written in very simple Ruby, and including a lexer and a parser) is found here on github: https://gist.github.com/e9d4081b7d3409e30a57

有人可以解释我如何从目前的状态转到AST吗?

Can someone explain how i go from what i have currently to an AST?

或者,如果您不熟悉Ruby,但了解C,您能告诉我我如何在递归下降解析维基百科文章.

Alternatively, if you are unfamiliar with Ruby, but know C, could you tell me how i build an AST for the C code in the recursive descent parsing wikipedia article.

请注意,我不想使用像yacc或antlr这样的解析器生成器为我完成工作,我想从头开始做所有事情.

Please note, i do not want to use a parser generator like yacc or antlr to do the work for me, i want to do everything from scratch.

谢谢!

推荐答案

您需要将匹配的每个符号与构造树的那一部分的回调相关联.例如,让我们采用一个相当普通的结构:嵌套函数调用.

You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.

a(b())

您的终端令牌类似于:

  • L_PAREN ='('
  • R_PAREN =')'
  • IDENTIFIER = [a-z] +

您的非终结符如下:

  • FUNCTION_CALL = IDENTIFIER,L_PAREN,R_PAREN
  • 或;
  • FUNCTION_CALL = IDENTIFIER,L_PAREN,FUNCTION_CALL,R_PAREN

显然,规则FUNCTION_CALL的上述第二种选择是递归的.

Obviously the second alternative above for the rule FUNCTION_CALL is recursive.

您已经拥有一个解析器,该解析器知道已找到有效的符号.您缺少的一点是将回调附加到规则,该规则将其组件作为输入接收并返回一个值(通常),该值表示AST中的该节点.

You already have a parser that knows it has found a valid symbol. The bit you're missing is to attach a callback to the rule, which receives its components as inputs and returns a value (usually) representing that node in the AST.

想象一下,我们上面的FUNCTION_CALL规则中的第一个替代方案是否有回调:

Imagine if the first alternative from our FUNCTION_CALL rule above had a callback:

Proc.new do |id_tok, l_paren_tok, r_paren_tok|
  { item: :function_call, name: id_tok, args: [] }
end

这意味着AST是由匹配产生的:

That would mean that the AST resulting from matching:

a()

将是:

{
  item: :function_call,
  name: "a",
  args: []
}

现在将其推断为更复杂的a(b()).因为解析器是递归的,所以它将首先识别b(),该回调从中返回上面的内容,但用"b"代替"a".

Now to extrapolate that to the more complex a(b()). Because the parser is recursive, it will recognize the b() first, the callback from which returns what we have above, but with "b" instead of "a".

现在,让我们定义与第二种选择匹配的规则所附加的回调.这非常相似,除了它还处理它所传递的参数:

Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:

Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
  { item: :function_call, name: id_tok, args: [ func_call_item ] }
end

因为解析器已经识别出b(),并且AST的一部分已从您的回调返回,所以现在的结果树为:

Because the parser has already recognized b() and that part of the AST was returned from your callback, the resulting tree is now:

{
  item: :function_call,
  name: "a",
  args: [
    {
      item: :function_call,
      name: "b",
      args: []
    }
  ]
}

希望这能给您带来一些思考.将您匹配的所有令牌传递到一个构造AST很小部分的例程中.

Hopefully this gives you some food for thought. Pass all the tokens you match into a routine that constructs very small parts of your AST.

这篇关于如何手动构造AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆