构建解析器(第一部分) [英] Building a parser (Part I)

查看:104
本文介绍了构建解析器(第一部分)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发自己的基于javascript的编程语言(是的,这很疯狂,但这仅是为了学习... 也许吗?).好吧,我正在阅读有关解析器的信息,第一步是将代码源转换为令牌,例如:

I'm making my own javascript-based programming language (yeah, it is crazy, but it's for learn only... maybe?). Well, I'm reading about parsers and the first pass is to convert the code source to tokens, like:

if(x > 5)
  return true;

令牌生成器:

T_IF          "if"
T_LPAREN      "("
T_IDENTIFIER  "x"
T_GT          ">"
T_NUMBER      "5"
T_RPAREN      ")"
T_IDENTIFIER  "return"
T_TRUE        "true"
T_TERMINATOR  ";"

我暂时不知道自己的逻辑是否正确.在我的解析器上,它甚至更好(或不是?)并转换为它(是的,多维数组):

I don't know if my logic is correct for that for while. On my parser it is even better (or not?) and translate to it (yeah, multidimensional array):

T_IF             "if"
  T_EXPRESSION     ...
    T_IDENTIFIER     "x"
    T_GT             ">"
    T_NUMBER         "5"
  T_CLOSURE        ...
    T_IDENTIFIER     "return"
    T_TRUE           "true"

我有一些疑问:

  1. 我的方式好于还是原始方式?请注意,我的代码将被读取和编译(翻译成另一种语言,例如PHP),而不是一直解释.
  2. 在令牌生成器之后,我需要做什么呢?我真的迷失了这个通行证!
  3. 有一些很好的教程来学习我该怎么做?
  1. Is my way better or worse that the original way? Note that my code will be read and compiled (translated to another language, like PHP), instead of interpreted all the time.
  2. After I tokenizer, what I need do exactly? I'm really lost on this pass!
  3. There are some good tutorial to learn how I can do it?

是的.再见!

推荐答案

通常,您希望将令牌生成器(也称为 lexer )的功能与编译器或解释器的其他阶段分开.这样做的原因是基本的模块化:每次通过都消耗一种东西(例如字符),而又产生另一种东西(例如令牌).

Generally, you want to separate the functions of the tokeniser (also called a lexer) from other stages of your compiler or interpreter. The reason for this is basic modularity: each pass consumes one kind of thing (e.g., characters) and produces another one (e.g., tokens).

因此,您已将字符转换为令牌.现在,您想将标记的平面列表转换为有意义的嵌套表达式,这就是通常所说的 parsing .对于类似JavaScript的语言,您应该查看递归下降解析.对于使用具有不同优先级的中缀运算符来解析表达式,普拉特解析非常有用,您可以在特殊情况下,可以使用普通递归下降解析.

So you’ve converted your characters to tokens. Now you want to convert your flat list of tokens to meaningful nested expressions, and this is what is conventionally called parsing. For a JavaScript-like language, you should look into recursive descent parsing. For parsing expressions with infix operators of different precedence levels, Pratt parsing is very useful, and you can fall back on ordinary recursive descent parsing for special cases.

仅根据您的情况为您提供一个更具体的示例,我假设您可以编写两个函数:accept(token)expect(token),这两个函数将测试您创建的流中的下一个标记.您将为您的语言语法中的每种类型的语句或表达式创建函数.例如,下面是statement()函数的Python伪代码:

Just to give you a more concrete example based on your case, I’ll assume you can write two functions: accept(token) and expect(token), which test the next token in the stream you’ve created. You’ll make a function for each type of statement or expression in the grammar of your language. Here’s Pythonish pseudocode for a statement() function, for instance:

def statement():

  if accept("if"):
    x = expression()
    y = statement()
    return IfStatement(x, y)

  elif accept("return"):
    x = expression()
    return ReturnStatement(x)

  elif accept("{")
    xs = []
    while True:
      xs.append(statement())
      if not accept(";"):
        break
    expect("}")
    return Block(xs)

  else:
    error("Invalid statement!")

这为您提供了程序的所谓抽象语法树(AST),然后您可以对其进行操作(优化和分析),输出(编译)或运行(解释).

This gives you what’s called an abstract syntax tree (AST) of your program, which you can then manipulate (optimisation and analysis), output (compilation), or run (interpretation).

这篇关于构建解析器(第一部分)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆