如何在Haskell中编写Lisp解析器? [英] How to write a lisp parser in Haskell?

查看:48
本文介绍了如何在Haskell中编写Lisp解析器?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正试图在Haskell中编写一个lisp解释器,受到Norvig在Python中的启发( http://norvig.com/lispy.html ).我有一个成功的令牌生成器,可以根据需要链接到该令牌生成器.在这里,它会输出正确的代码,直到Norvig的Python令牌生成器为止.

I'm trying to write a lisp interpreter in haskell, inspired by Norvig's in Python (http://norvig.com/lispy.html). I have a successful tokenizer which I can link to if need be. Here it outputs the correct code up to Norvig's Python tokenizer.

program = "(begin (define r 10) (* pi (* r r)))"
astTokenized = tokenize program
astTokenized == ["(","begin","(","define","r","10",")","(","*","pi","(","*","r","r",")",")",")"]

在这里,我定义了我的抽象语法树数据类型,尽管我知道它已经存在一些隐式错误,因为它没有包装在列表中.

Here I define my abstract syntax tree data type, although I know that it already has some implicit error as it isn't wrapped in a list.

data Ast x = Val x | Node [Ast x] deriving (Show)

这是我的第一次尝试:

parse :: [[Char]] -> [Ast [Char]]
parse (x:xs)
  | x == "(" = [Node (parse xs)]
  | x == ")" = []
  | otherwise = (Val x) : parse xs

希望如此,除了它在第一个')'之后终止.

Hopeful, except for it terminates after the first ')'.

Prelude> parse astTokenized
[Node [Val "begin",Node [Val "define",Val "r",Val "10"]]]

在这里,我更改了[]的基本大小写,并调整了')'的条件,以便它将解析整个输入终止,但是现在它只创建了一个更深的树,因此无法正确分支.

Here I change the base case for [], and adjust the condition for ')' so it will parse the entire input terminate, but it now just creates a deeper tree, therefore failing to branch correctly.

parse [] = []
parse (x:xs)
  | x == "(" = [Node (parse xs)]
  | x == ")" = parse xs
  | otherwise = (Val x) : parse xs

Prelude> parse astTokenized
[Node [Val "begin",Node [Val "define",Val "r",Val "10",Node [Val "*",Val "pi",Node [Val "*",Val "r",Val "r"]]]]]

以某种方式,它需要允许并行"的树,而不仅仅是嵌套的树.任何帮助将不胜感激.

In some way it needs to allow for "parallel" trees, not merely nested. Any help would be appreciated.

推荐答案

问题是家庭作业/运动,下面是一个解释,试图避免直接放弃解决方案;它是通过用Common Lisp语言编写的(因为我不知道Haskell ;-)).

在Common Lisp中,有一个名为 READ-DELIMITED-LIST ,它以列表形式读取多种形式,直到读者到达结尾字符为止.当遇到开括号时,读者会抓住所有形式,直到闭括号为止,然后从那里继续进行解析.区别在于它适用于字符流,而不适用于标记,并且该流用于副作用.

In Common Lisp there is an intermediate function called READ-DELIMITED-LIST, which reads multiple forms as a list until the reader reaches an ending character. When encountering an opening parenthesis, the reader grabs all forms until the closing parenthesis, and then continue parsing from there. The difference is that it works on streams of characters, not on tokens, and that the stream is used for side-effects.

在一种纯粹的功能性方法中(例如在您的代码中),parse函数需要返回其余要处理的令牌以及返回的AST.这样,您就可以在解析过程中消耗所需数量的令牌,并允许调用方从解析器结束的地方继续解析.

In a purely functional approach, like in your code, the parse function needs to return the remaining tokens to be processed, along with the returned AST. This allows you to consume as many tokens as you need during parsing, and allows the caller to continue parsing from where the parser ended.

换句话说,当您关闭括号时,必须将 xs 返回到调用上下文.因此,您将携带一个累加器对象(状态)以及您的代码.听说monad可以帮助您完成样板工作.

In other words, when you close a parenthesis, you have to return xs to the calling context. And so, you carry an accumulator object (the state) along with your code. I heard that monads can help you with the boilerplate.

  • 多值绑定 VALUES 一起工作:(values xy)返回多个值,并且 multiple-value-bind 从表达式中捕获多个值,并将每个值绑定到一个变量.

  • MULTIPLE-VALUE-BIND and VALUES work together: (values x y) returns multiple values and multiple-value-bind captures multiple values from an expression, and bind each of them to a variable.

DESTRUCTURING-BIND 是模式匹配的祖先,只需将列表分解为组件即可.

DESTRUCTURING-BIND is the ancestor of pattern matching, and simply destructures a list into components.

非特殊形式(f x1 .. x2)是函数应用程序

A non-special form (f x1 .. x2) is function application

(case test.子句)是一个开关,其中每个子句都是一个以文字值(或此类值的列表)开头,后跟表达式的列表. t 否则是始终匹配的特殊关键字(默认情况下).

(case test . clauses) is a switch, where each clause is a list starting with a literal value (or a list of such values) and followed by expressions. t and otherwise are special keywords that always match (the default case).

其余的应该不言自明(否则询问).

The rest should be pretty self-explanatory (ask otherwise).

请注意,我使用符号< > 分别表示 opening closing 标记,避免与Lisp的括号引起混淆.例如,列表(< abc< d>>)包含8个标记,并且可以是''((abc(d))''的内部表示形式.代码>.我可以使用 left-paren right-paren ,甚至是复杂的数据结构,这只是一个内部表示.此处没有详细的词法分析器.

Notice that I use symbols < and > to represent respectively the opening and closing tokens, to avoid any confusion with Lisp's parentheses. For example, the list (< a b c < d > >) contains 8 tokens, and could be the internal representation of "(a b c (d))". I could have use left-paren and right-paren, or even complex data-structures, this is just an internal representation. The lexer is not detailed here.

入口点 parse 函数获取令牌列表,并将解析后的值和其余令牌作为辅助值返回;它取决于 parse-until-close (定义如下):

The entry point parse function takes a list of tokens and returns the parsed value and the remaining tokens as a secondary value; it depends on parse-until-close, defined below:

(defun parse (tokens)
  (when tokens
    (destructuring-bind (head . tail) tokens
      (case head
        (> (error "unmatched closing parenthesis"))
        (< (parse-until-close tail))
        (otherwise (values head tail))))))

然后 parse-until-close ,该函数递归地解析令牌,直到找到关闭令牌为止.请注意,令牌在不同点重新绑定:

Then, parse-until-close, a function that recursively parse tokens until it finds a close token; note that tokens is re-bound at different points:

(defun parse-until-close (tokens)
  (when tokens
    (case (first tokens)
      (> (values nil (rest tokens)))
      (otherwise
       ;; first read the element in head of tokens
       (multiple-value-bind (head tokens) (parse tokens)
         ;; then recurse to read the remaining items in list
         (multiple-value-bind (tail tokens) (parse-until-close tokens)
           (values (cons head tail) tokens)))))))

以上内容以递归方式解析令牌并构建一个列表.如果我们的令牌列表以关闭令牌> 开头,我们将返回空列表以及其余令牌.

The above recursively parse tokens and build a list. If our list of tokens starts with >, the close token, we return the empty list as well as the rest of the tokens.

否则,我们解析一个元素并使用 parse-until-close 递归.

Otherwise, we parse one element and recurse with parse-until-close.

每个调用返回两个值,即已解析的令牌和其余值:

Each call returns two values, the parsed token and the remaining ones:

(parse '(one token))
=> ONE
   (TOKEN)

(parse '(< abc < x > y >))
=> (ABC (X) Y)
   NIL

(parse '(< abc def >))
=> (ABC DEF)
   NIL

;; incomplete input
(parse '(< < < abc))
=> (((ABC)))
   NIL

这篇关于如何在Haskell中编写Lisp解析器?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆