将BNF翻译成Parsec程序有什么技巧吗? [英] Is there any trick about translating BNF to Parsec program?

查看:156
本文介绍了将BNF翻译成Parsec程序有什么技巧吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

匹配函数调用链的BNF(如 x(y)(z)... ):

The BNF that match function call chain (like x(y)(z)...):

expr = term T
T    = (expr) T
      | EMPTY
term = (expr)
      | VAR 

将它翻译成看起来很棘手的Parsec程序。

Translate it to Parsec program that looks so tricky.

term :: Parser Term
term = parens expr <|> var

expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e

您能列出所有关于将BNF翻译为Parsec程序的设计模式吗?

Could you list all the design patterns about translating BNF to Parsec program?

推荐答案

最简单的想法是,如果你的语法很大,你可以使用Alex / Happy组合。这是相当直接的使用,直接接受BNF格式 - 不需要人工翻译,也许最重要的是,产生快速的解析器/词法分析器。

The simplest think you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly - no human translation needed - and perhaps most importantly, produces blazingly fast parsers/lexers.

如果你是用parsec来做的(或者你是做一个学习练习的话),我觉得一般来说比较容易做到两个阶段。第一个词法,然后解析。

If you are dead set on doing it with parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages; first lexing, then parsing. Parsec will do both!

First write the appropriate types:
{-# LANGUAGE LambdaCase #-}

import Text.Parsec 
import Text.Parsec.Combinator 
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char 
import Control.Applicative hiding ((<|>))
import Control.Monad 

data Term = App Term Term | Var String deriving (Show, Eq)

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec [Char] ()   -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token

解析单个令牌很简单。为了简单起见,变量是1个或多个字母。

Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.

oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|> 
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)

解析整个令牌流多次解析单个令牌,可能由空格分隔:

Parsing the entire token stream is just parsing a single token many times, possible separated by whitespace:

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces) 

注意 code>:这种方式,在字符串的开头和结尾接受空格。

Note the placement of spaces: this way, white space is accepted at the beginning and end of the string.

由于 Parser 使用自定义标记类型,因此必须使用自定义 / code>函数。幸运的是,这几乎与现有的满足相同。

Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.

satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show 
                       (\src _ _ -> incSourceColumn src 1) 
                       (\x -> if f x then Just x else Nothing)

然后我们可以为每个原始令牌编写解析器。

Then we can write parsers for each of the primitive tokens.

lparen = satisfy' $ \case { LParen -> True ; _ -> False } 
rparen = satisfy' $ \case { RParen -> True ; _ -> False } 
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })

你可以想象, parens 对我们的目的是有用的。这是很直接写的。

As you may imagine, parens would be useful for our purposes. It is very straightforward to write.

parens :: Parser a -> Parser a 
parens = between lparen rparen 

现在有趣的部分。

term, expr, var :: Parser Term

term = parens expr <|> var

var = Var <$> strTok 

这两个对你来说应该是很明显的。

These two should be fairly obvious to you.

Parec包含组合器选项 optionMaybe ,当您需要可能做某事时,它们非常有用。

Parec contains combinators option and optionMaybe which are useful when you you need to "maybe do something".

expr = do 
  e0 <- term 
  option e0 (parens expr >>= \e1 -> return (App e0 e1))

最后一行表示 - 解析器给予选项 - 如果失败,则返回 e0

The last line means - try to apply the parser given to option - if it fails, instead return e0.

对于测试,您可以:

tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""

/ code>附加到每个解析器是确保整个输入被消耗;如果有额外的结尾字符,则字符串不能是语法的成员。注意 - 您的示例 x(y)(z)实际上不是您的语法!

The eof attached to each parser is to make sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note - your example x(y)(z) is not actually in your grammar!

>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input

但以下是

>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))

这篇关于将BNF翻译成Parsec程序有什么技巧吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆