将BNF翻译成Parsec程序有什么技巧吗? [英] Is there any trick about translating BNF to Parsec program?
问题描述
匹配函数调用链的BNF(如 x(y)(z)...
):
The BNF that match function call chain (like x(y)(z)...
):
expr = term T
T = (expr) T
| EMPTY
term = (expr)
| VAR
将它翻译成看起来很棘手的Parsec程序。
Translate it to Parsec program that looks so tricky.
term :: Parser Term
term = parens expr <|> var
expr :: Parser Term
expr = do whiteSpace
e <- term
maybeAddSuffix e
where addSuffix e0 = do e1 <- parens expr
maybeAddSuffix $ TermApp e0 e1
maybeAddSuffix e = addSuffix e
<|> return e
您能列出所有关于将BNF翻译为Parsec程序的设计模式吗?
Could you list all the design patterns about translating BNF to Parsec program?
推荐答案
最简单的想法是,如果你的语法很大,你可以使用Alex / Happy组合。这是相当直接的使用,直接接受BNF格式 - 不需要人工翻译,也许最重要的是,产生快速的解析器/词法分析器。
The simplest think you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly - no human translation needed - and perhaps most importantly, produces blazingly fast parsers/lexers.
如果你是用parsec来做的(或者你是做一个学习练习的话),我觉得一般来说比较容易做到两个阶段。第一个词法,然后解析。
If you are dead set on doing it with parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages; first lexing, then parsing. Parsec will do both!
First write the appropriate types:
{-# LANGUAGE LambdaCase #-}
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char
import Control.Applicative hiding ((<|>))
import Control.Monad
data Term = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)
type Lexer = Parsec [Char] () -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token
解析单个令牌很简单。为了简单起见,变量是1个或多个字母。
Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.
oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|>
(char ')' >> return RParen) <|>
(Str <$> many1 letter)
解析整个令牌流多次解析单个令牌,可能由空格分隔:
Parsing the entire token stream is just parsing a single token many times, possible separated by whitespace:
lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)
注意 code>:这种方式,在字符串的开头和结尾接受空格。
Note the placement of spaces
: this way, white space is accepted at the beginning and end of the string.
由于 Parser
使用自定义标记类型,因此必须使用自定义 / code>函数。幸运的是,这几乎与现有的满足相同。
Since Parser
uses a custom token type, you have to use a custom satisfy
function. Fortunately, this is almost identical to the existing satisfy.
satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show
(\src _ _ -> incSourceColumn src 1)
(\x -> if f x then Just x else Nothing)
然后我们可以为每个原始令牌编写解析器。
Then we can write parsers for each of the primitive tokens.
lparen = satisfy' $ \case { LParen -> True ; _ -> False }
rparen = satisfy' $ \case { RParen -> True ; _ -> False }
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })
你可以想象, parens
对我们的目的是有用的。这是很直接写的。
As you may imagine, parens
would be useful for our purposes. It is very straightforward to write.
parens :: Parser a -> Parser a
parens = between lparen rparen
现在有趣的部分。
term, expr, var :: Parser Term
term = parens expr <|> var
var = Var <$> strTok
这两个对你来说应该是很明显的。
These two should be fairly obvious to you.
Parec包含组合器选项
和 optionMaybe
,当您需要可能做某事时,它们非常有用。
Parec contains combinators option
and optionMaybe
which are useful when you you need to "maybe do something".
expr = do
e0 <- term
option e0 (parens expr >>= \e1 -> return (App e0 e1))
最后一行表示 - 解析器给予选项
- 如果失败,则返回 e0
。
The last line means - try to apply the parser given to option
- if it fails, instead return e0
.
对于测试,您可以:
tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
/ code>附加到每个解析器是确保整个输入被消耗;如果有额外的结尾字符,则字符串不能是语法的成员。注意 - 您的示例
x(y)(z)
实际上不是您的语法!
The eof
attached to each parser is to make sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note - your example x(y)(z)
is not actually in your grammar!
>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input
但以下是
>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
这篇关于将BNF翻译成Parsec程序有什么技巧吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!