Using Parsec to parse regular expressions
Question
I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
    | LIT EXP
    | LIT
I've tried to implement this in Haskell as:
expr = try star
   <|> try litE
   <|> lit

litE = do c <- noneOf "*"
          rest <- expr
          return (c : rest)

lit = do c <- noneOf "*"
         return [c]

star = do content <- expr
          char '*'
          return (content ++ "*")
There are some infinite loops here, though (e.g. expr -> star -> expr without consuming any tokens), which make the parser loop forever. I'm not sure how to fix it, because the very nature of star is that it consumes its mandatory token at the end.
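For comparison, the loop can also be removed by hand by eliminating the left recursion, so that every recursive step consumes a character before it recurses. This is a minimal sketch, not part of the question or the answer. It assumes * binds to the single preceding literal, as in ordinary regex syntax (the grammar as written, EXP : EXP *, would also let * apply to a whole sequence, which makes it ambiguous). The Regex type is hypothetical, and the sketch uses the modern Text.Parsec modules:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST for this sketch (not from the question or the answer).
data Regex = Lit Char | Star Regex | Seq [Regex]
  deriving (Show, Eq)

-- FACTOR : LIT '*'*
-- A literal followed by zero or more stars. The literal is consumed first,
-- so there is no left recursion; each star wraps everything before it.
factor :: Parser Regex
factor = do
  c     <- noneOf "*"
  stars <- many (char '*')
  return (foldl (\r _ -> Star r) (Lit c) stars)

-- EXP : FACTOR+   (replaces the LIT EXP | LIT alternatives)
expr :: Parser Regex
expr = Seq <$> many1 factor

main :: IO ()
main = parseTest (expr <* eof) "ab*c"   -- prints Seq [Lit 'a',Star (Lit 'b'),Lit 'c']
```

An input that starts with * is rejected, because factor has to read a literal before it looks for stars.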
Any ideas?
Answer
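You should use Text.Parsec.Expr.buildExpressionParser (also exported by the older Text.ParserCombinators.Parsec.Expr module, which the code below imports); it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!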
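To illustrate the idea on the question's original grammar alone, here is a minimal sketch (not from the answer): * becomes a postfix operator, and concatenation (LIT EXP) becomes an infix operator whose parser consumes no input. The Regex type is hypothetical, and the sketch uses the modern Text.Parsec.Expr module rather than the compatibility module the answer imports:

```haskell
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Hypothetical AST for this sketch (not from the question or the answer).
data Regex = Lit Char | Star Regex | Cat Regex Regex
  deriving (Show, Eq)

-- The question's grammar (EXP *, LIT EXP, LIT) expressed as an operator table.
expr :: Parser Regex
expr = buildExpressionParser table lit
  where
    -- Levels are listed from highest to lowest precedence.
    table =
      [ [ Postfix (Star <$ char '*') ]     -- EXP *
      , [ Infix (return Cat) AssocRight ]  -- LIT EXP: juxtaposition, consumes no input
      ]
    -- LIT: any single character other than '*'
    lit = Lit <$> noneOf "*"

main :: IO ()
main = parseTest (expr <* eof) "ab*c"
-- prints Cat (Lit 'a') (Cat (Star (Lit 'b')) (Lit 'c'))
```

Because the postfix level is listed first, * binds to the single literal before it, and AssocRight nests the concatenations to the right. Note that buildExpressionParser applies at most one postfix operator per operand, so a** is rejected; repeating a repetition needs explicit grouping such as (a*)*, which in turn needs parentheses in the atom parser.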
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

-- Abstract syntax of a regular expression.
data Term = Literal Char
          | Sequence [Term]
          | Repeat (Int, Maybe Int) Term   -- (min, max) repetitions; Nothing = unbounded
          | Choice [Term]
          deriving ( Show )

term :: Parser Term
term = buildExpressionParser ops atom where

  -- Operator table, from highest to lowest precedence.
  ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
          , Postfix (Repeat (1, Nothing) <$ char '+')
          , Postfix (Repeat (0, Just 1) <$ char '?')
          ]
          -- Concatenation has no symbol, so its "operator" consumes no input.
        , [ Infix (return sequence) AssocRight
          ]
        , [ Infix (choice <$ char '|') AssocRight
          ]
        ]

  -- An atom is a single literal character or a parenthesized subexpression.
  atom = msum [ Literal <$> lit
              , parens term
              ]

  lit = noneOf "*+?|()"

  -- Combine two terms, flattening nested sequences and choices into one list.
  sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
  choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)

  parens = between (char '(') (char ')')

  seqTerms (Sequence ts) = ts
  seqTerms t = [t]

  choiceTerms (Choice ts) = ts
  choiceTerms t = [t]

main :: IO ()
main = parseTest term "he(llo)*|wor+ld?"
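One thing to watch: term stops at the first character it cannot use and does not complain about leftover input, so parseTest term "ab)c" reports a successful parse of "ab". A minimal sketch, assuming you want the whole string to be a regex, is to follow term with eof. The code below restates the answer's parser in condensed form, with the helpers renamed so they no longer shadow Prelude's sequence and Parsec's choice, and with Eq added to the deriving clause; parseRegex is a hypothetical name:

```haskell
import Control.Monad (msum)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

-- Same AST as the answer, with Eq added so results can be compared.
data Term = Literal Char
          | Sequence [Term]
          | Repeat (Int, Maybe Int) Term
          | Choice [Term]
          deriving (Show, Eq)

-- The answer's operator-table parser, condensed.
term :: Parser Term
term = buildExpressionParser ops atom where
  ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
          , Postfix (Repeat (1, Nothing) <$ char '+')
          , Postfix (Repeat (0, Just 1) <$ char '?') ]
        , [ Infix (return seqT) AssocRight ]
        , [ Infix (choiceT <$ char '|') AssocRight ] ]
  atom = msum [ Literal <$> noneOf "*+?|()"
              , between (char '(') (char ')') term ]
  seqT a b = Sequence (seqTerms a ++ seqTerms b)
  choiceT a b = Choice (choiceTerms a ++ choiceTerms b)
  seqTerms (Sequence ts) = ts
  seqTerms t = [t]
  choiceTerms (Choice ts) = ts
  choiceTerms t = [t]

-- Hypothetical wrapper: succeed only if the entire input is consumed.
parseRegex :: String -> Either ParseError Term
parseRegex = parse (term <* eof) "regex"

main :: IO ()
main = do
  parseTest term "ab)c"      -- prints Sequence [Literal 'a',Literal 'b']; ")c" is ignored
  print (parseRegex "ab)c")  -- a Left parse error at the ')'
```

Inside parentheses no eof is needed, because the closing char ')' already forces the nested term to end at the right place.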