Using Parsec to parse regular expressions

Problem description

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:

EXP  : EXP *
     | LIT EXP
     | LIT

I've tried to implement this in Haskell as:

expr = try star
       <|> try litE
       <|> lit

litE  = do c <- noneOf "*"
           rest <- expr
           return (c : rest)

lit   = do c <- noneOf "*"
           return [c]

star = do content <- expr
          char '*'
          return (content ++ "*")

There are some infinite loops here, though (e.g. expr -> star -> expr without consuming any tokens), which make the parser loop forever. I'm not really sure how to fix this, because the very nature of star is that it consumes its mandatory token at the end.

Any ideas?

Recommended answer

You should use buildExpressionParser from Text.ParserCombinators.Parsec.Expr; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!

You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.

Here's my attempt (I threw in |, +, and ? for good measure):

import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

data Term = Literal Char
          | Sequence [Term]
          | Repeat (Int, Maybe Int) Term
          | Choice [Term]
  deriving ( Show )

term :: Parser Term
term = buildExpressionParser ops atom where

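  -- Operator table, ordered from highest to lowest precedence.
  ops = [ -- postfix repetition: *, + and ?
          [ Postfix (Repeat (0, Nothing) <$ char '*')
          , Postfix (Repeat (1, Nothing) <$ char '+')
          , Postfix (Repeat (0, Just 1)  <$ char '?')
          ]
          -- sequencing by juxtaposition: the operator parser consumes no input
        , [ Infix (return sequence) AssocRight
          ]
          -- alternation: |
        , [ Infix (choice <$ char '|') AssocRight
          ]
        ]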

  atom = msum [ Literal <$> lit
              , parens term
              ]

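  -- A literal is any character that is not an operator or a paren.
  lit = noneOf "*+?|()"
  -- These local definitions shadow Prelude's sequence and Parsec's choice.
  sequence a b = Sequence $ seqTerms a ++ seqTerms b
  choice a b = Choice $ choiceTerms a ++ choiceTerms b
  parens = between (char '(') (char ')')

  -- Flatten nested sequences so that "abc" yields a single Sequence.
  seqTerms (Sequence ts) = ts
  seqTerms t = [t]

  -- Flatten nested choices in the same way.
  choiceTerms (Choice ts) = ts
  choiceTerms t = [t]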

main = parseTest term "he(llo)*|wor+ld?"
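
For comparison, the left recursion in the question's grammar can also be removed by hand. The sketch below is not part of the original answer. It assumes that * binds to the single preceding literal, as in conventional regular expressions, and the names Regex, literal, item, regex and parseRegex are made up for this illustration. The idea is to consume a literal first and only then look for trailing stars, so the parser never recurses without consuming input.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical AST for this sketch (not the answer's Term type).
data Regex = Lit Char
           | Star Regex
           | Seq [Regex]
  deriving (Show, Eq)

-- A literal is any character other than '*'.
literal :: Parser Regex
literal = Lit <$> noneOf "*"

-- An item is one literal followed by zero or more '*'.
-- The literal is consumed before anything else, which is what
-- removes the left recursion of the original `star` parser.
item :: Parser Regex
item = do
  l     <- literal
  stars <- many (char '*')
  -- wrap the literal in one Star per '*' that was read
  return (foldl (\r _ -> Star r) l stars)

-- A whole expression is one or more items, up to the end of input.
regex :: Parser Regex
regex = Seq <$> many1 item <* eof

-- Convenience wrapper (assumed name) that runs the parser on a string.
parseRegex :: String -> Either ParseError Regex
parseRegex = parse regex ""
```

With these definitions, parseRegex "ab*" yields Right (Seq [Lit 'a',Star (Lit 'b')]), while an input that starts with '*' is rejected. Unlike the buildExpressionParser version, this sketch has no grouping or alternation; it only shows the shape of the fix.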
