How do I make an Attoparsec parser succeed without consuming input (like Parsec lookAhead)?


Question

I wrote a quick attoparsec parser to walk an aspx file and drop all the style attributes. It works fine except for one piece, where I can't figure out how to make it succeed on matching ">" without consuming it.

Here's what I have:

anyTill = manyTill anyChar
anyBetween start end = start *> anyTill end

styleWithQuotes = anyBetween (stringCI "style=\"") (stringCI "\"")
styleWithoutQuotes = anyBetween (stringCI "style=") (stringCI " " <|> ">")
everythingButStyles = manyTill anyChar (styleWithQuotes <|> styleWithoutQuotes) <|> many1 anyChar

I understand it's partly because of how I'm using manyTill in everythingButStyles (that's how I actively drop all the style content). But in styleWithoutQuotes I need ">" to match as the end delimiter without being consumed. In Parsec I would just have used lookAhead on ">", but I can't do that in attoparsec.
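To make the problem concrete, here is a minimal reproduction. It assumes strict ByteString input via Data.Attoparsec.ByteString.Char8 (the question doesn't say which input type is used) and spells the ">" literal as string ">". The helper leftover is hypothetical: it runs a parser and returns whatever input is still unconsumed, which shows that the end delimiter swallows the '>'.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
import Data.ByteString (ByteString)

anyTill :: Parser b -> Parser String
anyTill = manyTill anyChar

anyBetween :: Parser a -> Parser b -> Parser String
anyBetween start end = start *> anyTill end

-- The question's styleWithoutQuotes: both alternatives of the end
-- delimiter consume what they match, including the '>'.
styleWithoutQuotes :: Parser String
styleWithoutQuotes = anyBetween (stringCI "style=") (stringCI " " <|> string ">")

-- Hypothetical helper: run p, then return the input it left unconsumed.
leftover :: Parser a -> ByteString -> Either String ByteString
leftover p = parseOnly (p *> takeByteString)
```

With this version, leftover styleWithoutQuotes "style=color:red>rest" yields Right "rest": the '>' is gone, so nothing downstream can see the end of the tag.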

Solution

Since then, the lookAhead combinator has been added to attoparsec, so you can now simply use lookAhead (char '>') or lookAhead (string ">") to achieve this. The latter returns a ByteString, like stringCI " ", so the two can be combined with <|>.
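Applied to the question's code, a minimal sketch might look like this. It assumes strict ByteString input via Data.Attoparsec.ByteString.Char8 and imports lookAhead from Data.Attoparsec.Combinator; only the '>' alternative is wrapped, so a trailing space is still consumed as before.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
import Data.Attoparsec.Combinator (lookAhead)

anyBetween :: Parser a -> Parser b -> Parser String
anyBetween start end = start *> manyTill anyChar end

-- A space ends the unquoted value and is consumed; '>' also ends it, but
-- lookAhead only peeks at it, so it stays in the input for the next parser.
styleWithoutQuotes :: Parser String
styleWithoutQuotes =
  anyBetween (stringCI "style=") (stringCI " " <|> lookAhead (string ">"))
```

For example, parseOnly (styleWithoutQuotes *> takeByteString) "style=color:red>rest" now leaves ">rest" rather than "rest".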

Below is a workaround from before it was introduced.


You can build your non-consuming parser using peekWord8, which just looks at the next byte (if any). Since ByteString has a Monoid instance, Parser ByteString is a MonadPlus, and you can use

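lookGreater :: Parser ByteString
lookGreater = do
    -- peekWord8 returns the next byte (if any) without consuming it
    mbw <- peekWord8
    case mbw of
      Just 62 -> return ">"   -- 62 is '>'; the ">" literal needs OverloadedStrings
      _ -> mzero              -- any other byte, or end of input: fail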

(62 is the code point of '>') to either find a '>' without consuming it or fail.
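Plugged into the question's parser, the workaround might look like the sketch below. It assumes strict ByteString input, imports peekWord8 from Data.Attoparsec.ByteString and everything else from Data.Attoparsec.ByteString.Char8; lookGreater itself is the answer's code, unchanged apart from the type signature.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Control.Monad (mzero)
import Data.Attoparsec.ByteString (peekWord8)
import Data.Attoparsec.ByteString.Char8
import Data.ByteString (ByteString)

-- Succeed with ">" if the next byte is 62 ('>'), fail otherwise.
-- peekWord8 never consumes input, so neither does lookGreater.
lookGreater :: Parser ByteString
lookGreater = do
    mbw <- peekWord8
    case mbw of
      Just 62 -> return ">"
      _       -> mzero

-- styleWithoutQuotes with lookGreater as the non-consuming '>' delimiter.
styleWithoutQuotes :: Parser String
styleWithoutQuotes =
  stringCI "style=" *> manyTill anyChar (stringCI " " <|> lookGreater)
```

As with lookAhead, parsing "style=color:red>rest" this way leaves ">rest" in the input for the next parser.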

