Parsing many blocks with foldLine
Problem description
For this simplified problem, I am trying to parse an input that looks like
foo bar
 baz quux
 woo
hoo xyzzy
 glulx
into
[["foo", "bar", "baz", "quux", "woo"], ["hoo", "xyzzy", "glulx"]]
The code I have tried is as follows:
import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative
space :: Parser ()
space = L.space (void spaceChar) empty empty
item :: Parser () -> Parser String
item sp = L.lexeme sp $ some letterChar
items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' -> some (item sp')
items_ :: Parser [String]
items_ = items space
This works for a single block of items:
λ» parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
But as soon as I try to parse many items, it fails on the first unindented line:
λ» parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx"
4:1:
incorrect indentation (got 1, should be greater than 1)
or, with an even simpler input:
λ» parseTest (many items_) "a\nb"
2:1:
incorrect indentation (got 1, should be greater than 1)
Recommended answer
Megaparsec's author here :-) One thing to remember when you work with Megaparsec is that its lexer module is "low-level" on purpose. It does not do anything you cannot build yourself, and it doesn't lock you into any particular "framework". So in your case you have the space consumer sp' provided for you, but you should use it carefully, because it will certainly fail when the indentation level is less than or equal to the indentation level of the start of the whole fold. That is how your fold ends, by the way.
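The rule described above (a fold continues only while lines are indented more deeply than the line that started it) can be modelled without Megaparsec at all. The following is a minimal sketch using only base, not how the library implements line folds; the names indentOf, folds and parseBlocks are hypothetical helpers introduced for illustration, and indentation is assumed to consist of spaces only.

```haskell
import Data.Char (isSpace)

-- Indentation of a line, counted as the number of leading spaces
-- (assumption: tabs are not used for indentation).
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- Group lines into folds. A fold starts at a line and absorbs every
-- following line that is indented strictly more than that first line.
-- The first line that is not indented more deeply ends the fold and
-- starts the next one.
folds :: [String] -> [[String]]
folds [] = []
folds (l : ls) = (l : continuation) : folds rest
  where
    (continuation, rest) = span (\x -> indentOf x > indentOf l) ls

-- Drop blank lines, group the remaining lines into folds, and split
-- every fold into its words.
parseBlocks :: String -> [[String]]
parseBlocks =
  map (concatMap words) . folds . filter (any (not . isSpace)) . lines

main :: IO ()
main = print (parseBlocks "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx")
-- prints [["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]]
```

Here the unindented line "hoo xyzzy" ends the first fold, which corresponds to the point where sp' fails in the parser version.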
To quote the docs:
Create a parser that supports line-folding. The first argument is used to consume white space between components of line fold, thus it must consume newlines in order to work properly. The second argument is a callback that receives custom space-consuming parser as argument. This parser should be used after separate components of line fold that can be put on different lines.
sc = L.space (void spaceChar) empty empty

myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc  "baz" -- for the last symbol we use normal space consumer
A line fold cannot run indefinitely, so you should expect it to fail with an error message similar to the one you have right now. To succeed, you should think about a way for it to finish. This is usually done by using the "normal" space consumer at the end of the line fold:
space :: Parser ()
space = L.space (void spaceChar) empty empty
item :: Parser String
item = some letterChar
items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' ->
  item `sepBy1` try sp' <* sp
items_ :: Parser [String]
items_ = items space
item `sepBy1` try sp' runs until sp' fails, and then sp grabs the rest of the white space, so the next fold can be parsed.
λ> parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx"
[["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo\nxyzzy\n glulx"
[["foo","bar","baz","quux","woo"],["hoo"],["xyzzy","glulx"]]
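The code above targets the older Megaparsec API (Text.Megaparsec.Lexer and Text.Megaparsec.String). For readers on a newer release, the following is a sketch of the same approach, assuming Megaparsec 9.x, where the lexer lives in Text.Megaparsec.Char.Lexer and the Parser type has to be defined by hand. The module layout and the use of space1 are assumptions about that version, not part of the original answer.

```haskell
module Main (main) where

import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

-- Newer releases do not ship Text.Megaparsec.String, so the parser
-- type is spelled out here (no custom error component, String input).
type Parser = Parsec Void String

-- "Normal" space consumer: eats spaces and newlines, no comments.
sc :: Parser ()
sc = L.space space1 empty empty

item :: Parser String
item = some letterChar

-- One fold: items separated by the fold-aware consumer sc'. When sc'
-- fails on a line that is not indented deeply enough, try backtracks
-- and the plain sc consumes the remaining white space.
items :: Parser [String]
items = L.lineFold sc $ \sc' ->
  item `sepBy1` try sc' <* sc

main :: IO ()
main =
  parseTest (many items <* eof)
    "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx"
```

The trailing eof is an addition for this sketch, so that leftover input is reported as an error rather than silently ignored.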