Parsing many blocks with foldLine

Question

For this simplified problem, I am trying to parse an input that looks like

foo bar
 baz quux 
 woo
hoo xyzzy 
  glulx

into

[["foo", "bar", "baz", "quux", "woo"], ["hoo", "xyzzy", "glulx"]]

The code I have tried is the following:

import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative

space :: Parser ()
space = L.space (void spaceChar) empty empty

item :: Parser () -> Parser String
item sp = L.lexeme sp $ some letterChar

items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' -> some (item sp')

items_ :: Parser [String]
items_ = items space

This works for a single block of items:

λ» parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]

But as soon as I try to parse many items, it fails on the first unindented line:

λ» parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n  glulx"
4:1:
incorrect indentation (got 1, should be greater than 1)

or, with an even simpler input:

λ» parseTest (many items_) "a\nb"
2:1:
incorrect indentation (got 1, should be greater than 1)


Recommended answer

Megaparsec's author here :-) One thing to remember when you work with Megaparsec is that its lexer module is intentionally "low-level". It does not do anything you cannot build yourself, and it doesn't lock you into any particular "framework". So in your case you have the space consumer sp' provided for you, but you should use it carefully, because it will certainly fail when the indentation level is less than or equal to the indentation level of the start of the whole fold. That is how your fold ends, by the way.

To quote the documentation:


Create a parser that supports line-folding. The first argument is used to consume white space between components of line fold, thus it must consume newlines in order to work properly. The second argument is a callback that receives custom space-consuming parser as argument. This parser should be used after separate components of line fold that can be put on different lines.





sc = L.space (void spaceChar) empty empty

myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc  "baz" -- for the last symbol we use normal space consumer

A line fold cannot run indefinitely, so you should expect it to fail with an error message similar to the one you have right now. To succeed, you should think about a way for it to finish. This is usually done by using the "normal" space consumer at the end of the line fold:

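-- Same imports as in the question.

-- Plain space consumer: skips any white space, including newlines.
space :: Parser ()
space = L.space (void spaceChar) empty empty

item :: Parser String
item = some letterChar

items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' ->
  -- sp' separates items inside the fold; the plain sp ends the fold
  item `sepBy1` try sp' <* sp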

items_ :: Parser [String]
items_ = items space

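item `sepBy1` try sp' runs until sp' fails (that is, until a line is not indented deeper than the start of the fold), and then sp consumes the remaining white space, so the next fold can be parsed.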

λ> parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n  glulx"
[["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo\nxyzzy\n  glulx"
[["foo","bar","baz","quux","woo"],["hoo"],["xyzzy","glulx"]]
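
To cross-check results like the ones above without Megaparsec, here is a minimal reference model of the grouping rule, using only base. It is not a Megaparsec parser and is not how lineFold is implemented; foldBlocks is a hypothetical helper, and it assumes every block starts in column 1, items are separated by white space, and blank lines can be ignored.

```haskell
import Data.Char (isSpace)

-- | Hypothetical helper (not part of Megaparsec): split the input into
-- blocks of words. A line that starts in column 1 opens a new block;
-- an indented line continues the current block.
foldBlocks :: String -> [[String]]
foldBlocks =
  map (concatMap words) . groupFolds . filter (not . all isSpace) . lines
  where
    -- The first remaining line always opens a block; the indented
    -- lines that directly follow it belong to the same block.
    groupFolds [] = []
    groupFolds (l : ls) =
      let (continuation, rest) = span isIndented ls
      in (l : continuation) : groupFolds rest

    -- A line is a continuation if it begins with white space.
    isIndented (c : _) = isSpace c
    isIndented [] = False
```

For the input from the question, foldBlocks "foo bar\n baz quux \n woo\nhoo xyzzy \n  glulx" gives [["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]], which is the result the question asks for.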
