Generate parser that runs a received parser on the output of another parser and monadically joins the results


Problem description

Given the following type and function, meant to parse a field of a CSV file into a string:

type Parser resultType = ParsecT String () Identity resultType
cell :: Parser String 

I have implemented the following function:

customCell :: String -> Parser res  -> Parser res
customCell typeName subparser = 
  cell
    >>= either (const $ unexpected typeName) 
               return . parse (subparser <* eof) ""

Still, I cannot help thinking that I am not using the Monad concept as fully as I could, and that there is a better way to merge the result of the inner parser with the outer parser, especially with regard to failure.

Does anybody know how I could do so, or is this code the way it is meant to be done?

PS - I now realise that my type simplification is probably not appropriate, and that maybe what I want is to replace the underlying Identity monad with the Either monad. Unfortunately, I do not feel well enough acquainted with monad transformers yet.

PS2 - What the hell is the underlying monad good for anyway?

Solution

Elaborating on @Daniel Wagner's answer... The way parsers are normally built with Parsec, you start with low-level parsers that parse specific characters (e.g., a plus sign or a digit), and you build parsers on top of them using combinators (like the many1 combinator, which turns a parser that reads a single digit into a parser that reads one or more digits, or a monadic parser that parses "one or more digits" followed by a "plus sign" followed by "one or more digits").

However, each parser, whether it's a low-level digit parser or a higher-level "addition expression" parser, is intended to be applied directly to the same input stream.
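As a minimal sketch of that usual "horizontal" style, the addition example could look like the following. The names `number` and `addition` are made up for illustration, and the parser returns the sum only to keep the example small.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | One or more digits, read as an Int (hypothetical helper name).
number :: Parser Int
number = read <$> many1 digit

-- | "One or more digits", a plus sign, "one or more digits",
--   sequenced monadically on the same input stream.
addition :: Parser Int
addition = do
  x <- number
  _ <- char '+'
  y <- number
  return (x + y)
```

With these definitions, `parse addition "" "12+30"` gives `Right 42`. Both `number` calls and the `char '+'` read from the one shared input stream.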

What you don't typically do is write a parser that gobbles a chunk of the input stream to produce, say, a String and another parser that parses that String (instead of the original input stream) and try to combine them. This is the kind of "vertical composition" that isn't directly supported by Parsec and looks unnatural and non-monadic.

As pointed out in the comments, there are some situations where vertical composition is the cleanest overall approach (like when you have one language embedded within the components or expressions of another language), but it's not the usual approach taken by a Parsec parser.
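For the cases where vertical composition really is wanted, one possible sketch is a small helper that runs an inner parser on the outer parser's result and re-raises the inner error with `parserFail`. The helper `nested` and the `rawCell` definition are assumptions made for illustration. They are not part of Parsec, and `rawCell` is only a guess at what the question's `cell` might do.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Hypothetical helper (not part of Parsec): run `inner` on the String
--   produced by `outer`, requiring `inner` to consume all of it.
nested :: Parser String -> Parser a -> Parser a
nested outer inner = do
  s <- outer
  case parse (inner <* eof) "" s of
    Left err -> parserFail (show err)  -- re-raise the inner error message
    Right x  -> return x

-- | Assumed raw cell: everything up to the next comma or newline.
rawCell :: Parser String
rawCell = many (noneOf ",\n")

-- | An Int cell obtained by vertical composition.
intCell :: Parser Int
intCell = nested rawCell (read <$> many1 digit)
```

Unlike `const $ unexpected typeName`, this keeps the text of the inner error. The reported source position is still that of the outer parser after the cell has been consumed, which is one reason this style feels awkward in Parsec.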

The bottom line in your application is that a cell parser that produces only a String is too specialized to be useful. A more useful Parsec framework for CSV files would be:

import Text.Parsec
import Text.Parsec.String

-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)

-- | `row cell` parses a newline-terminated row of comma separated
--   `cell`-expressions
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '\n'

Now, you can write a custom cell parser that parses positive integers:

customCell :: Parser Int
customCell = read <$> many1 digit

and parse CSV files:

> parse (csv customCell) "" "1,2,3\n4,5,6\n"
Right [[1,2,3],[4,5,6]]
>

Here, there is no cell subparser that explicitly parses a comma-delimited cell into a string to be fed to a different parser. Instead, the "cell" is an implicit context: the supplied cell parser is called on the underlying input stream at exactly the point where a comma-delimited cell is expected within a row.
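To illustrate how the same framework accepts different cell parsers, here is a small sketch that repeats `csv` and `row` so it is self-contained. The `textCell` and `intCell` definitions are assumptions for illustration. The `<?>` label plays roughly the role of the `typeName` argument in the question, since Parsec reports it as the expected item when the cell fails without consuming input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`.
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)

-- | `row cell` parses a newline-terminated row of comma-separated cells.
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '\n'

-- | Assumed raw-text cell: everything up to the next comma or newline.
textCell :: Parser String
textCell = many (noneOf ",\n")

-- | Integer cell, labelled so that error messages mention "integer".
intCell :: Parser Int
intCell = (read <$> many1 digit) <?> "integer"
```

With these definitions, `parse (csv textCell) "" "a,b\nc,d\n"` gives `Right [["a","b"],["c","d"]]`, while `parse (csv intCell) "" "1,x\n"` fails with a `Left` whose position points at the offending cell in the original input.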
