Chunked Parsing with FParsec

Question

Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and unparsed portion of an input stream so that I might accomplish this? I'm trying to run the chunks of input coming in from SocketAsyncEventArgs without buffering entire messages.

Update

The reason for noting the use of SocketAsyncEventArgs was to denote that sending data to a CharStream might result in asynchronous access to the underlying Stream. Specifically, I'm looking at using a circular buffer to push the data coming in from the socket. I remember the FParsec documentation noting that the underlying Stream should not be accessed asynchronously, so I had planned on manually controlling the chunked parsing.

Ultimate questions:

  1. Can I use a circular buffer under my Stream passed to the CharStream?
  2. Do I not need to worry myself with manually controlling the chunking in this scenario?

Recommended answer

The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.
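As a minimal sketch of running a parser directly on a Stream, the snippet below uses FParsec's `runParserOnStream`. The grammar (whitespace-separated integers), the stream name "socket", and the `MemoryStream` standing in for a network stream are assumptions made for illustration only.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical grammar: whitespace-separated integers up to the end of input.
let numbers : Parser<int32 list, unit> = many (pint32 .>> spaces) .>> eof

// A MemoryStream stands in for the real network stream in this sketch.
let input = new MemoryStream(Encoding.UTF8.GetBytes("1\n2\n3\n"))

// runParserOnStream wraps the stream in a CharStream, which pulls data
// from the stream as parsing proceeds.
let result = runParserOnStream numbers () "socket" input Encoding.UTF8

match result with
| Success (values, _, _) -> printfn "parsed %A" values
| Failure (message, _, _) -> printfn "parse error: %s" message
```

With an input this small everything fits into a single block, so the block-wise reading only becomes noticeable on larger inputs.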

Note, however, that the CharStream consumes the input stream in chunks of a fixed (but configurable) size, i.e. it calls the Read method of the System.IO.Stream as often as necessary to fill a complete block. Hence, if you parse the input faster than you can retrieve new input, the CharStream may block even though there is already some unparsed input, because there's not yet enough input to fill a complete block.
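To illustrate why this can block, here is a rough sketch of a "fill a complete block" read loop. This is not FParsec's actual code; `fillBlock` is a hypothetical helper that only mirrors the behaviour described above, using the standard `Stream.Read` method.

```fsharp
open System.IO

// Hypothetical helper: keeps reading until `block` is full or the stream ends.
// Returns the number of bytes actually placed in the block.
let fillBlock (stream: Stream) (block: byte[]) =
    let rec loop filled =
        if filled = block.Length then filled
        else
            // Read may return fewer bytes than requested. On a socket-backed
            // stream it waits until at least one byte is available, so this
            // loop waits for more data whenever the block is only partly full.
            let n = stream.Read(block, filled, block.Length - filled)
            if n = 0 then filled      // end of stream
            else loop (filled + n)
    loop 0

// Usage: 10 bytes read in blocks of 4 yield blocks of 4, 4 and 2 bytes.
let source = new MemoryStream(Array.init 10 byte)
let block = Array.zeroCreate<byte> 4
let sizes = [ for _ in 1 .. 3 -> fillBlock source block ]
printfn "%A" sizes
```

A loop of this shape returns a short block only at the end of the stream, which is why a parser sitting on top of it can wait for data even when part of the next block has already arrived.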

Update

Answers to the ultimate questions: 42.

  • How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering that excludes parallel access only applies to the CharStream class, which isn't thread safe.

Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.

The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.

The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.
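One way to sketch the background-task approach is shown below. The helper name `parseInBackground` and the stream name "socket" are hypothetical, and a `MemoryStream` again stands in for the socket's stream; the only FParsec call used is `runParserOnStream`.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical helper: runs the (synchronous, possibly blocking) parse on a
// thread-pool thread and returns a Task that completes with the parser result.
let parseInBackground (parser: Parser<'a, unit>) (stream: Stream) =
    async { return runParserOnStream parser () "socket" stream Encoding.UTF8 }
    |> Async.StartAsTask

// Usage: the caller stays free while the parser waits for input.
let stream = new MemoryStream(Encoding.UTF8.GetBytes("42"))
let parseTask = parseInBackground pint32 stream

match parseTask.Result with
| Success (value, _, _) -> printfn "parsed %d" value
| Failure (message, _, _) -> printfn "parse error: %s" message
```

Because the CharStream is created and used entirely inside the one task, it is never accessed from two threads at once.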

If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
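For a line-based format, chunking the input yourself might look like the following sketch. The "key=value" line format, the `line` parser, and the `parseLines` helper are all assumptions for illustration; each line is handed to FParsec's `run` as an independent string.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical line format: letters, '=', then an integer, e.g. "width=80".
let line : Parser<string * int32, unit> =
    many1Satisfy isLetter .>> pchar '=' .>>. pint32 .>> eof

// Splits the stream into lines and parses each line on its own.
let parseLines (stream: Stream) =
    use reader = new StreamReader(stream, Encoding.UTF8)
    [ while not reader.EndOfStream do
        match run line (reader.ReadLine()) with
        // Result.Ok / Result.Error are qualified because FParsec defines
        // its own Ok and Error literals.
        | Success (pair, _, _) -> yield Result.Ok pair
        | Failure (message, _, _) -> yield Result.Error message ]

// Usage, with a MemoryStream standing in for the socket's stream.
let data = new MemoryStream(Encoding.UTF8.GetBytes("width=80\nheight=24\n"))
let parsed = parseLines data
printfn "%A" parsed
```

Since every line is parsed from its own string, a malformed line produces an error for that line only, and backtracking never has to cross a chunk boundary.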
