Chunked Parsing with FParsec



Problem description

Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and the unparsed portion of an input stream so that I might accomplish this? I'm trying to run the chunks of input coming in from SocketAsyncEventArgs without buffering entire messages.

Update

The reason for noting the use of SocketAsyncEventArgs was to denote that sending data to a CharStream might result in asynchronous access to the underlying Stream. Specifically, I'm looking at using a circular buffer to push the data coming in from the socket. I remember the FParsec documentation noting that the underlying Stream should not be accessed asynchronously, so I had planned on manually controlling the chunked parsing.

Ultimate questions:

1. Can I use a circular buffer under my Stream passed to the CharStream?
2. Do I not need to worry myself with manually controlling the chunking in this scenario?

Solution

The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.

Note, however, that the CharStream will consume the input stream in chunks of a fixed (but configurable) size, i.e. it will call the Read method of the System.IO.Stream as often as is necessary to fill a complete block. Hence, if you parse the input faster than you can retrieve new input, the CharStream may block even though there is already some unparsed input, because there's not yet enough input to fill a complete block.
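As a rough illustration of parsing straight from a System.IO.Stream, here is a minimal sketch. It assumes the normal (not Low-Trust) FParsec build; the `numbers` grammar and the `parseNumbers` helper are hypothetical names made up for this example, while `runParserOnStream` is FParsec's own entry point for stream input.

```fsharp
open System.IO
open System.Text
open FParsec

// Illustrative grammar (an assumption, not from the question):
// whitespace-separated integers up to the end of the input.
let numbers : Parser<int32 list, unit> =
    spaces >>. many (pint32 .>> spaces) .>> eof

// Parses directly from a Stream. The CharStream that FParsec creates
// internally pulls the input block by block, so the complete text does
// not have to be available before parsing starts.
let parseNumbers (input: Stream) : int32 list option =
    match runParserOnStream numbers () "input" input Encoding.UTF8 with
    | Success (result, _, _) -> Some result
    | Failure (message, _, _) ->
        eprintfn "%s" message
        None
```

In a socket scenario the `input` argument could be, for example, a `NetworkStream` wrapped around the connected socket. The call then blocks whenever the CharStream is waiting to fill its next block, which is the behavior described above.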

Update

The answer(s) to your ultimate questions: 42.

• How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering that excludes parallel access only applies to the CharStream class, which isn't thread safe.

• Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.

• The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.

• The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.

• If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
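The last two points can be combined in a small sketch: read synchronously inside a background task, split the input into lines yourself, and parse each line on its own. The "key=value" line format and the names `keyValue`, `parseLine`, `readLines` and `parseInBackground` are assumptions made for this example only; `run`, `many1Satisfy`, `pchar`, `restOfLine` and `eof` are regular FParsec functions.

```fsharp
open System.IO
open System.Threading.Tasks
open FParsec

// Hypothetical line format: "key=value".
let keyValue : Parser<string * string, unit> =
    many1Satisfy (fun c -> c <> '=') .>> pchar '=' .>>. restOfLine false .>> eof

// Parses one independent chunk (a single line); None marks a malformed line.
let parseLine (line: string) : (string * string) option =
    match run keyValue line with
    | Success (pair, _, _) -> Some pair
    | Failure _ -> None

// Lazily yields lines from the reader using ordinary blocking reads.
let rec readLines (reader: TextReader) : seq<string> = seq {
    match reader.ReadLine() with
    | null -> ()
    | line ->
        yield line
        yield! readLines reader
}

// Runs the blocking read-and-parse loop as a background task, so the
// caller is not blocked while input is still arriving.
let parseInBackground (reader: TextReader) : Task<(string * string) option list> =
    async { return readLines reader |> Seq.map parseLine |> Seq.toList }
    |> Async.StartAsTask
```

Because every line is parsed with its own call to `run`, no CharStream is shared between threads, and backtracking is naturally limited to a single line.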
