Chunked Parsing with FParsec
Problem description
Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and unparsed portion of an input stream so that I might accomplish this? I'm trying to run the chunks of input coming in from SocketAsyncEventArgs without buffering entire messages.
Update
The reason for noting the use of SocketAsyncEventArgs was to denote that sending data to a CharStream might result in asynchronous access to the underlying Stream. Specifically, I'm looking at using a circular buffer to push the data coming in from the socket. I remember the FParsec documentation noting that the underlying Stream should not be accessed asynchronously, so I had planned on manually controlling the chunked parsing.
Ultimate questions:
- Can I use a circular buffer under my Stream passed to the CharStream?
- Do I not need to worry myself with manually controlling the chunking in this scenario?
Answer
The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.
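As a minimal sketch of parsing directly from a stream, the following F# script uses FParsec's runParserOnStream, which constructs the CharStream from the given System.IO.Stream internally. The message format (one integer per line) and the names pNumbers and parseFromStream are assumptions made up for illustration; a MemoryStream stands in for the socket stream here.

```fsharp
#r "nuget: FParsec"
open System.IO
open System.Text
open FParsec

// Hypothetical message format (an assumption): one integer per line.
let pNumbers : Parser<int32 list, unit> =
    many (pint32 .>> skipNewline) .>> eof

// Runs the parser on a stream; FParsec reads the stream block by block.
let parseFromStream (stream: Stream) =
    match runParserOnStream pNumbers () "input" stream Encoding.UTF8 with
    | Success (values, _, _) -> Result.Ok values
    | Failure (message, _, _) -> Result.Error message

// A MemoryStream stands in for a socket-backed stream in this sketch.
let sample = new MemoryStream(Encoding.UTF8.GetBytes "1\n2\n3\n")
printfn "%A" (parseFromStream sample)
```

With a real socket, the same function could presumably be given a NetworkStream instead of the MemoryStream.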
Note, however, that the CharStream will consume the input stream in chunks of a fixed (but configurable) size, i.e. it will call the Read method of the System.IO.Stream as often as is necessary to fill a complete block. Hence, if you parse the input faster than you can retrieve new input, the CharStream may block even though there is already some unparsed input, because there's not yet enough input to fill a complete block.
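The block-filling behaviour described above can be modelled roughly as follows. This is a simplified sketch, not FParsec's actual implementation: fillBlock is a hypothetical helper, and it works on raw bytes, whereas the CharStream fills its blocks with decoded characters. It only shows why a reader that insists on a full block keeps calling Read, and therefore waits, while the stream has delivered less than one block.

```fsharp
open System.IO

/// Hypothetical helper: reads until `block` is full or the stream ends.
/// Returns the number of bytes actually placed into `block`.
let fillBlock (stream: Stream) (block: byte[]) =
    let rec loop filled =
        if filled = block.Length then filled
        else
            // On a socket-backed stream this call waits until data arrives.
            let n = stream.Read(block, filled, block.Length - filled)
            if n = 0 then filled          // end of stream: partial block
            else loop (filled + n)        // keep reading until the block is full
    loop 0

// Ten bytes consumed in blocks of four.
let source = new MemoryStream(Array.init 10 byte)
let block = Array.zeroCreate<byte> 4
printfn "%d" (fillBlock source block)   // 4
printfn "%d" (fillBlock source block)   // 4
printfn "%d" (fillBlock source block)   // 2 (stream ended before the block was full)
```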
Update
The answer(s) to your ultimate questions: 42.
- How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering that excludes parallel access only applies to the CharStream class, which isn't thread safe.
- Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.
- The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.
- The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.
- If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
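The last suggestion, chunking the input yourself, might look like the sketch below for a line-based format. The "key=value" line format and the names pLine, parseLine and parseLines are assumptions invented for this example; only run, many1Satisfy, isLetter, pchar, pint32 and eof come from FParsec. Each line is parsed independently with run, so the parser never waits for a full CharStream block.

```fsharp
#r "nuget: FParsec"
open System.IO
open FParsec

// Hypothetical line format (an assumption): letters, '=', then an integer.
let pLine : Parser<string * int32, unit> =
    many1Satisfy isLetter .>> pchar '=' .>>. pint32 .>> eof

// Parses one already-separated line as an independent chunk.
let parseLine (line: string) =
    match run pLine line with
    | Success (value, _, _) -> Result.Ok value
    | Failure (message, _, _) -> Result.Error message

// Lazily reads lines from the reader and parses each one as it arrives.
let parseLines (reader: TextReader) =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None                          // end of input
        | line -> Some (parseLine line, ())) ()

// A StringReader stands in for a StreamReader over the socket stream.
use reader = new StringReader("width=10\nheight=20\n")
for result in parseLines reader do
    printfn "%A" result
```

To keep the socket handling off the main thread, the loop could be run inside a background task, as the answer suggests.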