How do I separate a stream of `1`s and `2`s into groups?
Problem description
Imagine I have a really long file like this:
1 1 1 1 2 2 2 2 1 1 1 1 2 2 1 1 2 2 ...
I'm interested in separating, and performing a computation on, each group of consecutive `1`s along with the consecutive `2`s that come immediately after it.

However, the problem is somewhat more complicated in reality. First, the records are not simply `1`s and `2`s but something larger, let's say individual `Record`s, and it takes some computation to determine whether a `Record` is a `1` or a `2`. Second, instead of being exposed as a file, the data source only exposes itself through a retrieving function, let's say `get1 :: IO (Maybe Record)`. This function returns one record (`Just record`) on each call, and returns `Nothing` when the data is depleted. Finally, the file is large (effectively infinite), so I have to process it in a streaming way with constant memory usage.

Here is a hypothetical transcript demonstrating how I want it to behave. The `1`s and `2`s are the `Record`s retrieved from `get1` calls, and the `(,)`s represent computations that happen immediately after each group is detected:

    1
    1
    1
    2
    2
    1
    (["1","1","1"],["2","2"])
    2
    1
    (["1"],["2"])
    1
    1
    1
    2
    2
    2
    2
    1
    (["1","1","1","1"],["2","2","2","2"])
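To pin down the intended semantics before worrying about streaming, here is a pure, list-based sketch. It assumes a hypothetical `Kind` type and a classifier function standing in for "some computation to determine whether a `Record` is a `1` or a `2`"; `pairRuns` is an illustrative name. This is not a constant-memory solution for an `IO` source, only a model to test a streaming version against.

```haskell
-- Hypothetical classification of a record as a "1" or a "2".
data Kind = One | Two deriving (Eq, Show)

-- Pair every maximal run of One-records with the run of Two-records that
-- follows it. Assumptions: leading Two-records (no run of 1s before them)
-- and a final run of 1s with no 2s after it are dropped.
-- The result is built lazily with 'span', so it also works on infinite lists.
pairRuns :: (a -> Kind) -> [a] -> [([a], [a])]
pairRuns classify = go . dropWhile isTwo
  where
    isOne = (== One) . classify
    isTwo = (== Two) . classify
    go xs =
      let (ones, rest)  = span isOne xs   -- the run of 1s
          (twos, rest') = span isTwo rest -- the run of 2s right after it
      in if null twos then [] else (ones, twos) : go rest'
```

With `classify n = if n == 1 then One else Two`, the file from the top of the question yields `[([1,1,1,1],[2,2,2,2]), ([1,1,1,1],[2,2]), ([1,1],[2,2])]`.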
Solution
Collecting lines of input in a streaming fashion usually involves an iteratee-style library such as pipes or conduit. I'll use pipes out of familiarity, but something similar could probably be accomplished with conduit.
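The pipelines in this answer read from `P.stdinLn`, whereas the question's source is a pull function `get1 :: IO (Maybe Record)`. A minimal sketch of adapting such a function into a pipes `Producer`, assuming nothing beyond that signature. `fromGetter` and the test stand-in `mkFakeGetter` are hypothetical helper names, not part of the pipes API.

```haskell
import Control.Monad.Trans.Class (lift)
import Data.IORef
import Pipes

-- Hypothetical helper: turn a pull function such as
-- get1 :: IO (Maybe Record) into a Producer that ends at the first Nothing.
fromGetter :: Monad m => m (Maybe a) -> Producer a m ()
fromGetter getter = loop
  where
    loop = do
      mx <- lift getter          -- call the getter only when downstream pulls
      case mx of
        Nothing -> return ()     -- source exhausted: end the stream
        Just x  -> yield x >> loop

-- Hypothetical stand-in for get1, for experimenting without the real source:
-- each call hands out the next list element, then Nothing forever after.
mkFakeGetter :: [a] -> IO (IO (Maybe a))
mkFakeGetter xs0 = do
  ref <- newIORef xs0
  return $ do
    xs <- readIORef ref
    case xs of
      []       -> return Nothing
      (y : ys) -> writeIORef ref ys >> return (Just y)
```

`fromGetter get1` can then take the place of `P.stdinLn`. Because pipes is pull-based, the adapter holds at most one record at a time; if `Record` has no suitable `Eq` instance, grouping would use `groupsBy` with a classifier instead of `groups`.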
First, there's the matter of grouping. Pipes handles this with a low-level library called `pipes-group` that manages grouping into substreams without collecting elements into memory. (It is the basis of similar functionality in the `pipes-bytestring` and `pipes-text` libraries.)

`pipes-group` accomplishes this by splitting the input into multiple `Producer`s delimited using `FreeT`. `FreeT` essentially allows for the construction of a "linked list" of producers.

    import Control.Lens
    import Pipes
    import Pipes.Group
    import qualified Pipes.Prelude as P

    main :: IO ()
    main = runEffect $ (concats . view groups) P.stdinLn >-> P.stdoutLn
This will group input lines (by `(==)`) but then immediately concatenate them back together, which is not very useful. To demonstrate that the grouping is really happening, we can use `intercalates`:

    import Control.Lens
    import Pipes
    import Pipes.Group
    import qualified Pipes.Prelude as P

    main :: IO ()
    main = runEffect $
      (intercalates (yield "!") . view groups) P.stdinLn >-> P.stdoutLn
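`groups` splits on runs of *equal* elements. In the question, consecutive records only share a computed classification, so `groupsBy` with a comparison on that classification is probably closer to what is wanted. A sketch, where `isOne` is a hypothetical classifier over input lines and `markRuns`, `markRunsList` and `markStdin` are illustrative names:

```haskell
import Control.Lens (view)
import Data.Function (on)
import Pipes
import Pipes.Group (groupsBy, intercalates)
import qualified Pipes.Prelude as P

-- Hypothetical classifier standing in for the real Record computation:
-- here a line counts as a "1" if it starts with the character '1'.
isOne :: String -> Bool
isOne s = take 1 s == "1"

-- Split the stream into runs of lines with the same classification
-- (not runs of equal lines, as 'groups' does) and put "!" between runs.
markRuns :: Monad m => Producer String m r -> Producer String m r
markRuns = intercalates (yield "!") . view (groupsBy ((==) `on` isOne))

-- Pure usage: run the pipeline over an in-memory list.
markRunsList :: [String] -> [String]
markRunsList xs = P.toList (markRuns (each xs))

-- Streaming usage: read stdin, write the marked lines to stdout.
markStdin :: IO ()
markStdin = runEffect (markRuns P.stdinLn >-> P.stdoutLn)
```

For example, `markRunsList ["1a","1b","2a","1c"]` gives `["1a","1b","!","2a","!","1c"]`: the lines differ, but the runs follow the classification. Only `Control.Lens (view)` is imported here to avoid the clash between the `each` exported by both `Control.Lens` and `Pipes`.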
This will output a `!` between consecutive groups, which at least shows that the grouping is working properly. To collect the elements of each group together, we use the built-in support for the `foldl` library's streaming folds:

    import Control.Lens
    import Pipes
    import Pipes.Group
    import qualified Pipes.Prelude as P

    main :: IO ()
    main = runEffect $
      (folds (++) [] id . view groups) P.stdinLn >-> P.stdoutLn
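Note that while stdin will be streamed in constant space, this will still collect each entire group into memory (with `folds (++) [] id`, each group's lines are appended into a single `String`), but of course there's no way to avoid that if a group has to be handled as a whole.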
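To get each group as a list of lines rather than one concatenated string, `Control.Foldl.purely` can hand a ready-made `Fold` such as `Control.Foldl.list` to `folds`, which takes the same step/begin/done arguments. A sketch (`groupLists` and `printGroups` are illustrative names):

```haskell
import Control.Foldl (list, purely)
import Control.Lens (view)
import Pipes
import Pipes.Group (folds, groups)
import qualified Pipes.Prelude as P

-- Fold every group of equal consecutive elements into a list.
-- 'purely' feeds the step/begin/done parts of the 'list' Fold to 'folds'.
groupLists :: (Monad m, Eq a) => Producer a m r -> Producer [a] m r
groupLists = purely folds list . view groups

-- Streaming usage: print each group of equal stdin lines as a list.
printGroups :: IO ()
printGroups = runEffect (groupLists P.stdinLn >-> P.print)
```

For example, `P.toList (groupLists (each "aabccc"))` evaluates to `["aa","b","ccc"]`. Each group is still held in memory while it is being folded, but only one group at a time.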
For more info, see the pipes-group tutorial and the foldl package.
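Putting these pieces together for the original question: group by classification, fold each run into a list, then pair each run of 1s with the run of 2s that follows it. This is a sketch under assumptions: `Kind` and the classifier are hypothetical stand-ins for the real `Record` computation; leading 2s and a trailing run of 1s with no 2s after it are dropped; `pairUp`, `pairedRuns` and `pairedRunsList` are illustrative names.

```haskell
import Control.Foldl (list, purely)
import Control.Lens (view)
import Data.Function (on)
import Pipes
import Pipes.Group (folds, groupsBy)
import qualified Pipes.Prelude as P

-- Hypothetical classification of a record as a "1" or a "2".
data Kind = One | Two deriving (Eq, Show)

-- Pair each run of One-records with the run of Two-records after it.
-- Runs from groupsBy are never empty; a run of Two-records with no pending
-- run of One-records (i.e. at the very start) is skipped.
pairUp :: Monad m => (a -> Kind) -> Pipe [a] ([a], [a]) m r
pairUp classify = go Nothing
  where
    go pending = do
      run <- await
      case (run, pending) of
        (x : _, _) | classify x == One -> go (Just run)          -- remember the 1s
        (_, Just ones) -> yield (ones, run) >> go Nothing          -- 2s close the pair
        (_, Nothing)   -> go Nothing                               -- 2s with no 1s before

-- Group by classification, collect each run into a list, then pair runs.
-- At most one run of 1s and one run of 2s are held in memory at a time.
pairedRuns :: Monad m => (a -> Kind) -> Producer a m r -> Producer ([a], [a]) m r
pairedRuns classify p =
  purely folds list (view (groupsBy ((==) `on` classify)) p) >-> pairUp classify

-- Pure usage over an in-memory list, handy for testing.
pairedRunsList :: (a -> Kind) -> [a] -> [([a], [a])]
pairedRunsList classify xs = P.toList (pairedRuns classify (each xs))
```

In the real program the input `Producer` would be built from `get1` (a loop of `lift get1` and `yield` until `Nothing`) and the output sent to a consumer such as `P.print`. Since a run's fold only finishes when `groupsBy` reads the first record of the next run, each pair should be emitted right after the next `1` arrives, as in the transcript.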