How do I separate a stream of `1`s and `2`s into groups?

Question

Imagine I have a really long file like this:

1
1
1
1
2
2
2
2
1
1
1
1
2
2
1
1
2
2
...

I'm interested in separating and performing computation on each group of consecutive 1s along with the consecutive 2s that come immediately after it.
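To make the requirement concrete, here is a minimal pure sketch over an ordinary list. It is not the streaming, IO-driven solution the question needs. pairRunsBy is a hypothetical helper name, and its isOne argument stands in for whatever test decides that an element counts as a "1". Everything else is treated as a "2".

```haskell
-- Hypothetical helper: split a list into (run of "1"s, following run of "2"s)
-- pairs. isOne decides whether an element counts as a "1"; anything else is
-- treated as a "2". Evaluation is lazy, so pairs are produced incrementally.
pairRunsBy :: (a -> Bool) -> [a] -> [([a], [a])]
pairRunsBy _ [] = []
pairRunsBy isOne xs =
  let (ones, rest)  = span isOne xs     -- leading run of "1"s (may be empty)
      (twos, rest') = break isOne rest  -- the "2"s that follow it
  in (ones, twos) : pairRunsBy isOne rest'
```

For example, pairRunsBy (== 1) [1,1,1,2,2,1,2] evaluates to [([1,1,1],[2,2]),([1],[2])]. Because the list is consumed lazily, this also works on an infinite list as long as every run is finite. However, a source reachable only through an IO action like get1 would need lazy IO to become such a list.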

However, the problem is somewhat more complicated in reality. First, the records are not simply 1s and 2s but something larger, let's say individual Records, and it takes some computation to determine whether a Record is a 1 or a 2. Second, instead of being exposed as a file, the data source exposes itself only through a retrieval function, let's say `get1 :: IO (Maybe Record)`. Each call returns one record as `Just record`, and the function returns `Nothing` once the data is depleted. Finally, the file is large (effectively infinite), so I have to process it in a streaming fashion with constant memory usage.

Here is a hypothetical transcript showing how I want it to behave (the 1s and 2s are the Records retrieved by get1 calls, and the (,)s represent the computations that happen immediately after each group is detected):

1
1
1
2
2
1
(["1","1","1"],["2","2"])
2
1
(["1"],["2"])
1
1
1
2
2
2
2
1
(["1","1","1","1"],["2","2","2","2"])

Solution

Collecting lines of input in a streaming fashion usually involves an iteratee-style library like pipes or conduit. I'll use pipes out of familiarity, but something similar could probably be accomplished with conduit.
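The question's records come from a pull function, get1 :: IO (Maybe Record), rather than from stdin. A first step would therefore be to wrap that function as a Producer. The sketch below uses only the pipes package. fromGetter, printAll and mkGetter are hypothetical names, and mkGetter is merely a list-backed stand-in for get1 for experimenting.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)
import Pipes
import qualified Pipes.Prelude as P

-- Wrap a pull-style source such as the question's get1 :: IO (Maybe Record)
-- as a Producer: yield each Just value and stop at the first Nothing.
fromGetter :: Monad m => m (Maybe a) -> Producer a m ()
fromGetter get = loop
  where
    loop = do
      mx <- lift get            -- one retrieval call per element
      case mx of
        Nothing -> return ()    -- source depleted: end the stream
        Just x  -> yield x >> loop

-- Example use: print every element as it arrives, without accumulating them.
printAll :: Show a => IO (Maybe a) -> IO ()
printAll get = runEffect $ fromGetter get >-> P.print

-- Hypothetical stand-in for get1 (for experimenting only): serves the list
-- elements one per call, then Nothing forever.
mkGetter :: [a] -> IO (IO (Maybe a))
mkGetter xs0 = do
  ref <- newIORef xs0
  return $ atomicModifyIORef' ref $ \xs -> case xs of
    []       -> ([], Nothing)
    (y : ys) -> (ys, Just y)
```

With the real source, fromGetter get1 would take the place of P.stdinLn in the pipelines below.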

First, there's the matter of grouping. Pipes handles this with a low-level library called pipes-group that manages grouping into substreams without collecting elements into memory. (It is the basis of similar functionality in the pipes-bytestring and pipes-text libraries.)

pipes-group accomplishes this by splitting the input into multiple Producers delimited using FreeT. FreeT essentially allows for the construction of a "linked list" of producers.

import Control.Lens
import Pipes
import Pipes.Group
import qualified Pipes.Prelude as P

main = runEffect $ (concats . view groups) P.stdinLn >-> P.stdoutLn

This will group input lines (by (==)) but then immediately concatenate them back together, which is not very useful. To demonstrate that the grouping is really happening, we can use intercalates:

import Control.Lens
import Pipes
import Pipes.Group
import qualified Pipes.Prelude as P

main = runEffect $
         (intercalates (yield "!") . view groups) P.stdinLn >-> P.stdoutLn

This will output a "!" between groups, which at least shows that the grouping is working properly. To collect the elements of each group together, we use pipes-group's folds, which takes a streaming fold in the same step/initial/extract form used by the foldl library:

import Control.Lens
import Pipes
import Pipes.Group
import qualified Pipes.Prelude as P

main = runEffect $
         (folds (++) [] id . view groups) P.stdinLn >-> P.stdoutLn

Note that while stdin is streamed in constant space, this collects each entire group into memory (here as a single String built with (++)). Of course, there is no way to avoid that if you need a whole group at once.
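The examples above group whole lines by (==). The question also needs each run of 1s paired with the run of 2s that follows it. One possible sketch builds this from the same pieces in three steps:

1. Group by a classifier with groupsBy.
2. Collect each group into a list with folds (flip (:)) [] reverse. With the foldl package, L.purely folds L.list should be equivalent.
3. Walk the stream of groups with next.

pairGroups, pairUp and printPairs are hypothetical names, and isOne stands in for the computation that classifies a Record. Emitting a trailing run of 1s as (ones, []) at end of input is an assumption, since the question does not say what should happen there.

```haskell
import Control.Lens (view)
import Data.Function (on)
import Pipes
import Pipes.Group (folds, groupsBy)
import qualified Pipes.Prelude as P

-- Hypothetical sketch: pair each run of "1"s with the run of "2"s after it.
-- isOne stands in for the computation that classifies a Record.
pairGroups :: Monad m => (a -> Bool) -> Producer a m r -> Producer ([a], [a]) m r
pairGroups isOne =
    pairUp isOne
  . folds (flip (:)) [] reverse        -- collect each group into a list, in order
  . view (groupsBy ((==) `on` isOne))  -- start a new group whenever the class changes

-- Walk the stream of groups, remembering the pending run of "1"s.
pairUp :: Monad m => (a -> Bool) -> Producer [a] m r -> Producer ([a], [a]) m r
pairUp isOne = go []
  where
    go ones p = do
      step <- lift (next p)
      case step of
        Left r -> do
          -- Assumption: flush a trailing run of "1"s that has no "2"s after it.
          if null ones then pure () else yield (ones, [])
          return r
        Right ([], p') -> go ones p'                  -- groupsBy never yields empty groups
        Right (g@(x : _), p')
          | isOne x   -> go g p'                      -- a run of "1"s: hold it
          | otherwise -> yield (ones, g) >> go [] p'  -- "2"s close the pair

-- Example use: print each pair as soon as it is complete.
printPairs :: Show a => (a -> Bool) -> Producer a IO () -> IO ()
printPairs isOne records = runEffect $ pairGroups isOne records >-> P.print
```

Memory use should be bounded by the largest (ones, twos) pair rather than by the length of the stream. A run of 2s only ends once the first 1 of the next run has been read, so each pair is emitted right after that 1 is retrieved. This matches the timing in the transcript. Once get1 has been wrapped as a Producer, printPairs can be driven from it directly.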

For more info, see the pipes-group tutorial and the foldl package.
