python: is there a library function for chunking an input stream?
Question
I want to chunk an input stream for batch processing. Given an input list or generator,
x_in = [1, 2, 3, 4, 5, 6 ...]
I want a function that will return chunks of that input. Say, if chunk_size=4, then,
x_chunked = [[1, 2, 3, 4], [5, 6, ...], ...]
This is something I do over and over, and was wondering if there is a more standard way than writing it myself. Am I missing something in itertools? (One could solve the problem with enumerate and groupby, but that feels clunky.) In case anyone wants to see an implementation, here it is,
def chunk_input_stream(input_stream, chunk_size):
    """partition a generator in a streaming fashion"""
    assert chunk_size >= 1
    accumulator = []
    for x in input_stream:
        accumulator.append(x)
        if len(accumulator) == chunk_size:
            yield accumulator
            accumulator = []
    if accumulator:
        yield accumulator
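For comparison, the enumerate and groupby approach mentioned above might look roughly like the sketch below. The function name chunk_with_groupby is hypothetical and only used for illustration; the idea is to tag each item with its index and group by index // chunk_size.

```python
from itertools import groupby


def chunk_with_groupby(input_stream, chunk_size):
    """Yield lists of up to chunk_size items, grouping by index // chunk_size."""
    assert chunk_size >= 1
    indexed = enumerate(input_stream)
    # Consecutive items share a key until the index crosses a chunk boundary.
    for _, group in groupby(indexed, key=lambda pair: pair[0] // chunk_size):
        yield [item for _, item in group]


print(list(chunk_with_groupby(iter(range(8)), 3)))
```

It streams just like the hand-written version, but the index bookkeeping is arguably why it feels clunky.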
Edit
Inspired by kreativitea's answer, here's a solution with islice, which is straightforward & doesn't require post-filtering,
from itertools import islice

def chunk_input_stream(input_stream, chunk_size):
    while True:
        chunk = list(islice(input_stream, chunk_size))
        if chunk:
            yield chunk
        else:
            return

# test it with list(chunk_input_stream(iter([1, 2, 3, 4]), 3))
Recommended answer
[Updated version thanks to the OP: I've been throwing yield from at everything in sight since I upgraded and it didn't even occur to me that I didn't need it here.]
Oh, what the heck:
from itertools import takewhile, islice, count

def chunk(stream, size):
    return takewhile(bool, (list(islice(stream, size)) for _ in count()))
For example:
>>> list(chunk((i for i in range(3)), 3))
[[0, 1, 2]]
>>> list(chunk((i for i in range(6)), 3))
[[0, 1, 2], [3, 4, 5]]
>>> list(chunk((i for i in range(8)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7]]
Warning: the above suffers the same problem as the OP's chunk_input_stream if the input is a list. You could get around this with an extra iter() wrap but that's less pretty. Conceptually, using repeat or cycle might make more sense than count() but I was character-counting for some reason. :^)
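To make the warning concrete: islice on a list starts from the beginning on every call, so the same first chunk comes back forever. A minimal sketch of the iter() wrap, using repeat instead of count(), could look like this. The name chunk_any is hypothetical, not part of the original answer.

```python
from itertools import islice, repeat, takewhile


def chunk_any(iterable, size):
    """Chunk any iterable, including lists, into lists of up to `size` items."""
    # One shared iterator, so each islice call continues where the last stopped.
    stream = iter(iterable)
    # repeat(None) just drives the generator; takewhile stops at the first empty chunk.
    return takewhile(bool, (list(islice(stream, size)) for _ in repeat(None)))


print(list(chunk_any([1, 2, 3, 4, 5], 2)))
```

Calling iter() on something that is already an iterator returns it unchanged, so generators still work as before.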
[FTR: no, I'm still not entirely serious about this, but hey-- it's a Monday.]
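For readers on Python 3.12 or later, the standard library has since gained itertools.batched, which covers this use case directly. Note that it yields tuples rather than lists. The sketch below falls back to a small stand-in on older versions; that fallback is an illustrative assumption, not the library's own implementation.

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Stand-in for older Pythons: yield tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        # Stop as soon as islice produces an empty tuple.
        while chunk := tuple(islice(iterator, n)):
            yield chunk


print([list(b) for b in batched(range(8), 3)])
```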