Split a generator into chunks without pre-walking it
Question
(This question is related to this one and this one, but those pre-walk the generator, which is exactly what I want to avoid.)
I would like to split a generator into chunks. The requirements are:
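- Do not pad the chunks: if the number of remaining elements is smaller than the chunk size, the last chunk must be smaller.
- Do not walk the generator beforehand: computing the elements is expensive, and it must only be done by the consuming function, not by the chunker.
- Which means, of course: do not accumulate in memory (no lists).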
I have tried the following code:
def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        yield head(i, size)

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)
for n, chunk in enumerate(chunks(els, 3)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)
This somehow works:
Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
^CTraceback (most recent call last):
  File "xxxx.py", line 15, in <module>
    for el in chunk:
  File "xxxx.py", line 2, in head
    for cnt, el in enumerate(iterable):
KeyboardInterrupt
Buuuut ... it never stops (I have to press ^C) because of the while True. I would like to stop that loop whenever the generator has been consumed, but I do not know how to detect that situation. I have tried raising an exception:
class NoMoreData(Exception):
    pass

def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break
    if cnt == 0 : raise NoMoreData()

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        try:
            yield head(i, size)
        except NoMoreData:
            break

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)
for n, chunk in enumerate(chunks(els, 2)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)
But then the exception is only raised in the context of the consumer, which is not what I want (I want to keep the consumer code clean):
Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
Traceback (most recent call last):
  File "xxxx.py", line 22, in <module>
    for el in chunk:
  File "xxxx.py", line 9, in head
    if cnt == 0 : raise NoMoreData
__main__.NoMoreData()
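The try/except inside chunks never fires because calling a generator function only creates the generator object; its body does not run until the consumer iterates over it. A minimal Python 3 sketch of that behaviour (the function name failing is made up for illustration):

```python
class NoMoreData(Exception):
    pass


def failing():
    raise NoMoreData()
    yield  # the mere presence of yield makes this a generator function


# Creating the generator does not execute its body, so nothing is raised here.
caught_at_creation = False
try:
    gen = failing()
except NoMoreData:
    caught_at_creation = True

# The exception only surfaces once someone iterates, i.e. in the consumer.
caught_at_iteration = False
try:
    next(gen)
except NoMoreData:
    caught_at_iteration = True

print(caught_at_creation, caught_at_iteration)
```

This is why wrapping `yield head(i, size)` in a try block cannot catch anything raised from within head's body.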
How can I detect that the generator is exhausted in the chunks function, without walking it?
Recommended answer
One way would be to peek at the first element, if any, and then create and return the actual generator.
def head(iterable, max=10):
    first = next(iterable)      # raise exception when depleted
    def head_inner():
        yield first             # yield the extracted first element
        for cnt, el in enumerate(iterable):
            yield el
            if cnt + 1 >= max:  # cnt + 1 to include first
                break
    return head_inner()
Just use this in your chunks generator and catch the StopIteration exception like you did with your custom exception.
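Putting the pieces together might look like the sketch below (Python 3). It is a variant under stated assumptions rather than the exact code above: the parameter is renamed to size, and the counting is adjusted so that each chunk holds at most size elements.

```python
def head(iterator, size=10):
    # Peek at the first element; next() raises StopIteration when depleted.
    first = next(iterator)

    def head_inner():
        yield first
        if size > 1:
            # Start counting at 2 because `first` was element number 1.
            for cnt, el in enumerate(iterator, start=2):
                yield el
                if cnt >= size:
                    break
    return head_inner()


def chunks(iterable, size=10):
    iterator = iter(iterable)
    while True:
        try:
            chunk = head(iterator, size)
        except StopIteration:
            return  # source exhausted: end the outer generator cleanly
        yield chunk


els = range(7)
result = [list(chunk) for chunk in chunks(els, 3)]
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because all chunks share one underlying iterator, the consumer has to finish each chunk before asking for the next one.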
Update: Here's another version, using itertools.islice to replace most of the head function, and a for loop. This simple for loop in fact does exactly the same thing as that unwieldy while-try-next-except-break construct in the original code, so the result is much more readable.
from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:    # stops when iterator is depleted
        def chunk():          # construct generator for next chunk
            yield first       # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more    # yield more elements from the iterator
        yield chunk()         # in outer generator, yield next chunk
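For comparison, here is a sketch of the explicit construct that the for loop stands in for, written with while True, next() and try/except. The name chunks_explicit is made up for illustration, and the code assumes Python 3.

```python
from itertools import islice


def chunks_explicit(iterable, size=10):
    iterator = iter(iterable)
    while True:
        try:
            first = next(iterator)  # what `for first in iterator` does implicitly
        except StopIteration:
            break                   # the for loop ends here without any fuss

        def chunk(first=first):     # bind the current first element to this chunk
            yield first
            for more in islice(iterator, size - 1):
                yield more
        yield chunk()


result = [list(c) for c in chunks_explicit(range(7), 3)]
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```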
The for-loop version can be made even shorter by using itertools.chain to replace the inner generator:
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
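To check that this really avoids pre-walking, one can log when each element is computed. The sketch below (Python 3) uses a hypothetical expensive generator as a stand-in for the costly source; only the single peeked element is computed before the consumer starts reading a chunk.

```python
from itertools import chain, islice


def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))


def expensive(n, log):
    """Hypothetical costly source: records every element it computes."""
    for i in range(n):
        log.append(i)
        yield i


log = []
gen = chunks(expensive(7, log), 3)

first_chunk = next(gen)
computed_before_reading = list(log)  # only the peeked element so far

values = list(first_chunk)
computed_after_reading = list(log)   # the rest of the chunk, nothing beyond it

rest = [list(c) for c in gen]

print(computed_before_reading)  # [0]
print(values)                   # [0, 1, 2]
print(computed_after_reading)   # [0, 1, 2]
print(rest)                     # [[3, 4, 5], [6]]
```

The chunker looks one element ahead to detect exhaustion, and that element is handed to the consumer as the first item of the next chunk, so nothing is computed twice or discarded.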