Split a generator into chunks without pre-walking it

Question

(This question is related to this one and this one, but those are pre-walking the generator, which is exactly what I want to avoid)

I would like to split a generator in chunks. The requirements are:

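  • Do not pad the chunks: if the number of remaining elements is smaller than the chunk size, the last chunk must be smaller.
  • Do not walk the generator beforehand: computing the elements is expensive, and it must only be done by the consuming function, not by the chunker.
  • Which means, of course: do not accumulate in memory (no lists).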

I have tried the following code:

def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        yield head(i, size)

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)

for n, chunk in enumerate(chunks(els, 2)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)

This somehow works:

Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
^CTraceback (most recent call last):
  File "xxxx.py", line 15, in <module>
    for el in chunk:
  File "xxxx.py", line 2, in head
    for cnt, el in enumerate(iterable):
KeyboardInterrupt

Buuuut ... it never stops (I have to press ^C) because of the while True. I would like to stop that loop whenever the generator has been consumed, but I do not know how to detect that situation. I have tried raising an Exception:

class NoMoreData(Exception):
    pass

def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break
    if cnt == 0 : raise NoMoreData()

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        try:
            yield head(i, size)
        except NoMoreData:
            break

# Sample generator: the real data is much more complex, and expensive to compute    
els = xrange(7)

for n, chunk in enumerate(chunks(els, 2)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)

But then the exception is only raised in the context of the consumer, which is not what I want (I want to keep the consumer code clean)

Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
Traceback (most recent call last):
  File "xxxx.py", line 22, in <module>
    for el in chunk:
  File "xxxx.py", line 9, in head
    if cnt == 0 : raise NoMoreData
__main__.NoMoreData()

How can I detect that the generator is exhausted in the chunks function, without walking it?

Recommended answer

One way would be to peek at the first element, if any, and then create and return the actual generator.

def head(iterable, max=10):
    first = next(iterable)      # raise exception when depleted
    def head_inner():
        yield first             # yield the extracted first element
        for cnt, el in enumerate(iterable):
            yield el
            if cnt + 1 >= max:  # cnt + 1 to include first
                break
    return head_inner()

Just use this in your chunk generator and catch the StopIteration exception like you did with your custom exception.
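Put together, that could look roughly like the sketch below. It assumes Python 3 (so range instead of xrange) and assumes that a chunk should hold at most size elements in total, which is why the counter here starts at 2 instead of reusing the cnt + 1 test from above. Treat it as an illustration of the peek-and-catch idea, not as the definitive version.

```python
def head(iterator, size=10):
    # Peek at the first element; next() raises StopIteration when depleted.
    first = next(iterator)

    def head_inner():
        yield first                      # hand out the peeked element first
        if size > 1:
            # Assumption: at most `size` elements per chunk, so start
            # counting at 2 (the peeked element was number 1).
            for cnt, el in enumerate(iterator, start=2):
                yield el
                if cnt >= size:
                    break

    return head_inner()


def chunks(iterable, size=10):
    iterator = iter(iterable)
    while True:
        try:
            yield head(iterator, size)   # head() peeks exactly one element
        except StopIteration:
            return                       # source exhausted: stop cleanly


for n, chunk in enumerate(chunks(range(7), 3)):
    for el in chunk:
        print('Chunk %3d, value %d' % (n, el))
```

Note that all chunks share one underlying iterator, so each chunk should be fully consumed before the next one is requested; otherwise the leftover elements end up in the following chunk.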

Update: Here's another version, using itertools.islice to replace most of the head function, and a for loop. This simple for loop in fact does exactly the same thing as that unwieldy while-try-next-except-break construct in the original code, so the result is much more readable.

from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:    # stops when iterator is depleted
        def chunk():          # construct generator for next chunk
            yield first       # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more    # yield more elements from the iterator
        yield chunk()         # in outer generator, yield next chunk

And we can get even shorter than that, using itertools.chain to replace the inner generator:

from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
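To check that this version really does not pre-walk the source, here is a small sketch, assuming Python 3. The names expensive, computed and snapshots are hypothetical and exist only for this illustration: the source generator logs every element it computes, and the consumer records that log at the moment each chunk is handed out.

```python
from itertools import chain, islice


def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))


computed = []                  # hypothetical log of elements computed so far


def expensive(n):
    # Stand-in for a costly source generator (an assumption for this demo).
    for i in range(n):
        computed.append(i)
        yield i


snapshots = []                 # state of `computed` when each chunk arrives
result = []
for chunk in chunks(expensive(7), 3):
    snapshots.append(list(computed))
    result.append(list(chunk)) # the consumer does the actual walking

print(result)     # [[0, 1, 2], [3, 4, 5], [6]]
print(snapshots)  # [[0], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5, 6]]
```

When a chunk is handed out, only its first element has been computed (the peek); the rest is computed as the consumer iterates, and nothing is accumulated in a list by the chunker itself.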
