Python generator of generators?


Question

I wrote a class that reads a text file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:

line1.1
line1.2
line1.3

line2.1
line2.2

My first implementation read the whole file and returned a list of lists, that is, a list of sections where each section is a list of lines. This was obviously terrible memory-wise.

So I re-implemented it as a generator of lists: on every iteration, my class reads one whole section into memory as a list and yields it.
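The generator-of-lists design described above can be sketched roughly as follows. This is only an illustration under assumptions: the question does not show the class, so the function name `iter_sections` and the use of `io.StringIO` as a stand-in for the file handle are hypothetical.

```python
import io


def iter_sections(lines):
    """Yield each blank-line-separated section as a list of lines.

    Hypothetical helper: the original class is not shown in the question.
    """
    section = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            # Non-empty line: it belongs to the current section.
            section.append(line)
        elif section:
            # A blank line closes the current section.
            yield section
            section = []
    if section:
        # The file may not end with a blank line.
        yield section


# io.StringIO stands in for an open file handle.
sample = io.StringIO("line1.1\nline1.2\nline1.3\n\nline2.1\nline2.2\n")
sections = list(iter_sections(sample))
print(sections)
# [['line1.1', 'line1.2', 'line1.3'], ['line2.1', 'line2.2']]
```

Only one section is held in memory at a time while iterating, but each section is still fully materialized, which is the limitation the question goes on to describe.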

This is better, but it is still problematic for large sections. So I wonder whether I can re-implement it as a generator of generators. The problem is that this class is very generic, and it should be able to satisfy both of these use cases:

  1. Read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
  2. Read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke

list(MyClass(file_handle))

However, a generator of generators would NOT work in case 2, because list() would not convert the inner objects to lists.

Is there anything more elegant than implementing an explicit to_list() method that transforms the generator of generators into a list of lists?

Recommended answer

Python 2:

map(list, generator_of_generators)

Python 3:

list(map(list, generator_of_generators))

Or, for both:

[list(gen) for gen in generator_of_generators]
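To make the difference concrete, here is a small sketch comparing a plain list() call with the conversions above. The functions `inner` and `generator_of_generators` are hypothetical stand-ins for the asker's class, not part of the original post.

```python
import types


def inner(start):
    # Hypothetical inner generator: three consecutive integers.
    for offset in range(3):
        yield start + offset


def generator_of_generators():
    # Hypothetical outer generator: yields three independent inner generators.
    for start in (0, 10, 20):
        yield inner(start)


# Plain list() only materializes the outer level.
shallow = list(generator_of_generators())
print(all(isinstance(g, types.GeneratorType) for g in shallow))  # True

# Both forms from the answer materialize the inner level as well.
nested = [list(gen) for gen in generator_of_generators()]
mapped = list(map(list, generator_of_generators()))
print(nested)  # [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
print(mapped == nested)  # True
```

In this sketch the inner generators are independent of each other, so converting them late would still work. With a reader that shares one file handle across sections, that independence cannot be assumed.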


Since the generated objects are generator functions, not mere generators, you'd want to do

[list(gen()) for gen in generator_of_generator_functions]

If that doesn't work, I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?

Since you said in the comments that you want to keep list(generator_of_generator_functions) from crashing mysteriously, the answer depends on what you really want.

  • It is not possible to override the behaviour of list in this way: either you store the sub-generator elements or you don't.

If you really do get a crash, I recommend having the main generator's loop exhaust the sub-generator every time the main generator advances. This is standard practice and exactly what itertools.groupby, a stdlib generator of generators, does.

For example:

def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r

        for _ in r: pass

  • Or use the dark, secret hack that I'll show in a mo' (I need to write it first), but don't do this!
  • As promised, the hack (for Python 3, this time 'round):

    from collections import UserList
    from functools import partial
    
    
    def objectitemcaller(key):
        def inner(*args, **kwargs):
            try:
                return getattr(object, key)(*args, **kwargs)
            except AttributeError:
                return NotImplemented
        return inner
    
    
    class Listable(UserList):
        def __init__(self, iterator):
            self.iterator = iterator
            self.iterated = False
    
        def __iter__(self):
            return self
    
        def __next__(self):
            self.iterated = True
            return next(self.iterator)
    
        def _to_list_hack(self):
            self.data = list(self)
            del self.iterated
            del self.iterator
            self.__class__ = UserList
    
    for key in UserList.__dict__.keys() - Listable.__dict__.keys():
        if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
            setattr(Listable, key, objectitemcaller(key))
    
    
    def metagen():
        def innergen():
            yield 1
            yield 2
            yield 3
    
        for i in range(3):
            r = Listable(innergen())
            yield r
    
            if not r.iterated:
                r._to_list_hack()
    
            else:
                for item in r: pass
    
    for item in metagen():
        print(item)
        print(list(item))
    #>>> <Listable object at 0x7f46e4a4b850>
    #>>> [1, 2, 3]
    #>>> <Listable object at 0x7f46e4a4b950>
    #>>> [1, 2, 3]
    #>>> <Listable object at 0x7f46e4a4b990>
    #>>> [1, 2, 3]
    
    list(metagen())
    #>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
    

    It's so bad I don't even want to explain it.

    The key is that you have a wrapper that can detect whether it has been iterated, and if it has not, you run a _to_list_hack that, I kid you not, changes the __class__ attribute.

    Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.

    Basically, please don't use this hack. You can enjoy it as humour, though.
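As a non-hack alternative, the itertools.groupby behaviour mentioned earlier can be applied directly to the blank-line-separated file from the question. The following is a minimal sketch under assumptions, not the asker's actual class: the name `iter_sections` is hypothetical and `io.StringIO` stands in for the file handle.

```python
import io
from itertools import groupby


def iter_sections(lines):
    """Yield each section as a lazy iterator of lines.

    Hypothetical helper illustrating a generator of generators.
    """
    # groupby keys every line by "is it blank?", so each run of
    # consecutive non-blank lines becomes one group.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield (line.rstrip("\n") for line in group)


text = "line1.1\nline1.2\nline1.3\n\nline2.1\nline2.2\n"

# Use case 1: stream once, reading only part of each section.
# groupby skips the unread remainder when the outer loop advances.
first_lines = [next(section) for section in iter_sections(io.StringIO(text))]
print(first_lines)  # ['line1.1', 'line2.1']

# Use case 2: materialize everything as a list of lists.
as_lists = [list(section) for section in iter_sections(io.StringIO(text))]
print(as_lists)
# [['line1.1', 'line1.2', 'line1.3'], ['line2.1', 'line2.2']]
```

As with groupby itself, each inner iterator is only usable until the outer one advances, so a plain list() over the outer generator leaves the inner iterators already exhausted. Converting each section inside the loop or comprehension, as in use case 2, avoids that.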
