How do I build a numpy array from a generator?

Question

How can I build a numpy array out of a generator object?

Let me illustrate the problem:

>>> import numpy
>>> def gimme():
...   for x in xrange(10):
...     yield x
...
>>> gimme()
<generator object at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I'd like to turn into an array. However, the array constructor does not iterate over the generator; it simply stores the generator object itself. The behaviour I want is that of numpy.array(list(gimme())), but I don't want to pay the memory overhead of holding the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?

Recommended answer

Unlike Python lists, numpy arrays require their length to be set explicitly at creation time. This is necessary so that space for each item can be allocated consecutively in memory. Consecutive allocation is the key feature of numpy arrays: combined with a native-code implementation, it lets operations on them execute much faster than on regular lists.
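To make the fixed-size point concrete, here is a minimal sketch (the variable names are illustrative only). It creates a small array, checks that its buffer is contiguous, and shows that "appending" with numpy.append produces a new array rather than growing the existing one:

```python
import numpy

# Allocate five 8-byte floats in a single block.
a = numpy.zeros(5, dtype=numpy.float64)

is_contiguous = a.flags["C_CONTIGUOUS"]  # the buffer is one contiguous run
total_bytes = a.nbytes                   # size * itemsize

# numpy.append does not grow `a` in place; it allocates and returns a new array.
b = numpy.append(a, 1.0)
```

After this runs, a still has five elements and b is a separate six-element array, which is why the final length matters up front.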

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

  1. can predict how many elements it will yield when run:

    my_array = numpy.empty(predict_length())
    for i, el in enumerate(gimme()):
        my_array[i] = el

  2. are willing to store its elements in an intermediate list:

    my_array = numpy.array(list(gimme()))
    

  3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the second one to fill in each element:

    length = sum(1 for el in gimme())
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()):
        my_array[i] = el
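
The preallocation approaches above can be sketched as runnable Python 3 code. The helper names fill_known_length and fill_two_pass are hypothetical, introduced only for this illustration, and gimme() is rewritten with range because xrange does not exist in Python 3:

```python
import numpy

def gimme():
    # Same generator as in the question, written for Python 3.
    for x in range(10):
        yield x

def fill_known_length(gen, length, dtype=float):
    """Preallocate an array of a known length and fill it (hypothetical helper)."""
    out = numpy.empty(length, dtype=dtype)
    for i, el in enumerate(gen):
        out[i] = el  # raises IndexError if gen yields more than `length` items
    return out

def fill_two_pass(make_gen, dtype=float):
    """Count with one generator, then fill from a fresh one (hypothetical helper)."""
    length = sum(1 for _ in make_gen())
    return fill_known_length(make_gen(), length, dtype)

a = fill_known_length(gimme(), 10, dtype=int)
b = fill_two_pass(gimme, dtype=int)
```

Note that numpy.empty leaves memory uninitialized, so if the generator yields fewer items than the predicted length, the remaining slots hold arbitrary values. The two-pass version takes the generator function, not a generator object, because a generator can only be consumed once.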
    

Option 1 is probably what you're looking for. Option 2 is space-inefficient, and option 3 is time-inefficient (you have to go through the generator twice).
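
One further possibility, not mentioned in the answer above and offered here only as a sketch, is numpy.fromiter. It builds a one-dimensional array directly from an iterable without creating an intermediate Python list. It requires a dtype, and it accepts an optional count so the output can be allocated once when the length is known:

```python
import numpy

def gimme():
    # Same generator as in the question, written for Python 3.
    for x in range(10):
        yield x

# Length unknown: numpy grows its internal buffer as items arrive.
arr = numpy.fromiter(gimme(), dtype=numpy.int64)

# Length known in advance: pass count so the array is allocated up front.
arr_counted = numpy.fromiter(gimme(), dtype=numpy.int64, count=10)
```

This only covers the one-dimensional case with a single fixed dtype; whether it suits a given generator depends on what the generator yields.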
