How do I build a numpy array from a generator?


Problem description

How can I build a numpy array out of a generator object?

Let me illustrate the problem:

    >>> import numpy
    >>> def gimme():
    ...   for x in xrange(10):
    ...     yield x
    ...
    >>> gimme()
    <generator object at 0x28a1758>
    >>> list(gimme())
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> numpy.array(xrange(10))
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> numpy.array(gimme())
    array(<generator object at 0x28a1758>, dtype=object)
    >>> numpy.array(list(gimme()))
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I'd like to turn into an array. However, the array constructor does not iterate over the generator; it simply stores the generator itself. The behaviour I want is that of numpy.array(list(gimme())), but I don't want to pay the memory overhead of holding the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?
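
The session above was recorded on Python 2 (it uses xrange). As a rough sketch only, and assuming Python 3 with a current NumPy, the same behaviour can be reproduced as follows; the variable names `wrapped` and `via_list` are made up for this illustration:

```python
import numpy

def gimme():
    # Same generator as in the session above, using range instead of xrange.
    for x in range(10):
        yield x

# The constructor does not iterate: it wraps the generator object itself
# in a zero-dimensional array with dtype=object.
wrapped = numpy.array(gimme())

# Going through a list gives the desired array, at the cost of the
# intermediate list.
via_list = numpy.array(list(gimme()))

print(wrapped.shape, wrapped.dtype)
print(via_list)
```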

Solution

NumPy arrays, unlike Python lists, require their length to be set explicitly at creation time. This is necessary so that space for every item can be allocated contiguously in memory. Contiguous allocation is the key feature of NumPy arrays: combined with the native-code implementation, it lets operations on them execute much faster than on regular lists.
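
The contiguous layout can be inspected directly. The snippet below is a small illustrative sketch, not part of the original answer; the array contents and the names `arr` and `bigger` are arbitrary choices. It also shows that "appending" with numpy.append produces a new array rather than growing the existing one in place:

```python
import numpy

# Ten 64-bit integers stored in one block of memory.
arr = numpy.arange(10, dtype=numpy.int64)

is_contiguous = arr.flags["C_CONTIGUOUS"]  # True for a freshly created array
total_bytes = arr.nbytes                   # number of items * bytes per item
step = arr.strides[0]                      # bytes between neighbouring items

# numpy.append does not grow arr in place; it returns a separate copy.
bigger = numpy.append(arr, 10)
shares = numpy.shares_memory(arr, bigger)

print(is_contiguous, total_bytes, step, shares)
```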

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

  1. can predict how many elements it will yield when run:

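    # predict_length() is a placeholder for however the size is known in advance
    my_array = numpy.empty(predict_length())  # float64 by default; pass dtype= if needed
    for i, el in enumerate(gimme()):
        my_array[i] = el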

  2. are willing to store its elements in an intermediate list:

    my_array = numpy.array(list(gimme()))

  3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the second one to fill in each element:

    length = sum(1 for _ in gimme())  # first pass: count the elements
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()):  # second pass: fill the array
        my_array[i] = el

Option 1 is probably what you're looking for. Option 2 is space-inefficient, and option 3 is time-inefficient (you have to run through the generator twice).
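
For one-dimensional numeric output, NumPy also provides numpy.fromiter, which consumes an iterable directly. The sketch below is an addition rather than part of the answer above. It assumes Python 3, the gimme() generator from the question, and dtype=int chosen purely for illustration. Passing count corresponds to option 1 (the length is known, so the output can be allocated once); when count is omitted, NumPy resizes the output as it reads.

```python
import numpy

def gimme():
    # Same generator as in the question, written for Python 3.
    for x in range(10):
        yield x

# Option 1 by hand: preallocate, then fill.
# The length 10 is assumed to be known in advance here.
filled = numpy.empty(10, dtype=int)
for i, el in enumerate(gimme()):
    filled[i] = el

# numpy.fromiter with a known length: the output is allocated up front.
known = numpy.fromiter(gimme(), dtype=int, count=10)

# Without count, all items are read and the length is discovered on the way.
unknown = numpy.fromiter(gimme(), dtype=int)

print(filled, known, unknown)
```

Neither numpy.fromiter call builds an intermediate Python list, which is the overhead the question wants to avoid.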
