How to quickly fill a numpy array with values from separate calls to a function

Problem description

I want to fill a numpy array with generated values. These values are produced by a generator function. The array is not long, usually fewer than 100 elements, but it is generated many times, so I want to know whether this can be optimized with some fancy usage of numpy.

So far I can already do it with vanilla Python:

import numpy as np

def generate():
    # generated_data stands in for whatever a single call produces
    return generated_data

array = np.asarray([generate() for _ in range(array_length)])

I have also tried using np.full(shape, fill_value):

np.full((array_length, generated_data_size), generate())

But this calls the generate() function only once, not once for every index in the array.

I've also tried np.vectorize(), but I couldn't make it generate an appropriately shaped array.

Recommended answer

There is nothing NumPy can do to accelerate the process of repeatedly calling a function that was not designed to interact with NumPy.

The "fancy usage of numpy" way to optimize this is to manually rewrite your generate function so that it uses NumPy operations to produce entire arrays of output instead of only supporting single values. That's how NumPy works, and how NumPy has to work: any solution that involves calling a Python function over and over again for every array cell is going to be limited by Python overhead. NumPy can only accelerate work that actually happens in NumPy.
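As a rough illustration of that rewrite, here is a minimal sketch. The question does not show what generate actually computes, so generate_one and generate_many below are hypothetical stand-ins that assume each call returns a row of uniform random numbers scaled to [0, 10); the point is only the shape of the change, from one Python call per row to one NumPy call for the whole array.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def generate_one(size):
    """Hypothetical per-row generator: `size` uniform samples in [0, 10)."""
    return 10.0 * rng.random(size)

def generate_many(n, size):
    """Same computation, but all n rows come from a single NumPy call."""
    return 10.0 * rng.random((n, size))

array_length, generated_data_size = 100, 3

# Original approach: one Python-level call per row.
slow = np.asarray([generate_one(generated_data_size) for _ in range(array_length)])

# Rewritten approach: the loop over rows happens inside NumPy.
fast = generate_many(array_length, generated_data_size)
```

Both arrays have shape (array_length, generated_data_size); only the second avoids the per-row Python overhead. Whether a real generate can be rewritten this way depends entirely on what it does.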

If NumPy's provided operations are too limited to rewrite generate in terms of them, there are options such as rewriting generate with Cython, or using @numba.jit on it. These mostly help with computations that involve complex dependencies from one loop iteration to the next; they don't help with external dependencies you can't rewrite.
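To make the "dependency from one iteration to the next" case concrete, here is a small sketch, assuming Numba is installed. The function clamped_walk is a hypothetical example, not something from the question: each output depends on the previous one, which is awkward to express with whole-array NumPy operations but straightforward as a compiled loop. The fallback decorator is only there so the snippet still runs, uncompiled, where Numba is missing.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the plain Python loop.
    def njit(func):
        return func

@njit
def clamped_walk(steps, lo, hi):
    """Running sum of `steps`, clamped to [lo, hi] after every step."""
    out = np.empty(steps.shape[0])
    pos = 0.0
    for i in range(steps.shape[0]):
        # Each position depends on the previous (already clamped) position.
        pos = min(max(pos + steps[i], lo), hi)
        out[i] = pos
    return out

walk = clamped_walk(np.array([1.0, 1.0, -3.0]), -1.0, 1.5)
```

The first call pays a one-time compilation cost, so this only pays off when the function is called many times or on larger inputs.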

If you cannot rewrite generate, all you can do is try to optimize the process of getting the return values into your array. Depending on array size, you may be able to save some time by reusing a single array object:

In [32]: %timeit x = numpy.array([random.random() for _ in range(10)])
The slowest run took 5.13 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 5.44 µs per loop
In [33]: %%timeit x = numpy.empty(10)
   ....: for i in range(10):
   ....:     x[i] = random.random()
   ....: 
The slowest run took 4.26 times longer than the fastest. This could mean that an
 intermediate result is being cached.
100000 loops, best of 5: 2.88 µs per loop

But the benefit vanishes for larger arrays:

In [34]: %timeit x = numpy.array([random.random() for _ in range(100)])
10000 loops, best of 5: 21.9 µs per loop
In [35]: %%timeit x = numpy.empty(100)
   ....: for i in range(100):
   ....:     x[i] = random.random()
   ....: 
10000 loops, best of 5: 22.8 µs per loop
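Another way to get scalar return values into an array without building an intermediate list is numpy.fromiter. This is a sketch of the call only; it was not part of the timings above, so whether it is any faster for a given generate would have to be measured the same way. It still makes one Python call per element, so the Python overhead remains.

```python
import random
import numpy as np

n = 100

# count=n lets NumPy allocate the output once instead of growing it.
x = np.fromiter((random.random() for _ in range(n)), dtype=np.float64, count=n)
```

As used here, each call yields a single scalar and the result is one-dimensional.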
