Mapping a NumPy array in place


Question

Is it possible to map a NumPy array in place? If yes, how?

Given a_values - 2D array - this is the bit of code that does the trick for me at the moment:

for row in range(len(a_values)):
    for col in range(len(a_values[0])):
        a_values[row][col] = dim(a_values[row][col])

But it's so ugly that I suspect that somewhere within NumPy there must be a function that does the same with something looking like:

a_values.map_in_place(dim)

but if something like the above exists, I've been unable to find it.

Answer

It's only worth trying to do this in place if you are under significant space constraints. If that's the case, you can speed up your code a little by iterating over a flattened view of the array. Since reshape returns a new view when possible, the data itself isn't copied. The exception is an array whose memory layout can't be flattened without copying (a transposed or column-sliced 2D array, for example): there reshape returns a copy, and writes to it won't reach the original.
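To make the view-versus-copy point concrete, here is a small sketch that checks whether the flattened array still shares memory with the original. It uses numpy.shares_memory for the check; the variable names are only illustrative.

```python
import numpy

a = numpy.arange(25).reshape(5, 5)

# A C-contiguous array can be flattened without copying: this is a view.
flat = a.reshape(-1)
flat_is_view = numpy.shares_memory(a, flat)

# The transpose has a different memory layout, so flattening it copies.
flat_t = a.T.reshape(-1)
flat_t_is_view = numpy.shares_memory(a.T, flat_t)

# Writing through the view changes the original; writing to the copy does not.
flat[0] = 100
flat_t[1] = -1   # flat_t[1] corresponds to a[1, 0], which stays untouched
print(flat_is_view, flat_t_is_view, a[0, 0], a[1, 0])
```

If the second check prints False for your own array, a loop over the reshaped array would silently modify a temporary copy rather than your data.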

I don't know of a better way to achieve bona fide in-place application of an arbitrary Python function.

>>> import numpy
>>> def flat_for(a, f):
...     a = a.reshape(-1)  # flat view; shares memory with a contiguous input
...     for i, v in enumerate(a):
...         a[i] = f(v)
... 
>>> a = numpy.arange(25).reshape(5, 5)
>>> flat_for(a, lambda x: x + 5)
>>> a
array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])
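As a possible alternative to the reshape-based loop, numpy.nditer with the readwrite flag can also write results back element by element, and it should work on non-contiguous views as well. This is a sketch rather than a recommendation: the name nditer_map is made up for illustration, and it is still a Python-level loop, so don't expect it to be fast.

```python
import numpy

def nditer_map(a, f):
    """Apply f to every element of a, writing results back into a."""
    # readwrite lets us assign through the iterator; the context manager
    # makes sure any pending writes are flushed when the loop ends.
    with numpy.nditer(a, op_flags=["readwrite"]) as it:
        for x in it:
            # x is a zero-dimensional array referring to one element of a
            x[...] = f(x.item())

a = numpy.arange(6).reshape(2, 3)
t = a.T                          # non-contiguous view of a
nditer_map(t, lambda v: v * 2)   # modifies a through the view
print(a)
```

Because the iterator writes into the operand it was given, the transposed view here updates the underlying array, which the reshape approach cannot do for this layout.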

Some timings:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> f = lambda x: x + 5
>>> %timeit flat_for(a, f)
1000 loops, best of 3: 1.86 ms per loop

It's about twice as fast as the nested loop version:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> def nested_for(a, f):
...     for i in range(len(a)):
...         for j in range(len(a[0])):
...             a[i][j] = f(a[i][j])
... 
>>> %timeit nested_for(a, f)
100 loops, best of 3: 3.79 ms per loop

Of course numpy.vectorize is still faster, so if you can afford to make a copy, use that:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> g = numpy.vectorize(lambda x: x + 5)
>>> %timeit g(a)
1000 loops, best of 3: 584 us per loop
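If what matters is that existing references to the array see the new values, rather than avoiding the temporary, one option is to assign the vectorized result back into the original buffer. The sketch below assumes a stand-in dim function (the question never shows the real one) and passes otypes so the output dtype matches the input.

```python
import numpy

def dim(x):
    # Hypothetical stand-in for the question's dim function
    return x // 2

a = numpy.arange(25).reshape(5, 5)
alias = a                        # a second reference to the same array

g = numpy.vectorize(dim, otypes=[a.dtype])
a[...] = g(a)                    # g(a) allocates a temporary, then copies it into a

print(alias[4, 4])
```

Note that this still allocates a full-size temporary for g(a), so it does not help with the space constraint; it only keeps the array object and its buffer the same.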

And if you can rewrite dim using built-in ufuncs, then please, please, don't vectorize:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> %timeit a + 5
100000 loops, best of 3: 4.66 us per loop

NumPy performs operations like += in place, just as you might expect, so you get the speed of a ufunc together with in-place application at no extra cost. Sometimes it's even faster! See here for an example.
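A brief sketch of what in-place ufunc application can look like: augmented assignment and the out= argument both write into the existing array, whereas a plain expression such as a + 5 allocates a new one. The arithmetic here is arbitrary and only stands in for whatever dim computes.

```python
import numpy

a = numpy.arange(25).reshape(5, 5)
alias = a                          # a second reference to the same array

a += 5                             # in place: no new array is created
numpy.multiply(a, 2, out=a)        # ufunc writing its result into a itself

c = a + 5                          # not in place: c is a separate array
print(alias[0, 0], alias[4, 4], numpy.shares_memory(a, c))
```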


By the way, my original answer to this question, which can be viewed in its edit history, is ridiculous, and involved vectorizing over indices into a. Not only did it have to do some funky stuff to bypass vectorize's type-detection mechanism, it turned out to be just as slow as the nested loop version. So much for cleverness!
