How to apply a function that returns a vector to each NumPy array element (and get an array with a higher dimension)

Question

Let's put it directly in code.

Note: I edited the mapper (the original example used x -> (x, 2 * x, 3 * x) purely for illustration) to call a generic black-box function, which is what causes the trouble.

import numpy as np

def blackbox_fn(x): #I can't be changed!
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

# let's have 2d array
arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)

# each element should be mapped to a vector
def mapper(x):
    # blackbox_fn stands in for some non-trivial 3rd-party function returning np.array;
    # in this example it returns np.array((x, 2 * x, 3 * x)),
    # but it still operates only on scalar values
    return blackbox_fn(x)

So for the input 2d array

array([[0, 1],
       [2, 3]], dtype=uint8)

I would like to get the 3d array

array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]], dtype=uint8)

I can write a naive algorithm using a for loop:

# result should be a 3d array; the last dimension is the size of the mapper result
arr3d = np.empty(arr2d.shape + (3,), dtype=np.uint8)
for y in range(arr2d.shape[1]):
    for x in range(arr2d.shape[0]):
        arr3d[x, y] = mapper(arr2d[x, y])

But it seems quite slow for large arrays. I know there is np.vectorize, but using

np.vectorize(mapper)(arr2d)

doesn't work, because of

ValueError: setting an array element with a sequence.

(it seems that vectorize can't change the dimension). Is there a better (NumPy-idiomatic and faster) solution?

Recommended answer

np.vectorize with the signature option (new in NumPy 1.12) can handle this. It doesn't improve the speed, but it makes the dimensional bookkeeping easier.

In [159]: def blackbox_fn(x): #I can't be changed!
     ...:     assert np.array(x).shape == (), "I'm a fussy little function!"
     ...:     return np.array([x, 2*x, 3*x])
     ...: 

The documentation for signature is a bit cryptic. I've worked with it before, so I made a good first guess:

In [161]: f = np.vectorize(blackbox_fn, signature='()->(n)')
In [162]: f(np.ones((2,2)))
Out[162]: 
array([[[ 1.,  2.,  3.],
        [ 1.,  2.,  3.]],

       [[ 1.,  2.,  3.],
        [ 1.,  2.,  3.]]])

With your array:

In [163]: arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)
In [164]: f(arr2d)
Out[164]: 
array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]])
In [165]: _.dtype
Out[165]: dtype('int32')

The dtype is not preserved, because your blackbox_fn doesn't preserve it. By default, vectorize makes a test calculation with the first element and uses the dtype of that result to determine the dtype of the output. The return dtype can be specified explicitly with the otypes parameter.
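As a minimal sketch of the otypes option, the following passes otypes together with signature so that the result keeps the uint8 dtype of the input. The name f_u8 is only illustrative, and this assumes every returned value fits in uint8; larger values would not be represented correctly.

```python
import numpy as np

def blackbox_fn(x):  # the scalar-only function from the question
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

# otypes fixes the output dtype instead of inferring it from the first result
f_u8 = np.vectorize(blackbox_fn, signature='()->(n)', otypes=[np.uint8])

arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)
out = f_u8(arr2d)
print(out.shape, out.dtype)
```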

It can handle arrays other than 2d:

In [166]: f(np.arange(3))
Out[166]: 
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6]])
In [167]: f(3)
Out[167]: array([3, 6, 9])
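One way to read the signature '()->(n)' is "scalar in, 1d vector out", so the output shape should be the input shape with the vector length appended. A small sketch that checks this for a few input shapes (the shapes chosen here are arbitrary):

```python
import numpy as np

def blackbox_fn(x):  # the scalar-only function from the question
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

f = np.vectorize(blackbox_fn, signature='()->(n)')

# map each input shape to the shape of the corresponding output
out_shapes = {}
for shape in [(), (5,), (2, 3), (2, 3, 4)]:
    out_shapes[shape] = f(np.ones(shape)).shape

for in_shape, out_shape in out_shapes.items():
    print(in_shape, '->', out_shape)
```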

With a signature, vectorize uses Python-level iteration. Without a signature it uses np.frompyfunc, which performs a bit better. But as long as blackbox_fn has to be called once for each element of the input, we can't improve the speed by much (at most 2x).
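To make the cost concrete, here is a rough sketch of what such a Python-level iteration amounts to. It is not NumPy's actual implementation; apply_elementwise is a hypothetical helper, and out_len (the length of the returned vector) is assumed to be known in advance.

```python
import numpy as np

def blackbox_fn(x):  # the scalar-only function from the question
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

def apply_elementwise(fn, arr, out_len, dtype=None):
    """Call fn once per element and collect the vectors in a new last axis."""
    arr = np.asarray(arr)
    out = np.empty(arr.shape + (out_len,), dtype=dtype or arr.dtype)
    # np.ndindex walks every index of the input, whatever its dimension
    for idx in np.ndindex(*arr.shape):
        out[idx] = fn(arr[idx])
    return out

arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)
arr3d = apply_elementwise(blackbox_fn, arr2d, 3)
print(arr3d.shape, arr3d.dtype)
```

The function is still called once per element, so this only generalizes the nested for loop to any number of dimensions; it does not make it faster.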

np.frompyfunc returns an object dtype array:

In [168]: fpy = np.frompyfunc(blackbox_fn, 1,1)
In [169]: fpy(1)
Out[169]: array([1, 2, 3])
In [170]: fpy(np.arange(3))
Out[170]: array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6])], dtype=object)
In [171]: np.stack(_)
Out[171]: 
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6]])
In [172]: fpy(arr2d)
Out[172]: 
array([[array([0, 0, 0]), array([1, 2, 3])],
       [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)

stack can't remove the array nesting in this 2d case:

In [173]: np.stack(_)
Out[173]: 
array([[array([0, 0, 0]), array([1, 2, 3])],
       [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)

But we can ravel it and then stack. The result still needs a reshape to restore the original leading dimensions:

In [174]: np.stack(__.ravel())
Out[174]: 
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
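The remaining reshape could look like the sketch below, which appends a final axis of inferred length (-1) to the input's shape. The variable names are illustrative only.

```python
import numpy as np

def blackbox_fn(x):  # the scalar-only function from the question
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

fpy = np.frompyfunc(blackbox_fn, 1, 1)
arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)

nested = fpy(arr2d)              # object array of shape (2, 2) holding small arrays
flat = np.stack(nested.ravel())  # numeric array of shape (4, 3)
arr3d = flat.reshape(arr2d.shape + (-1,))  # back to shape (2, 2, 3)
print(arr3d.shape)
```

As with the vectorize result, the dtype is whatever blackbox_fn produced, so an astype call may be needed if the input dtype matters.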

Speed tests:

In [175]: timeit f(np.arange(1000))
14.7 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [176]: timeit fpy(np.arange(1000))
4.57 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [177]: timeit np.stack(fpy(np.arange(1000).ravel()))
6.71 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [178]: timeit np.array([blackbox_fn(i) for i in np.arange(1000)])
6.44 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Having your function return a list instead of an array might make reassembling the result easier, and maybe even faster:

def foo(x):
    return [x, 2*x, 3*x]

Or play with the frompyfunc parameters:

def foo(x):
    return x, 2*x, 3*x   # return a tuple
In [204]: np.stack(np.frompyfunc(foo, 1,3)(arr2d),2)
Out[204]: 
array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]], dtype=object)

A 10x speedup, which surprised me:

In [212]: foo1 = np.frompyfunc(foo, 1,3)
In [213]: timeit np.stack(foo1(np.arange(1000)),1)
428 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
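If the black-box function itself cannot be changed, one possible adaptation is to wrap it so that it returns a tuple and pass the wrapper to frompyfunc with three outputs. This is only a sketch: as_tuple is a hypothetical wrapper, the output count 3 is assumed to be known, and whether the speedup measured above carries over to a wrapped function has not been timed here.

```python
import numpy as np

def blackbox_fn(x):  # the scalar-only function from the question
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

def as_tuple(x):
    # hypothetical wrapper: unpack the returned vector into a tuple of scalars
    return tuple(blackbox_fn(x))

arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)

# nout=3 makes frompyfunc return a tuple of three object arrays, one per component
parts = np.frompyfunc(as_tuple, 1, 3)(arr2d)

# stack the components along a new last axis, then convert from object dtype
arr3d = np.stack(parts, axis=-1).astype(arr2d.dtype)
print(arr3d.shape, arr3d.dtype)
```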
