numpy np.apply_along_axis function speed up?


Problem description


The np.apply_along_axis() function seems to be very slow (no output after 15 mins). Is there a fast way to perform this function on a long array without having to parallelize the operation? I am specifically talking about arrays with millions of elements.

Here is an example of what I am trying to do. Please ignore the simplistic definition of my_func; the goal is not to multiply the array by 55 (which of course could be done in place anyway), it is just an illustration. In practice, my_func is a little more complicated, takes extra arguments, and as a result each element of a is modified differently, i.e. not just multiplied by 55.

>>> def my_func(a):
...     return a[0]*55
>>> a = np.ones((200000000,1))
>>> np.apply_along_axis(my_func, 1, a)

Edit:

a = np.ones((20,1))

def my_func(a, i, j):
    b = np.zeros((2,2))
    b[0,0] = a[i]
    b[1,0] = a[i]
    b[0,1] = a[i]
    b[1,1] = a[j]
    return np.linalg.eigh(b)

>>> my_func(a, 1, 1)
(array([ 0.,  2.]), array([[-0.70710678,  0.70710678],
       [ 0.70710678,  0.70710678]]))

Solution

np.apply_along_axis is not for speed.

There is no way to apply a pure Python function to every element of a Numpy array without calling it that many times, short of AST rewriting...

Fortunately, there are solutions:

  • Vectorizing

    Although finding the right formulation is often hard, this is normally the easiest solution. Find some way to express your calculation so that it generalizes over the elements, letting you work on the whole matrix at once. The loops are then hoisted out of Python and into heavily optimised C and Fortran routines (a minimal sketch follows this list).

  • JITing: Numba and Parakeet, and to a lesser extent PyPy with NumPyPy

    Numba and Parakeet both deal with JITing loops over Numpy data structures, so if you inline the looping into a function (this can be a wrapper function), you can get massive speed boosts almost for free. It depends on the data structures used, though (see the Numba sketch after this list).

  • Symbolic evaluators like Theano and numexpr

    These allow you to use embedded languages to express calculations, which can end up much faster than even the vectorized versions (see the numexpr sketch after this list).

  • Cython and C extensions

    If all else fails, you can always dig down manually to C. Cython hides a lot of the complexity and has a lot of lovely magic too, so it's not always that bad (although it helps to know what you're doing).
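
For the trivial my_func from the question, the vectorized route is a one-liner. A minimal sketch (it uses a smaller array than the question's 200-million-row one, purely to keep the example light):

import numpy as np

a = np.ones((1000000, 1))

# my_func returned a[0]*55 per row; expressed over the whole array at
# once, the Python-level loop disappears into compiled Numpy code:
result = a[:, 0] * 55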
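
As a sketch of the JITing route with Numba (this assumes Numba is installed and uses its njit decorator; the helper name my_func_all is made up for this example):

import numpy as np
from numba import njit

@njit
def my_func_all(a):
    # The loop is compiled to native code, so there is no per-element
    # Python call overhead.
    out = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        out[i] = a[i, 0] * 55
    return out

result = my_func_all(np.ones((1000000, 1)))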
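
And a sketch of the symbolic-evaluator route with numexpr (assuming numexpr is installed; a Theano version would look quite different):

import numpy as np
import numexpr

a = np.ones((1000000, 1))

# numexpr compiles the expression string and evaluates it in cache-sized
# chunks, optionally across several threads:
result = numexpr.evaluate("a * 55")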


Here you go.

This is my testing "environment" (you should really have provided this :P):

import itertools
import numpy

a = numpy.arange(200).reshape((200,1)) ** 2

def my_func(a, i, j):
    b = numpy.zeros((2,2))
    b[0,0] = a[i]
    b[1,0] = a[i]
    b[0,1] = a[i]
    b[1,1] = a[j]
    return numpy.linalg.eigh(b)

eigvals = {}
eigvecs = {}

for i, j in itertools.combinations(range(a.size), 2):
    eigvals[i, j], eigvecs[i, j] = my_func(a, i, j)

Now, it's far easier to get all the permutations instead of the combinations, because you can just do this:

# All *permutations*, not combinations
indexes = numpy.mgrid[:a.size, :a.size]

This might seem wasteful, but there are only about twice as many permutations, so it's not a big deal.
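
A quick count backs that up for this test environment's 200-element a:

n = a.size
print(n**2, n * (n - 1) // 2)
#>>> 40000 19900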

So we want to use these indexes to get the relevant elements:

# Remove the extra dimension; it's not wanted here!
subs = a[:,0][indexes]
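
The same thing can be spelled with ravel, which some may find clearer; for this C-contiguous (200, 1) array both forms are views of the same flat data:

subs = a.ravel()[indexes]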

and then we can make our matrices:

target = numpy.array([
    [subs[0], subs[0]],
    [subs[0], subs[1]]
])

We need the matrices to be in the last two dimensions:

target.shape
#>>> (2, 2, 200, 200)

target = numpy.swapaxes(target, 0, 2)
target = numpy.swapaxes(target, 1, 3)

target.shape
#>>> (200, 200, 2, 2)
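
The two swapaxes calls can also be collapsed into a single transpose; a sketch of the equivalent one-liner, applied to the original (2, 2, 200, 200) target in place of the two swaps above:

# new[i, j, k, l] = old[k, l, i, j]
target = numpy.transpose(target, (2, 3, 0, 1))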

And we can check that it works:

target[10, 20]
#>>> array([[100, 100],
#>>>        [100, 400]])

Yay!

So then we just run numpy.linalg.eigh, which operates on the last two dimensions and broadcasts over the leading ones:

values, vectors = numpy.linalg.eigh(target)

And look, it works!

values[10, 20]
#>>> array([  69.72243623,  430.27756377])

eigvals[10, 20]
#>>> array([  69.72243623,  430.27756377])
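
One spot check is reassuring, but it's cheap to compare the whole batch against the loop results from the test environment:

# Every (i, j) pair computed by the loop should match the batched eigh:
assert all(numpy.allclose(values[i, j], eigvals[i, j])
           for i, j in eigvals)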

So then I'd imagine you might want to concatenate these:

numpy.concatenate([values[row, row+1:] for row in range(len(values))])
#>>> array([[  0.00000000e+00,   1.00000000e+00],
#>>>        [  0.00000000e+00,   4.00000000e+00],
#>>>        [  0.00000000e+00,   9.00000000e+00],
#>>>        ..., 
#>>>        [  1.96997462e+02,   7.78160025e+04],
#>>>        [  3.93979696e+02,   7.80160203e+04],
#>>>        [  1.97997475e+02,   7.86070025e+04]])

numpy.concatenate([vectors[row, row+1:] for row in range(len(vectors))])
#>>> array([[[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        [[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        [[ 1.        ,  0.        ],
#>>>         [ 0.        ,  1.        ]],
#>>> 
#>>>        ..., 
#>>>        [[-0.70890372,  0.70530527],
#>>>         [ 0.70530527,  0.70890372]],
#>>> 
#>>>        [[-0.71070503,  0.70349013],
#>>>         [ 0.70349013,  0.71070503]],
#>>> 
#>>>        [[-0.70889463,  0.7053144 ],
#>>>         [ 0.7053144 ,  0.70889463]]])

It's also possible to do this concatenate loop just after numpy.mgrid to halve the amount of work:

# All *permutations*, not combinations
indexes = numpy.mgrid[:a.size, :a.size]

# Convert to all *combinations* and reduce the dimensionality
indexes = numpy.concatenate([indexes[:, row, row+1:] for row in range(indexes.shape[1])], axis=1)

# Remove the extra dimension; it's not wanted here!
subs = a[:,0][indexes]

target = numpy.array([
    [subs[0], subs[0]],
    [subs[0], subs[1]]
])

target = numpy.rollaxis(target, 2)

values, vectors = numpy.linalg.eigh(target)

Yeah, that last sample is all you need.
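
If you want numbers for your own machine, a minimal timing harness along these lines will do; it reuses a, my_func and the imports from the test environment above, and the wrapper names are made up for this sketch (no timings quoted, since they vary by machine):

import timeit

def loop_version():
    return {(i, j): my_func(a, i, j)
            for i, j in itertools.combinations(range(a.size), 2)}

def vectorized_version():
    indexes = numpy.mgrid[:a.size, :a.size]
    indexes = numpy.concatenate([indexes[:, row, row+1:]
                                 for row in range(indexes.shape[1])], axis=1)
    subs = a[:,0][indexes]
    target = numpy.array([[subs[0], subs[0]],
                          [subs[0], subs[1]]])
    target = numpy.rollaxis(target, 2)
    return numpy.linalg.eigh(target)

print(timeit.timeit(loop_version, number=3))
print(timeit.timeit(vectorized_version, number=3))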
