Functional assignment in numpy
Question
Suppose I have two arrays
A = [ 6, 4, 5, 7, 9 ]
ind = [ 0, 0, 2, 1, 2 ]
and a function f.
I want to build a new array B whose size is the number of distinct elements in ind, with B[i] the result of f applied to the subarray of A indexed by i (that is, the elements A[j] for which ind[j] == i).
For this example, if I take f = sum, then
B = [10, 7, 14]
or, with f = max,
B = [6, 7, 9]
Is there a more efficient way than a for loop in numpy?
Thanks
Answer
For the special case f = sum:
In [32]: np.bincount(ind,A)
Out[32]: array([ 10., 7., 14.])
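np.bincount(ind, A) treats A as per-element weights: entry i of the result is the sum of A[j] over all j with ind[j] == i. A minimal sketch comparing it with the explicit loop it replaces (the helper name `grouped_sum_loop` is hypothetical, used only for the comparison):

```python
import numpy as np

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

def grouped_sum_loop(A, ind):
    # Hypothetical reference implementation: one boolean mask per group.
    n = ind.max() + 1
    return np.array([A[ind == i].sum() for i in range(n)], dtype=float)

# bincount adds weights[j] into bin ind[j]; with weights the result is float64.
fast = np.bincount(ind, weights=A)
slow = grouped_sum_loop(A, ind)

print(fast)                     # [10.  7. 14.]
print(np.allclose(fast, slow))  # True
```

This only covers sums; other reductions such as max need one of the approaches below.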
Assuming:
- f is a ufunc
- you have enough memory to make a 2D array of shape len(A) x (max(ind)+1)
you can make a 2D array B:
B = np.zeros((len(A), max(ind)+1))
and fill in various locations in B with values from A, such that the first column of B only gets values from A when ind == 0, the second column of B only gets values from A when ind == 1, etc.:
B[np.arange(len(A)), ind] = A
You'd end up with an array like:
[[ 6. 0. 0.]
[ 4. 0. 0.]
[ 0. 0. 5.]
[ 0. 7. 0.]
[ 0. 0. 9.]]
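A runnable sketch of this construction on the question's data, using paired integer index arrays (row j, column ind[j]) for the assignment. Reducing the result along axis=0 reproduces both expected outputs from the question:

```python
import numpy as np

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

# One row per element of A, one column per group label.
B = np.zeros((len(A), ind.max() + 1))
# Row j receives A[j] in column ind[j]; every other entry stays 0.
B[np.arange(len(A)), ind] = A

print(B)
print(B.max(axis=0))  # [6. 7. 9.]
print(B.sum(axis=0))  # [10.  7. 14.]
```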
You could then apply f along axis=0 to obtain your desired result.
There is a third assumption used here:
- The extra zeros in B do not affect the desired result.
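To see why this assumption matters: with f = min, every column contains a padding 0, so every group comes out as 0 (the same happens with f = max if A has negative values). One workaround, sketched here under the assumption that f has a known identity element, is to pad B with that identity (+inf for min, -inf for max) instead of 0. The helper name `grouped_reduce` and its `fill` parameter are hypothetical:

```python
import numpy as np

def grouped_reduce(A, ind, f, fill):
    # Hypothetical helper: same 2D layout as above, but padded with `fill`,
    # which should be the identity of f (e.g. np.inf for np.min).
    B = np.full((len(A), ind.max() + 1), fill, dtype=float)
    B[np.arange(len(A)), ind] = A
    return f(B, axis=0)

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

print(grouped_reduce(A, ind, np.min, 0.0))     # [0. 0. 0.]  (zero padding wins)
print(grouped_reduce(A, ind, np.min, np.inf))  # [4. 7. 5.]
```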
If you can stomach these assumptions, then:
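```python
# test.py (the timeit commands below import this module as `test`)
import numpy as np

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

# Random benchmark data: N values and N group labels, each drawn from range(M).
N = 100
M = 10
A2 = np.random.randint(M, size=N)
ind2 = np.random.randint(M, size=N)

def use_extra_axis(A, ind, f):
    # len(A) x (max(ind)+1) array: row j holds A[j] in column ind[j], zeros elsewhere.
    B = np.zeros((len(A), ind.max() + 1))
    B[np.arange(len(A)), ind] = A
    return f(B)

def use_loop(A, ind, f):
    # Apply f to each group A[ind == i] in turn.
    n = ind.max() + 1
    B = np.empty(n)
    for i in range(n):
        B[i] = f(A[ind == i])
    return B

def fmax(arr):
    # Reduce along axis 0: per column for the 2D array, whole slice for 1D input.
    return np.max(arr, axis=0)

if __name__ == '__main__':
    print(use_extra_axis(A, ind, fmax))
    print(use_loop(A, ind, fmax))
```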
For certain values of M and N (e.g. M=10, N=100), using an extra axis may be faster than using a loop:
% python -mtimeit -s'import test,numpy' 'test.use_extra_axis(test.A2,test.ind2,test.fmax)'
10000 loops, best of 3: 162 usec per loop
% python -mtimeit -s'import test,numpy' 'test.use_loop(test.A2,test.ind2,test.fmax)'
1000 loops, best of 3: 222 usec per loop
However, as N grows larger (say M=10, N=10000), using a loop may be faster:
% python -mtimeit -s'import test,numpy' 'test.use_extra_axis(test.A2,test.ind2,test.fmax)'
100 loops, best of 3: 13.9 msec per loop
% python -mtimeit -s'import test,numpy' 'test.use_loop(test.A2,test.ind2,test.fmax)'
100 loops, best of 3: 4.4 msec per loop
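Both approaches timed here pay a price as N grows: the extra-axis version allocates a len(A) x M array, and the loop makes M boolean-mask passes over A. A different technique, not part of this answer's benchmarks, is to sort A by label and reduce each contiguous run with a ufunc's reduceat method. A minimal sketch, assuming f is available as a ufunc (np.maximum, np.add, ...) and that every label in range(max(ind)+1) occurs at least once, since reduceat returns a single element rather than an identity for an empty run. The name `use_reduceat` is hypothetical:

```python
import numpy as np

def use_reduceat(A, ind, ufunc):
    # Hypothetical alternative: group by sorting instead of a 2D array.
    # Assumes every label 0..ind.max() appears at least once in ind.
    order = np.argsort(ind, kind="stable")
    A_sorted = A[order]
    # Offset where each label's run starts in the sorted array.
    counts = np.bincount(ind)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return ufunc.reduceat(A_sorted, starts)

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

print(use_reduceat(A, ind, np.maximum))  # [6 7 9]
print(use_reduceat(A, ind, np.add))      # [10  7 14]
```

Memory use stays O(N) and the work is dominated by one sort, but it has not been timed against the variants in this answer.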
Combining this with Thuis's brilliant idea of using sparse matrices:
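```python
import numpy as np
import scipy.sparse

def use_sparse_extra_axis(A, ind, f):
    # Same len(A) x (max(ind)+1) array as use_extra_axis, built from a COO matrix.
    B = scipy.sparse.coo_matrix((A, (range(len(A)), ind))).toarray()
    return f(B)

def use_sparse(A, ind, f):
    # Rows are group labels; each LIL row's data list holds that group's values.
    return [f(v) for v in scipy.sparse.coo_matrix((A, (ind, range(len(A))))).tolil().data]
```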
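The use_sparse version lets the sparse matrix do the grouping: a COO matrix with row = ind and column = position places each A[j] in row ind[j], and the LIL format's `.data` attribute stores, for each row, a Python list of that row's stored values. A small sketch for inspecting those lists on the question's example:

```python
import numpy as np
import scipy.sparse

A = np.array([6, 4, 5, 7, 9])
ind = np.array([0, 0, 2, 1, 2])

# Entry (ind[j], j) holds A[j]; the shape (3, 5) is inferred from the indices.
S = scipy.sparse.coo_matrix((A, (ind, np.arange(len(A)))))
rows = S.tolil().data  # object array with one list of values per row

# Each list is one group; f is then applied to it, here with Python's max.
print(rows.tolist())
print([max(v) for v in rows])
```

Note that a label with no members yields an empty list, on which max (like np.max) raises an error, just as the loop version would.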
Which implementation is best depends on the parameters N and M:
N=1000, M=100
·───────────────────────·────────────────────·
│ use_sparse_extra_axis │ 1.15 msec per loop │
│ use_extra_axis │ 2.79 msec per loop │
│ use_loop │ 3.47 msec per loop │
│ use_sparse │ 5.25 msec per loop │
·───────────────────────·────────────────────·
N=100000, M=10
·───────────────────────·────────────────────·
│ use_sparse_extra_axis │ 35.6 msec per loop │
│ use_loop │ 43.3 msec per loop │
│ use_sparse │ 91.5 msec per loop │
│ use_extra_axis │ 150 msec per loop │
·───────────────────────·────────────────────·
N=100000, M=50
·───────────────────────·────────────────────·
│ use_sparse │ 94.1 msec per loop │
│ use_loop │ 107 msec per loop │
│ use_sparse_extra_axis │ 170 msec per loop │
│ use_extra_axis │ 272 msec per loop │
·───────────────────────·────────────────────·
N=10000, M=50
·───────────────────────·────────────────────·
│ use_loop │ 10.9 msec per loop │
│ use_sparse │ 11.7 msec per loop │
│ use_sparse_extra_axis │ 15.1 msec per loop │
│ use_extra_axis │ 25.4 msec per loop │
·───────────────────────·────────────────────·