Fastest way to populate a matrix with a function on pairs of elements in two numpy vectors?
Question
I have two one-dimensional numpy vectors, va and vb, which are used to populate a matrix by passing every pair of elements to a function.
na = len(va)
nb = len(vb)
D = np.zeros((na, nb))
for i in range(na):
    for j in range(nb):
        D[i, j] = foo(va[i], vb[j])
As it stands, this code takes a very long time to run because va and vb are relatively large (4626 and 737 elements). However, I am hoping it can be improved, since a similar procedure is performed by the cdist method from scipy with very good performance.
D = cdist(va, vb, metric)
I am obviously aware that scipy has the benefit of running this code in C rather than in Python, but I'm hoping there is some numpy function I'm unaware of that can execute this quickly.
Recommended answer
One of the least known numpy functions among what the docs call functional programming routines is np.frompyfunc. This creates a numpy ufunc from a Python function: not some other object that closely simulates a numpy ufunc, but a proper ufunc with all its bells and whistles. While its behavior is in many respects very similar to np.vectorize, it has some distinct advantages, which the following code should highlight:
In [2]: def f(a, b):
   ...:     return a + b
   ...:
In [3]: f_vec = np.vectorize(f)
In [4]: f_ufunc = np.frompyfunc(f, 2, 1) # 2 inputs, 1 output
In [5]: a = np.random.rand(1000)
In [6]: b = np.random.rand(2000)
In [7]: %timeit np.add.outer(a, b) # a baseline for comparison
100 loops, best of 3: 9.89 ms per loop
In [8]: %timeit f_vec(a[:, None], b) # 50x slower than np.add
1 loops, best of 3: 488 ms per loop
In [9]: %timeit f_ufunc(a[:, None], b) # ~15% faster than np.vectorize...
1 loops, best of 3: 425 ms per loop
In [10]: %timeit f_ufunc.outer(a, b) # ...and you get to use ufunc methods
1 loops, best of 3: 427 ms per loop
So while it is still clearly inferior to a properly vectorized implementation, it is a little faster than np.vectorize (the looping is done in C, but you still pay the Python function call overhead for every pair of elements).
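Applied to the setup in the question, this might look like the sketch below. The function foo here is a hypothetical stand-in (absolute difference), and the array sizes are arbitrary. One thing to keep in mind is that a ufunc built with np.frompyfunc returns an array of dtype object, so a cast is usually needed before further numeric work. The last line shows the broadcasting form, which is only available when foo can be expressed with array operations.

```python
import numpy as np

def foo(a, b):
    # Hypothetical stand-in for the asker's scalar function.
    return abs(a - b)

rng = np.random.default_rng(0)
va = rng.random(50)   # arbitrary small sizes for illustration
vb = rng.random(30)

# Reference: the double loop from the question.
D_loop = np.zeros((len(va), len(vb)))
for i in range(len(va)):
    for j in range(len(vb)):
        D_loop[i, j] = foo(va[i], vb[j])

# Wrap foo as a ufunc (2 inputs, 1 output) and call its outer method.
foo_ufunc = np.frompyfunc(foo, 2, 1)
D_obj = foo_ufunc.outer(va, vb)   # result has dtype object
D_ufunc = D_obj.astype(float)     # cast to a numeric dtype

# If foo can be written with array operations, broadcasting avoids
# the per-element Python call entirely.
D_broadcast = np.abs(va[:, None] - vb[None, :])
```

All three results hold the same values; only the broadcasting version avoids calling Python code once per pair.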