Fast way to compute this numpy query
Question
I have a boolean numpy array `mask` of length `n`. I also have a numpy array `a` of length <= `n`, containing numbers ranging from 0 (inclusive) to `n-1` (inclusive), with no duplicates. The query I want to compute is `np.array([x for x in a if mask[x]])`, but I don't think it's the fastest way to do it.
Is there a faster way of doing this in numpy than the way I just wrote?
Answer
It looks like the fastest way to do this is simply `a[mask[a]]`. I wrote a quick test that shows the difference in speed between the two methods depending on the coverage of the mask, `p` (the number of True items / `n`).
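To see why this is equivalent to the list comprehension, here is a small worked example (the arrays are made up for illustration). `mask[a]` is integer-array indexing: it gathers `mask[a[0]], mask[a[1]], ...` into a boolean array with the same length as `a`. Using that boolean array to index `a` then keeps exactly the `x` in `a` for which `mask[x]` is True, in the original order of `a`.

```python
import numpy as np

# Hypothetical small inputs with n = 6
mask = np.array([True, False, True, True, False, False])
a = np.array([5, 2, 0, 3, 1])  # distinct indices into mask, arbitrary order

# Step 1: gather mask at each index in a (integer-array indexing)
picked = mask[a]  # [mask[5], mask[2], mask[0], mask[3], mask[1]]
print(picked)     # [False  True  True  True False]

# Step 2: boolean indexing keeps the entries of a where picked is True
result = a[picked]
print(result)     # [2 0 3]

# Same elements, same order as the original list comprehension
baseline = np.array([x for x in a if mask[x]])
print(np.array_equal(result, baseline))  # True
```

Both steps run as single vectorized operations inside numpy, whereas the list comprehension does one Python-level lookup per element of `a`.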
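```python
import timeit

import matplotlib.pyplot as plt
import numpy as np

n = 10000
n_test = 100  # number of timed repetitions per method

slow_times = []
fast_times = []
p_space = np.linspace(0, 1, 100)  # mask coverage values to test

for p in p_space:
    # Random mask with (on average) a fraction p of True entries
    mask = np.random.choice([True, False], n, p=[p, 1 - p])
    # a is a permutation of 0..n-1, i.e. distinct indices into mask
    a = np.arange(n)
    np.random.shuffle(a)

    # Sanity check: both methods return the same elements in the same order
    y = np.array([x for x in a if mask[x]])
    z = a[mask[a]]
    assert np.array_equal(y, z)

    # timeit returns total seconds for n_test calls; convert to ms per call
    t1 = timeit.timeit(lambda: np.array([x for x in a if mask[x]]), number=n_test)
    t2 = timeit.timeit(lambda: a[mask[a]], number=n_test)
    slow_times.append(t1 / n_test * 1000)
    fast_times.append(t2 / n_test * 1000)

plt.plot(p_space, slow_times, label='slow (list comprehension)')
plt.plot(p_space, fast_times, label='fast (a[mask[a]])')
plt.xlabel('p (fraction of True items in mask)')
plt.ylabel('time per call (ms)')
plt.legend()
plt.title('Speed of method vs. coverage of mask')
plt.show()
```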
Which gives me this plot:

[Figure: Speed of method vs. coverage of mask]
So `a[mask[a]]` is a whole lot faster than the list comprehension, regardless of the coverage of `mask`.
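One further difference besides speed: when no element of `a` passes the mask, the list comprehension ends up calling `np.array([])`, which defaults to `float64`, while `a[mask[a]]` always keeps the dtype of `a`. The sketch below (with made-up arrays) shows this edge case.

```python
import numpy as np

n = 5
mask = np.zeros(n, dtype=bool)  # no True entries at all
a = np.array([4, 1, 3])         # distinct indices into mask

slow = np.array([x for x in a if mask[x]])  # np.array([]) -> float64
fast = a[mask[a]]                           # empty, but keeps a's dtype

print(slow.dtype, slow.shape)             # float64 (0,)
print(fast.dtype == a.dtype, fast.shape)  # True (0,)
```

If you do keep the list comprehension for some reason, passing `dtype=a.dtype` to `np.array` avoids the silent switch to floats.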