What is the most efficient way to find the position of the first np.nan value?
Problem description
Consider the array a
a = np.array([3, 3, np.nan, 3, 3, np.nan])
I could do
np.isnan(a).argmax()
But this requires finding every np.nan just to find the first one.
Is there a more efficient way?
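One generic way to avoid building the whole isnan mask is to scan in fixed-size chunks and stop at the first chunk that contains a NaN. This is a minimal sketch, not something from the thread: first_nan_chunked and chunk_size are hypothetical names, and it assumes a 1-D numeric array. It returns -1 when there is no NaN.

```python
import numpy as np

def first_nan_chunked(a, chunk_size=4096):
    """Hypothetical helper: index of the first NaN in 1-D array `a`, or -1.

    The isnan mask is built one chunk at a time, and the scan stops at the
    first chunk containing a NaN, so later chunks are never examined.
    """
    a = np.asarray(a)
    for start in range(0, a.size, chunk_size):
        mask = np.isnan(a[start:start + chunk_size])
        if mask.any():
            # argmax on a boolean mask gives the position of the first True
            return start + int(mask.argmax())
    return -1

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(first_nan_chunked(a))  # 2
```

The chunk size trades per-chunk Python overhead against how much work is wasted past the first NaN.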
I've been trying to figure out whether I can pass a parameter to np.argpartition
so that np.nan
gets sorted first rather than last.
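For context on the sorting idea: NumPy's sort family places NaN after all other values. np.argpartition makes no ordering guarantee inside a partition, so this sketch uses a stable np.argsort instead, which keeps the NaN positions at the end in their original order. It is only an illustration (O(n log n), and it still calls np.isnan on the whole array), and it assumes the array contains at least one NaN; otherwise order[n_finite] is out of range.

```python
import numpy as np

a = np.array([3, 3, np.nan, 3, 3, np.nan])

# Stable argsort: non-NaN values first, then NaN positions in original order.
order = np.argsort(a, kind="stable")
n_finite = np.count_nonzero(~np.isnan(a))

# The first entry after the non-NaN block is the first NaN in `a`.
print(order)            # [0 1 3 4 2 5]
print(order[n_finite])  # 2
```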
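EDIT regarding [dup].
There are several reasons this question is different:
- That question and its answers deal with equality of values. This one is about isnan.
- Those answers all suffer from the same problem my own answer has. Note that I provided a perfectly valid answer but highlighted its inefficiency. I'm looking for a way to remove that inefficiency.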
EDIT regarding second [dup].
It still addresses equality, and the question and answers are old and very possibly outdated.
Recommended answer
I'm going to nominate
a.argmax()
Using @fuglede's
test array:
In [1]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [2]: np.isnan(a).argmax()
Out[2]: 9999
In [3]: np.argmax(a)
Out[3]: 9999
In [4]: a.argmax()
Out[4]: 9999
In [5]: timeit a.argmax()
The slowest run took 29.94 ....
10000 loops, best of 3: 20.3 µs per loop
In [6]: timeit np.isnan(a).argmax()
The slowest run took 7.82 ...
1000 loops, best of 3: 462 µs per loop
I don't have numba
installed, so I can't compare against that. But my speedup relative to short
is greater than @fuglede's
6x.
I'm testing in Py3, which accepts < np.nan
, while Py2 raises a runtime warning. But the code search suggests the result doesn't depend on that comparison.
In /numpy/core/src/multiarray/calculation.c,
PyArray_ArgMax
plays with the axes (moving the one of interest to the end) and delegates the work to arg_func = PyArray_DESCR(ap)->f->argmax
, a function that depends on the dtype.
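To make the axis handling concrete, here is a rough Python analogue (not NumPy's C code): move the requested axis to the end, view everything else as rows, and run a 1-D argmax on each row. argmax_via_last_axis is a hypothetical helper name; the per-row step still calls np.argmax, so this only illustrates the reshaping, not the speed.

```python
import numpy as np

def argmax_via_last_axis(x, axis):
    """Hypothetical analogue of the axis handling described for PyArray_ArgMax."""
    moved = np.moveaxis(x, axis, -1)          # axis of interest is now last
    rows = moved.reshape(-1, moved.shape[-1])  # every other axis flattened into rows
    # Each row is now a 1-D problem, roughly what the dtype-specific
    # argmax function is handed.
    result = np.array([np.argmax(row) for row in rows])
    return result.reshape(moved.shape[:-1])

x = np.arange(24.0).reshape(2, 3, 4)
x[1, 2, 1] = np.nan
print(np.array_equal(argmax_via_last_axis(x, 1), np.argmax(x, axis=1)))  # True
```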
In numpy/core/src/multiarray/arraytypes.c.src
it looks like BOOL_argmax
short-circuits, returning as soon as it encounters a True
:
for (; i < n; i++) {
    if (ip[i]) {
        *max_ind = i;
        return 0;
    }
}
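The same short-circuit can be transliterated into Python to check it against np.argmax on boolean masks. This is a sketch for illustration, not NumPy's implementation; bool_argmax_sketch is a hypothetical name. When there is no True it falls back to 0, which is what np.argmax returns for an all-False mask.

```python
import numpy as np

def bool_argmax_sketch(mask):
    """Hypothetical transliteration of the BOOL_argmax loop quoted above."""
    max_ind = 0
    for i, value in enumerate(mask):
        if value:
            max_ind = i
            break  # short circuit: nothing after the first True is read
    return max_ind

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(bool_argmax_sketch(np.isnan(a)))  # 2
```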
And @fname@_argmax
also short-circuits on a maximal nan
. np.nan
is treated as 'maximal' in argmin
as well, so argmin also stops at the first nan:
#if @isfloat@
    if (@isnan@(mp)) {
        /* nan encountered; it's maximal */
        return 0;
    }
#endif
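A Python transliteration of a NaN-aware float argmax in the spirit of that excerpt shows why the first NaN wins. Only the "NaN is maximal, stop immediately" part comes from the quoted snippet; the `not (v <= mp)` comparison for the rest of the loop is an assumption chosen because every comparison involving NaN is False. float_argmax_sketch is a hypothetical name, and it assumes a non-empty 1-D float sequence.

```python
import math
import numpy as np

def float_argmax_sketch(values):
    """Hypothetical NaN-aware argmax: the first NaN is maximal and ends the scan."""
    max_ind = 0
    mp = values[0]
    if math.isnan(mp):
        return 0  # first element is NaN: it's maximal, stop immediately
    for i in range(1, len(values)):
        v = values[i]
        # True when v > mp *or* v is NaN, since NaN comparisons are all False.
        if not (v <= mp):
            mp = v
            max_ind = i
            if math.isnan(mp):
                break  # NaN encountered: it's maximal, short circuit
    return max_ind

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(float_argmax_sketch(a), np.argmax(a))  # 2 2
```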
Comments from experienced C
coders are welcome, but it appears to me that, at least for np.nan
, a plain argmax
will be as fast as we can get.
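One caveat when using a plain argmax this way: if the array contains no NaN, a.argmax() simply returns the position of the largest value, so the hit has to be checked. A minimal wrapper, assuming a 1-D array (first_nan_index is a hypothetical name), could look like this:

```python
import numpy as np

def first_nan_index(a):
    """Hypothetical helper: index of the first NaN in a 1-D array, or -1 if none."""
    a = np.asarray(a)
    if a.size == 0:
        return -1  # argmax raises ValueError on an empty array
    i = int(a.argmax())  # stops at the first NaN if there is one
    return i if np.isnan(a[i]) else -1

print(first_nan_index(np.array([3, 3, np.nan, 3, 3, np.nan])))  # 2
print(first_nan_index(np.array([3.0, 7.0, 1.0])))               # -1
```

The extra np.isnan call touches a single element, so it adds essentially nothing to the cost of the argmax.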
Playing with the 9999
used to generate a
shows that the a.argmax
time depends on that value, consistent with short-circuiting.
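That experiment can be scripted. This variation uses a single NaN at a chosen position instead of the periodic pattern above; make_array and the positions are hypothetical choices. If argmax short-circuits on your NumPy build, the per-call time should grow with the NaN position; actual numbers depend on the machine and NumPy version.

```python
import timeit
import numpy as np

N = 100_000

def make_array(nan_pos, n=N):
    """Hypothetical test setup: all 3.0 except a single NaN at `nan_pos`."""
    a = np.full(n, 3.0)
    a[nan_pos] = np.nan
    return a

for pos in (10, 1_000, 50_000, 99_999):
    a = make_array(pos)
    assert a.argmax() == pos  # argmax lands on the NaN
    t = min(timeit.repeat(a.argmax, number=1_000, repeat=3)) / 1_000
    print(f"NaN at {pos:>6}: {t * 1e6:8.2f} µs per call")
```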