What is the most efficient way to find the position of the first np.nan value?


Question

Consider the array a

a = np.array([3, 3, np.nan, 3, 3, np.nan])

I can do

np.isnan(a).argmax()

But this requires finding every np.nan just to locate the first one.
Is there a more efficient way?

I've been trying to figure out whether I can pass a parameter to np.argpartition so that np.nan gets sorted first as opposed to last.

EDIT regarding [dup].
There are several reasons this question is different.

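  1. That question and its answers address equality of values. This one is about isnan.
  2. Those answers all suffer from the same issue my own approach has. Note that I provided a perfectly valid solution but highlighted its inefficiency. I'm looking to fix that inefficiency.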


EDIT regarding second [dup].

That one still addresses equality, and its question and answers are old and very possibly outdated.

Recommended answer

I'll nominate

a.argmax()

Using @fuglede's test array:

In [1]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [2]: np.isnan(a).argmax()
Out[2]: 9999
In [3]: np.argmax(a)
Out[3]: 9999
In [4]: a.argmax()
Out[4]: 9999

In [5]: timeit a.argmax()
The slowest run took 29.94 ....
10000 loops, best of 3: 20.3 µs per loop

In [6]: timeit np.isnan(a).argmax()
The slowest run took 7.82 ...
1000 loops, best of 3: 462 µs per loop

I don't have numba installed, so I can't compare against that. But my speedup relative to short is greater than @fuglede's 6x.

I'm testing in Py3, which accepts <np.nan, while Py2 raises a runtime warning. But a search of the source code suggests the result doesn't depend on that comparison.

In /numpy/core/src/multiarray/calculation.c, PyArray_ArgMax rearranges the axes (moving the one of interest to the end) and delegates the work to arg_func = PyArray_DESCR(ap)->f->argmax, a function that depends on the dtype.

In numpy/core/src/multiarray/arraytypes.c.src it looks like BOOL_argmax short circuits, returning as soon as it encounters a True.

for (; i < n; i++) {
    if (ip[i]) {
        *max_ind = i;
        return 0;
    }
}

And @fname@_argmax also short circuits on a maximal nan. np.nan is treated as 'maximal' in argmin as well.

#if @isfloat@
    if (@isnan@(mp)) {
        /* nan encountered; it's maximal */
        return 0;
    }
#endif
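As a quick check of the behaviour described above, here is a minimal sketch (not part of the original answer; the array values are made up for illustration). It assumes a float array that contains at least one NaN, and compares the three ways of locating the first one.

```python
import numpy as np

# Illustrative float array; the first NaN sits at index 2.
a = np.array([3.0, 1.0, np.nan, 7.0, np.nan])

first_by_isnan = int(np.isnan(a).argmax())  # build a boolean mask, then argmax
first_by_argmax = int(a.argmax())           # NaN wins the argmax comparison
first_by_argmin = int(a.argmin())           # NaN wins in argmin as well

print(first_by_isnan, first_by_argmax, first_by_argmin)
```

All three calls should agree on the index of the first NaN. Only the first one allocates an intermediate boolean array.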

Comments from experienced C coders are welcome, but it appears to me that, at least for np.nan, a plain argmax will be as fast as we can get.
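One caveat worth spelling out: a plain argmax only points at a NaN if the array actually contains one. Otherwise it returns the position of the ordinary maximum. The sketch below wraps the call in a hypothetical helper, first_nan_index (my own name, not from the answer), which checks the element it lands on and returns -1 when there is no NaN. It assumes a 1-D array that can be viewed as float.

```python
import numpy as np

def first_nan_index(a):
    """Return the index of the first NaN in a 1-D float array, or -1 if none.

    Hypothetical helper for illustration; not part of the original answer.
    """
    a = np.asarray(a, dtype=float)
    if a.size == 0:
        return -1                 # argmax raises on empty input, so guard it
    i = int(a.argmax())           # lands on the first NaN if there is one
    # Without any NaN, argmax points at the ordinary maximum, so verify.
    return i if np.isnan(a[i]) else -1

print(first_nan_index([3, 3, np.nan, 3, 3, np.nan]))  # array from the question
print(first_nan_index([1.0, 5.0, 2.0]))               # no NaN present
```

The extra check reads a single element, so it should not change the overall cost in any meaningful way.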

Playing with the 9999 used to generate a shows that the a.argmax time depends on that value, which is consistent with short circuiting.
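That experiment can be sketched roughly as follows. The array size, the NaN positions, and the repeat counts are my own illustrative choices. Absolute timings depend on the machine and on the NumPy version, and newer releases may implement argmax differently, so the pattern may not match the numbers quoted above.

```python
import timeit
import numpy as np

n = 100_000
loops = 200
found = []

for pos in (10, 50_000, 99_999):      # illustrative positions of the single NaN
    a = np.full(n, 3.0)
    a[pos] = np.nan
    found.append(int(a.argmax()))     # sanity check: argmax lands on the NaN

    # Best of three runs, each executing the call `loops` times.
    t_plain = min(timeit.repeat(a.argmax, number=loops, repeat=3))
    t_mask = min(timeit.repeat(lambda: np.isnan(a).argmax(),
                               number=loops, repeat=3))
    print(f"NaN at {pos:>6}: a.argmax() {t_plain / loops * 1e6:8.2f} us, "
          f"np.isnan(a).argmax() {t_mask / loops * 1e6:8.2f} us")
```

If argmax short circuits, its time should grow with the NaN position, while the np.isnan version always has to build the full boolean mask first.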
