Fast check for NaN in NumPy

Question

I'm looking for the fastest way to check whether a NumPy array X contains any NaN (np.nan). np.isnan(X) is out of the question, since it builds a boolean array of shape X.shape, which is potentially gigantic.

I tried np.nan in X, but that does not seem to work, because np.nan != np.nan. Is there a fast and memory-efficient way to do this at all?
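To make the failure concrete, here is a minimal sketch of why the membership test finds nothing. The three-element array is made up for illustration; the behaviour shown relies only on ndarray membership being based on ==.

```python
import numpy as np

# Illustrative array with one NaN in it.
x = np.array([1.0, np.nan, 3.0])

# NaN never compares equal to anything, itself included.
nan_equals_itself = (np.nan == np.nan)   # False

# ndarray membership is based on ==, so it cannot find the NaN.
found_by_in = (np.nan in x)              # False

# isnan does find it, at the cost of a temporary boolean array of x.shape.
found_by_isnan = bool(np.isnan(x).any()) # True

print(nan_equals_itself, found_by_in, found_by_isnan)
```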

(To those who would ask "how gigantic": I can't tell. This is input validation for library code.)

Recommended answer

Ray's solution is good. However, on my machine it is about 2.5x faster to use numpy.sum in place of numpy.min:

In [13]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 244 us per loop

In [14]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 97.3 us per loop

Unlike min, sum doesn't require branching, which on modern hardware tends to be pretty expensive. This is probably the reason why sum is faster.
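For input validation, the sum-based check could be wrapped roughly as follows. This is only a sketch: the name has_nan is hypothetical, and it assumes a floating-point array. One caveat worth knowing is that an array containing both +inf and -inf also sums to NaN, so the check can report a NaN that is not actually there.

```python
import numpy as np

def has_nan(x):
    """Return True if the float array x appears to contain a NaN.

    Hypothetical helper, not part of NumPy.
    """
    # NaN propagates through addition, so the sum is NaN if any element is.
    # Only the scalar sum is passed to isnan, so no boolean array of
    # x.shape is allocated.
    return bool(np.isnan(np.sum(x)))

clean = np.random.rand(100000)   # values in [0, 1), no NaN
dirty = clean.copy()
dirty[50000] = np.nan            # a single NaN in the middle

print(has_nan(clean), has_nan(dirty))
```

If infinities are valid input for the library, a hit from this check would need to be confirmed by a slower exact test before the input is rejected.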

Edit: The above test was performed with a single NaN right in the middle of the array.

It is interesting to note that min is slower in the presence of NaNs than in their absence. It also seems to get slower as NaNs get closer to the start of the array. On the other hand, sum's throughput seems constant regardless of whether there are NaNs and where they're located:

In [40]: x = np.random.rand(100000)

In [41]: %timeit np.isnan(np.min(x))
10000 loops, best of 3: 153 us per loop

In [42]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop

In [43]: x[50000] = np.nan

In [44]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 239 us per loop

In [45]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.8 us per loop

In [46]: x[0] = np.nan

In [47]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 326 us per loop

In [48]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop
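The same comparison can be rerun outside IPython with the standard timeit module. The script below is a sketch that mirrors the three steps above (no NaN, then a NaN at index 50000, then an additional NaN at index 0); the helper time_us and the repeat counts are arbitrary choices of mine. Absolute numbers, and possibly the size of the gap between min and sum, will depend on the NumPy version and the hardware.

```python
import timeit
import numpy as np

def time_us(stmt, x, number=200):
    """Best-of-3 average time per call of stmt, in microseconds."""
    runs = timeit.repeat(stmt, globals={"np": np, "x": x},
                         repeat=3, number=number)
    return min(runs) / number * 1e6

x = np.random.rand(100000)

# Each step adds a NaN to the same array, as in the session above.
steps = [("no NaN", None),
         ("NaN at index 50000", 50000),
         ("NaN also at index 0", 0)]

results = {}
for label, idx in steps:
    if idx is not None:
        x[idx] = np.nan
    t_min = time_us("np.isnan(np.min(x))", x)
    t_sum = time_us("np.isnan(np.sum(x))", x)
    results[label] = (t_min, t_sum)
    print(f"{label:>20}: min {t_min:8.1f} us   sum {t_sum:8.1f} us")
```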
