Check if two numpy arrays are identical
Question
Suppose I have a bunch of arrays, including x and y, and I want to check whether they're equal. Generally, I can just use np.all(x == y) (barring some dumb corner cases which I'm ignoring for now).
However, this evaluates the entire array (x == y), which is usually not needed. My arrays are really large, I have a lot of them, and the probability of two arrays being equal is small. In all likelihood, only a very small portion of (x == y) needs to be evaluated before the all function could return False, so this is not an optimal solution for me.
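As an illustration of the cost described above, here is a minimal sketch. The array size and the variable names mask, mask_shape, mask_nbytes and result are assumptions made for the example, not part of the original question. Even though the two arrays differ at the very first element, x == y still builds a full-size boolean array before np.all gets to look at it.

```python
import numpy as np

# Two large arrays that already differ at index 0 (illustrative data).
x = np.zeros(1_000_000)
y = np.zeros(1_000_000)
y[0] = 1.0

# The comparison is evaluated for every element and materialized
# as a temporary boolean array with the same shape as the inputs.
mask = x == y
mask_shape = mask.shape
mask_nbytes = mask.nbytes  # one byte per boolean element

# np.all only runs after the whole temporary array exists.
result = bool(np.all(mask))
```

So the mismatch at index 0 does not save any of the comparison work; the whole temporary is computed first.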
I've tried using the builtin all function in combination with itertools.izip: all(val1==val2 for val1,val2 in itertools.izip(x, y))
However, that seems much slower when the two arrays are equal, so overall it's still not worth using over np.all. I presume that's because of the builtin all's general-purpose nature. And np.all doesn't work on generators.
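For readers on Python 3, where itertools.izip no longer exists and the builtin zip is already lazy, the generator-based attempt could be sketched roughly as follows. The helper name arrays_equal_lazy is hypothetical, and the use of .flat is an assumption added so that multi-dimensional arrays are compared element by element rather than row by row.

```python
import numpy as np

def arrays_equal_lazy(x, y):
    """Early-exit equality check using the builtin all() (hypothetical helper)."""
    if x.shape != y.shape:
        return False
    # zip is lazy in Python 3, and all() stops at the first False,
    # so iteration ends at the first mismatching pair of elements.
    return all(v1 == v2 for v1, v2 in zip(x.flat, y.flat))

a = np.arange(6).reshape(2, 3)
b = a.copy()
equal_before = arrays_equal_lazy(a, b)
b[1, 2] = -1
equal_after = arrays_equal_lazy(a, b)
```

This does stop early, but every element passes through the Python interpreter, which is consistent with the slowdown described above when the arrays turn out to be equal.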
Is there a faster way to do what I want?
I know this question is similar to previously asked questions (e.g. Comparing two numpy arrays for equality, element-wise), but they specifically don't cover the case of early termination.
Recommended answer
Until this is implemented in numpy natively you can write your own function and jit-compile it with numba:
import numpy as np
import numba as nb

@nb.jit(nopython=True)
def arrays_equal(a, b):
    # Arrays of different shape can never be equal.
    if a.shape != b.shape:
        return False
    # Walk both arrays element by element and stop at the first mismatch.
    for ai, bi in zip(a.flat, b.flat):
        if ai != bi:
            return False
    return True

a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)

# Timings below use the IPython %timeit magic.
%timeit np.all(a==b)        # 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a)  # 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b)  # 100000 loops, best of 3: 691 ns per loop
Worst-case performance (equal arrays) is equivalent to np.all, and when the comparison can stop early the compiled function has the potential to outperform np.all by a wide margin.
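If adding numba as a dependency is not an option, a rough alternative is to compare the arrays block by block in plain NumPy and stop at the first block that differs. This is not the method from the answer above, only a sketch of the same early-termination idea; the function name arrays_equal_chunked and the default chunk_size are assumptions chosen for illustration.

```python
import numpy as np

def arrays_equal_chunked(a, b, chunk_size=65536):
    """Block-wise equality check with early exit (hypothetical helper)."""
    if a.shape != b.shape:
        return False
    # reshape(-1) gives a flat view for contiguous arrays; for
    # non-contiguous inputs it makes a copy, which costs extra time.
    a_flat = a.reshape(-1)
    b_flat = b.reshape(-1)
    for start in range(0, a_flat.size, chunk_size):
        stop = start + chunk_size
        # Each block is compared in vectorized form; the loop returns
        # as soon as one block contains a mismatch.
        if not np.array_equal(a_flat[start:stop], b_flat[start:stop]):
            return False
    return True

a = np.arange(200_000, dtype=np.float64).reshape(400, 500)
b = a.copy()
same = arrays_equal_chunked(a, b)
b[0, 0] = -1.0
different = arrays_equal_chunked(a, b)
```

The block size trades per-block Python overhead against how much work is wasted after the first mismatch, so it would need tuning for real data.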