Check if two numpy arrays are identical

Question

Suppose I have a bunch of arrays, including x and y, and I want to check whether they are equal. Generally, I can just use np.all(x == y) (barring some corner cases that I'm ignoring for now).

However, this evaluates the entire array (x == y), which is usually not needed. My arrays are really large, I have a lot of them, and the probability of two arrays being equal is small. In all likelihood, only a very small portion of (x == y) needs to be evaluated before all could return False, so this is not an optimal solution for me.
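
To make the cost concrete, here is a minimal sketch (the array size and the position of the mismatch are arbitrary choices for illustration) showing that x == y builds a complete boolean array before np.all inspects it, even when the arrays already differ at the first element:

```python
import numpy as np

x = np.zeros(1_000_000)
y = np.zeros(1_000_000)
y[0] = 1.0  # the arrays already differ at the very first element

# The comparison is evaluated for every element and stored in a new array.
mask = (x == y)
print(mask.shape, mask.dtype, mask.nbytes)

# Only now does np.all reduce the fully materialized mask to a single value.
print(np.all(mask))
```

The temporary mask has one boolean per element, so both the comparison work and the memory traffic scale with the full array size regardless of where the first difference sits.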

I've tried using the builtin all function in combination with itertools.izip: all(val1==val2 for val1,val2 in itertools.izip(x, y))

However, that seems much slower when the two arrays are equal, so overall it's still not worth using over np.all. I presume this is because of the builtin all's generality. And np.all doesn't work on generators.

Is there a way to do what I want more quickly?

I know this question is similar to previously asked questions (e.g. Comparing two numpy arrays for equality, element-wise), but they specifically don't cover the case of early termination.

Recommended answer

Until this is implemented natively in numpy, you can write your own function and JIT-compile it with numba:

import numpy as np
import numba as nb


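@nb.jit(nopython=True)
def arrays_equal(a, b):
    # Arrays with different shapes can never be equal.
    if a.shape != b.shape:
        return False
    # Walk both arrays element by element and stop at the first mismatch.
    for ai, bi in zip(a.flat, b.flat):
        if ai != bi:
            return False
    return True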


a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)


%timeit np.all(a==b)  # 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a)  # 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b)  # 100000 loops, best of 3: 691 ns per loop

Worst-case performance (the arrays are equal) is equivalent to np.all, and when the comparison can stop early, the compiled function has the potential to outperform np.all by a wide margin (about 691 ns versus 9.82 µs in the timings above).
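
If numba is not available, a similar early exit can be approximated in plain numpy by comparing the arrays block by block. This is a different technique from the answer above, shown only as a rough sketch: the function name arrays_equal_chunked and the default chunk_size are hypothetical choices for illustration, and no timings are claimed for it.

```python
import numpy as np


def arrays_equal_chunked(a, b, chunk_size=65536):
    """Hypothetical helper: compare block by block, stop at the first mismatch."""
    if a.shape != b.shape:
        return False
    # reshape(-1) gives a view for contiguous arrays but copies otherwise.
    a_flat = a.reshape(-1)
    b_flat = b.reshape(-1)
    for start in range(0, a_flat.size, chunk_size):
        stop = start + chunk_size
        # Each block is still compared in full, so at most one block of
        # work is wasted after the first differing element.
        if not np.array_equal(a_flat[start:stop], b_flat[start:stop]):
            return False
    return True


a = np.arange(200_000).reshape(400, 500)
b = a.copy()
print(arrays_equal_chunked(a, b))

b[0, 0] += 1  # introduce a mismatch in the first block
print(arrays_equal_chunked(a, b))
```

The block size trades per-call overhead against wasted work: smaller blocks stop sooner after a mismatch but spend more time in the Python-level loop when the arrays turn out to be equal.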
