Testing whether a NumPy array contains a given row

Question

Is there a Pythonic and efficient way to check whether a Numpy array contains at least one instance of a given row? By "efficient" I mean it terminates upon finding the first matching row rather than iterating over the entire array even if a result has already been found.

With Python lists this can be accomplished very cleanly with if row in array:, but this does not work as I would expect for Numpy arrays, as illustrated below.

With Python lists:

>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False

but Numpy arrays give different and rather odd-looking results. (The __contains__ method of ndarray seems to be undocumented.)

>>> import numpy as np
>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False

Recommended answer

NumPy's __contains__ is, at the time of writing, (a == b).any(), which is arguably only correct if b is a scalar. It is a bit hairy, but I believe the right general method would be (a == b).all(np.arange(a.ndim - b.ndim, a.ndim)).any(), which makes sense for all combinations of a and b dimensionality (this works only in NumPy 1.7 or later).
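To make the difference concrete, here is a minimal sketch of both checks. The function names contains_elementwise and contains_row are made up for illustration and are not part of NumPy; the axes are passed as a tuple of ints rather than an np.arange result, which is an adaptation on my part.

```python
import numpy as np

a = np.array([[1, 2], [10, 20], [100, 200]])

def contains_elementwise(a, b):
    # Mirrors what `b in a` does per the answer: True if ANY single
    # element matches after broadcasting b against a.
    return bool((a == b).any())

def contains_row(a, b):
    # Hypothetical helper: require ALL trailing axes covered by b to match,
    # then ask whether any position in the leading axes matched.
    b = np.asarray(b)
    axes = tuple(range(a.ndim - b.ndim, a.ndim))
    return bool((a == b).all(axis=axes).any())

print(contains_elementwise(a, [1, 20]))  # True: 1 matches in column 0
print(contains_row(a, [1, 20]))          # False: no full row equals [1, 20]
print(contains_row(a, [1, 2]))           # True
```

Note that both versions still compare every element of a; neither stops at the first match.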

Just to be clear, this is not necessarily the expected result when broadcasting is involved. Also someone might argue that it should handle the items in a separately as np.in1d does. I am not sure there is one clear way it should work.

Now you want NumPy to stop when it finds the first occurrence. AFAIK this does not exist at this time. It is difficult because NumPy is based mostly on ufuncs, which do the same thing over the whole array. NumPy does optimize these kinds of reductions, but effectively that only works when the array being reduced is already a boolean array (e.g. np.ones(10, dtype=bool).any()).

Otherwise it would need a special function for __contains__, which does not exist. That may seem odd, but you have to remember that NumPy supports many data types and has a large machinery to select the correct types and the correct function to work on them. In other words, the ufunc machinery cannot do it, and implementing __contains__ or similar as a special case is not actually that trivial because of data types.

You can of course write it in Python, or, since you probably know your data type, writing it yourself in Cython/C is very simple.
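As a rough sketch of the pure-Python route, the loop below scans a 2-D array in blocks and returns as soon as a block contains the row. The name contains_row_chunked and the chunk_size default are assumptions chosen for illustration; a is assumed to be 2-D with rows along axis 0.

```python
import numpy as np

def contains_row_chunked(a, row, chunk_size=1024):
    # Hypothetical helper: compare one block of rows at a time so the scan
    # can stop early instead of comparing the whole array up front.
    row = np.asarray(row)
    for start in range(0, a.shape[0], chunk_size):
        block = a[start:start + chunk_size]
        # A row matches only if every column in it matches.
        if (block == row).all(axis=1).any():
            return True
    return False

a = np.array([[1, 2], [10, 20], [100, 200]])
print(contains_row_chunked(a, [10, 20], chunk_size=2))  # True
print(contains_row_chunked(a, [1, 20], chunk_size=2))   # False
```

Working in blocks keeps most of the work inside vectorised NumPy calls while still allowing an early exit; a chunk_size of 1 degenerates into a plain row-by-row loop.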

That said, it is often much better to use a sorting-based approach for these things. It is a little tedious, since there is no such thing as searchsorted for a lexsort, but it works (you could also abuse scipy.spatial.cKDTree if you like). This assumes you want to compare along the last axis only:

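# Unfortunately you need to use structured arrays (one field per column):
a_rows = np.ascontiguousarray(a).view([('', a.dtype)] * a.shape[-1]).ravel()

# Actually at this point, you can also use np.in1d, if you already have many b
# then that is even better.

# np.sort returns a sorted copy, so the rows of a itself are not reordered
# (an in-place sort on the view would shuffle a when it is already contiguous).
sorted_rows = np.sort(a_rows)

# b must have the same dtype as a for the structured view to line up.
b_comp = np.ascontiguousarray(b, dtype=a.dtype).view(sorted_rows.dtype)
ind = sorted_rows.searchsorted(b_comp)

# searchsorted returns len(sorted_rows) for rows greater than every row of a;
# clip so the lookup below cannot go out of bounds.
ind = np.minimum(ind, len(sorted_rows) - 1)
result = sorted_rows[ind] == b_comp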

This also works when b is an array of rows. If you keep the sorted array around, it is also much better when you look up a single value (row) of b at a time while a stays the same (otherwise I would just use np.in1d after viewing it as a structured array). Important: you must call np.ascontiguousarray for safety. It will typically do nothing, but when it does do something, skipping it would be a big potential bug.
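One way to keep the sorted array around for repeated lookups is to split the approach into a build step and a query step. This is only a sketch: build_row_index and rows_in_index are hypothetical names, and it assumes a and b share the same number of columns and that b can be safely cast to the dtype of a.

```python
import numpy as np

def build_row_index(a):
    # View each row as one structured element, then return a sorted copy
    # so the original array is left untouched.
    a = np.ascontiguousarray(a)
    row_dtype = [('', a.dtype)] * a.shape[-1]
    return np.sort(a.view(row_dtype).ravel())

def rows_in_index(sorted_rows, b, base_dtype):
    # Boolean mask saying which rows of b occur in the indexed array.
    b_comp = np.ascontiguousarray(b, dtype=base_dtype)
    b_comp = b_comp.view(sorted_rows.dtype).ravel()
    ind = sorted_rows.searchsorted(b_comp)
    # Clip insertion points that fall past the end of the sorted array.
    ind = np.minimum(ind, len(sorted_rows) - 1)
    return sorted_rows[ind] == b_comp

a = np.array([[1, 2], [10, 20], [100, 200]])
index = build_row_index(a)  # build once, reuse for many queries

queries = np.array([[1, 2], [1, 20], [100, 200], [500, 1]])
found = rows_in_index(index, queries, a.dtype)
print(found.tolist())  # [True, False, True, False]
```

Building the index costs one sort of a; each later query is then a binary search rather than a full scan.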
