python: check if a NumPy array contains any element of another array


Question

What is the best way to check whether a NumPy array contains any element of another array?

Example:

array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]

I want to get True if array1 contains any value from array2, and False otherwise.

Recommended answer

With pandas, you can use Series.isin:

import numpy as np
import pandas as pd

a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])

>>> pd.Series(a1).isin(a2).any()
True

And with NumPy's in1d function (per the comment from @Norman):

>>> np.any(np.in1d(a1, a2))
True
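
The same check can be sketched with np.isin, which the NumPy documentation recommends over np.in1d for new code (as far as I know, np.in1d is deprecated from NumPy 2.0 onward). The helper name contains_any below is hypothetical and not part of the original answer; it only wraps the call so that a plain Python bool comes back.

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])


def contains_any(haystack, needles):
    """Hypothetical helper: True if any element of haystack is in needles."""
    # np.isin returns a boolean mask with the same shape as haystack;
    # .any() collapses it, and bool() converts np.bool_ to a Python bool.
    return bool(np.isin(haystack, needles).any())


result = contains_any(a1, a2)                       # the arrays share values such as 10
no_match = contains_any(a1, np.array([100, 200]))   # no shared values
print(result, no_match)
```

Unlike np.in1d, np.isin keeps the shape of its first argument, which makes no difference here because the mask is reduced with any().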

For small arrays such as those in this example, the set-based solution is the clear winner. For larger arrays with no overlap, the pandas and NumPy solutions are faster than set, and np.intersect1d was the fastest of the approaches measured. The pure-Python generator approach, any(i in a3 for i in a4), becomes impractical at that size.
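
The set and np.intersect1d expressions timed below return the intersection itself rather than True or False. A minimal sketch of turning each into the boolean the question asks for follows; the variable names are illustrative, and set.isdisjoint is a swapped-in variant that was not part of the original benchmarks.

```python
import numpy as np

array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

# Set intersection, as in the timings: an empty set is falsy.
overlap_set = bool(set(array1) & set(array2))

# Variant (not benchmarked in the original): isdisjoint avoids
# building the intersection and can stop at the first shared value.
overlap_disjoint = not set(array1).isdisjoint(array2)

# np.intersect1d returns the sorted unique common values;
# a non-zero size means at least one value is shared.
overlap_np = np.intersect1d(array1, array2).size > 0

print(overlap_set, overlap_disjoint, overlap_np)
```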

Small arrays (12-13 elements):

%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.69 µs per loop

%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 1.88 µs per loop

%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 15.6 µs per loop

%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop

%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop

Arrays with 100,000 elements (no overlap):

a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000

%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop    

%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop

%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop

%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop

%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
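
Outside IPython, where %timeit is not available, a comparison like the one above can be approximated with the standard timeit module. This is only a sketch under stated assumptions: the seed, the repeat counts, and the candidates dictionary are arbitrary choices, np.isin stands in for np.in1d, the pandas variant is left out to keep the script dependent on NumPy only, and the generator approach is skipped because it took about 34.5 s per run above. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for repeatability only
a3 = rng.integers(0, 100_000, 100_000)  # values in [0, 100000)
a4 = a3 + 100_000                       # values in [100000, 200000): no overlap

# Each candidate returns a boolean "do the arrays share any value?"
candidates = {
    "intersect1d": lambda: np.intersect1d(a3, a4).size > 0,
    "isin": lambda: bool(np.isin(a3, a4).any()),
    "set": lambda: bool(set(a3) & set(a4)),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls per repeat, stored as seconds per call.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name}: {timings[name] * 1e3:.2f} ms per call")
```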
