Efficient way to compute intersecting values between two numpy arrays

Problem description

I have a bottleneck in my program which is caused by the following:

import numpy

A = numpy.array([10,4,6,7,1,5,3,4,24,1,1,9,10,10,18])
B = numpy.array([1,4,5,6,7,8,9])

# Keep every element of A (duplicates included) that also occurs in B
C = numpy.array([i for i in A if i in B])

The expected result for C is:

C = [4 6 7 1 5 4 1 1 9]

Is there a more efficient way of doing this operation?

Note that array A contains repeating values and they need to be taken into account. I wasn't able to use set intersection since taking the intersection will omit the repeating values, returning just [1,4,5,6,7,9].
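The difference between an intersection and a membership test can be seen with plain Python sets. The sketch below reuses the same A and B (the names `common`, `b_set` and `kept` are illustrative only): the set intersection collapses the duplicates, while using the set purely as a lookup table keeps every occurrence from A, in A's order.

```python
import numpy as np

A = np.array([10, 4, 6, 7, 1, 5, 3, 4, 24, 1, 1, 9, 10, 10, 18])
B = np.array([1, 4, 5, 6, 7, 8, 9])

# Set intersection: each common value appears once, and A's order is lost
common = sorted(set(A.tolist()) & set(B.tolist()))
print(common)  # [1, 4, 5, 6, 7, 9]

# Membership test: walk A and keep each element whose value is in B
b_set = set(B.tolist())
kept = [x for x in A.tolist() if x in b_set]
print(kept)  # [4, 6, 7, 1, 5, 4, 1, 1, 9]
```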

Also note that this is only a simple demonstration. The actual arrays can range from thousands to well over millions of elements.

Recommended answer

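You can use np.in1d (in NumPy 1.13 and later, np.isin performs the same test, and np.in1d is deprecated as of NumPy 2.0):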

>>> import numpy as np
>>> A[np.in1d(A, B)]
array([4, 6, 7, 1, 5, 4, 1, 1, 9])
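A rough way to confirm the speed-up is to time both approaches on larger random data. This is a sketch under assumptions: the sizes, value range and seed are arbitrary (much smaller than the millions mentioned in the question, so the slow version finishes quickly), it uses np.isin as the membership test, and timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed
A = rng.integers(0, 1000, size=20_000)          # assumed size of A
B = np.unique(rng.integers(0, 1000, size=500))  # assumed size of B

def with_comprehension():
    # Original approach: `i in B` compares i against all of B, once per element of A
    return np.array([i for i in A if i in B])

def with_mask():
    # Vectorized membership test followed by boolean indexing
    return A[np.isin(A, B)]

# Both approaches must agree before their timings are worth comparing
assert np.array_equal(with_comprehension(), with_mask())

for fn in (with_comprehension, with_mask):
    per_call = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {per_call:.4f} s per call")
```

The comprehension is slow because the loop over A runs in Python and each `i in B` builds a temporary comparison array; np.isin vectorizes the whole test instead.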

np.in1d returns a boolean array indicating whether each value of A also appears in B. This array can then be used to index A and return the common values.
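The two steps can be spelled out separately. A minimal sketch with the question's arrays, using np.isin in place of np.in1d (both return the same mask for 1-D input):

```python
import numpy as np

A = np.array([10, 4, 6, 7, 1, 5, 3, 4, 24, 1, 1, 9, 10, 10, 18])
B = np.array([1, 4, 5, 6, 7, 8, 9])

# Step 1: one boolean per element of A, True where that value occurs in B
mask = np.isin(A, B)
print(mask)

# Step 2: boolean indexing keeps the True positions, with duplicates and order intact
C = A[mask]
print(C)  # [4 6 7 1 5 4 1 1 9]

# The same mask also yields the complement and the matching positions
print(A[~mask])              # values of A not found in B
print(np.flatnonzero(mask))  # indices of A whose values are in B
```

If only the complement is needed, np.isin(A, B, invert=True) builds that mask directly.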

It's not relevant to your example, but it's also worth mentioning that if A and B each contain unique values then np.in1d can be sped up by setting assume_unique=True:

np.in1d(A, B, assume_unique=True)
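A small sketch of when the flag is safe to use. The arrays A_u and B_u and the helper is_unique are hypothetical, and the sketch uses np.isin, which accepts the same assume_unique argument. The flag only changes the algorithm, so on duplicate-free input the mask matches the default one; on input with duplicates the result is not guaranteed, so check the assumption if you are unsure (np.unique is one way, though it costs a sort and is better suited to a one-off sanity check than to the hot path).

```python
import numpy as np

# Hypothetical inputs that really are duplicate-free
A_u = np.array([10, 4, 6, 7, 1, 5, 3, 24, 9, 18])
B_u = np.array([1, 4, 5, 6, 7, 8, 9])

def is_unique(x):
    # True if no value in x repeats (np.unique sorts and deduplicates)
    return np.unique(x).size == x.size

assert is_unique(A_u) and is_unique(B_u)

fast = np.isin(A_u, B_u, assume_unique=True)  # relies on the assumption above
safe = np.isin(A_u, B_u)                      # makes no assumption
assert np.array_equal(fast, safe)
print(A_u[fast])  # [4 6 7 1 5 9]
```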

You might also be interested in np.intersect1d, which returns an array of the unique values common to both arrays (sorted by value):

>>> np.intersect1d(A, B)
array([1, 4, 5, 6, 7, 9])
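The two functions answer different questions, which a short sketch with the question's arrays makes visible: the masked result keeps duplicates in A's order, while np.intersect1d returns each common value once, sorted. Its return_indices argument (available since NumPy 1.15) also reports where each common value first occurs in each input.

```python
import numpy as np

A = np.array([10, 4, 6, 7, 1, 5, 3, 4, 24, 1, 1, 9, 10, 10, 18])
B = np.array([1, 4, 5, 6, 7, 8, 9])

common = np.intersect1d(A, B)  # unique common values, sorted
masked = A[np.isin(A, B)]      # every occurrence, in A's order
print(common)                  # [1 4 5 6 7 9]
print(masked)                  # [4 6 7 1 5 4 1 1 9]

# Deduplicating and sorting the masked result recovers intersect1d's answer
assert np.array_equal(np.unique(masked), common)

# return_indices=True also gives the first position of each common value in A and in B
vals, idx_a, idx_b = np.intersect1d(A, B, return_indices=True)
assert np.array_equal(A[idx_a], vals) and np.array_equal(B[idx_b], vals)
print(idx_a, idx_b)
```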
