Efficient way to compute intersecting values between two numpy arrays
Question
I have a bottleneck in my program which is caused by the following:
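import numpy

A = numpy.array([10,4,6,7,1,5,3,4,24,1,1,9,10,10,18])
B = numpy.array([1,4,5,6,7,8,9])
# Keep every element of A that also occurs in B (slow: one membership test per element)
C = numpy.array([i for i in A if i in B])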
The expected result for C is as follows:
C = [4 6 7 1 5 4 1 1 9]
Is there a more efficient way of doing this operation?
Note that array A contains repeating values and they need to be taken into account. I wasn't able to use set intersection, since taking the intersection omits the repeating values and returns just [1,4,5,6,7,9].
Also note this is only a simple demonstration. The actual array sizes can be on the order of thousands, up to well over millions.
Recommended answer
You can use np.in1d:
>>> A[np.in1d(A, B)]
array([4, 6, 7, 1, 5, 4, 1, 1, 9])
np.in1d returns a boolean array indicating whether each value of A also appears in B. This array can then be used to index A and return the common values.
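To make the two steps explicit, here is a minimal sketch that builds the mask first and then indexes with it. It uses np.isin rather than np.in1d, on the assumption that a reasonably recent NumPy is installed; the NumPy documentation recommends np.isin for new code, and np.in1d is deprecated as of NumPy 2.0. The variable name mask is only illustrative.

```python
import numpy as np

A = np.array([10, 4, 6, 7, 1, 5, 3, 4, 24, 1, 1, 9, 10, 10, 18])
B = np.array([1, 4, 5, 6, 7, 8, 9])

# Step 1: boolean mask, True where the element of A also occurs in B.
mask = np.isin(A, B)

# Step 2: boolean indexing keeps the original order and the duplicates of A.
C = A[mask]

print(mask)
print(C)  # [4 6 7 1 5 4 1 1 9]
```

Because the mask has the same length as A, the result preserves both the order and the repeated values of A, which is what the set intersection loses.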
It's not relevant to your example, but it's also worth mentioning that if A and B each contain unique values, then np.in1d can be sped up by setting assume_unique=True:
np.in1d(A, B, assume_unique=True)
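As a rough illustration of when the flag applies, the sketch below uses two arrays that really do contain only unique values. The arrays A_unique and B_unique are made up for this example, and np.isin is used on the assumption that it accepts the same assume_unique flag as np.in1d in the installed NumPy version.

```python
import numpy as np

# Hypothetical inputs: neither array contains a repeated value.
A_unique = np.array([10, 4, 6, 7, 1, 5, 3])
B_unique = np.array([1, 4, 5, 6, 7, 8, 9])

# assume_unique=True lets NumPy skip its internal de-duplication step.
# Only pass it when both inputs are known to be duplicate-free.
mask = np.isin(A_unique, B_unique, assume_unique=True)

common = A_unique[mask]
print(common)  # [4 6 7 1 5]
```

With the A from the question the flag should not be set, since that array contains repeated values.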
You might also be interested in np.intersect1d, which returns an array of the unique values common to both arrays (sorted by value):
>>> np.intersect1d(A, B)
array([1, 4, 5, 6, 7, 9])
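To see how the two approaches relate, the following sketch (again assuming np.isin is available) checks that de-duplicating and sorting the masked result gives the same values as np.intersect1d. The names with_duplicates and unique_common are only illustrative.

```python
import numpy as np

A = np.array([10, 4, 6, 7, 1, 5, 3, 4, 24, 1, 1, 9, 10, 10, 18])
B = np.array([1, 4, 5, 6, 7, 8, 9])

# Mask-based selection: keeps order and duplicates from A.
with_duplicates = A[np.isin(A, B)]

# intersect1d: sorted, unique values common to both arrays.
unique_common = np.intersect1d(A, B)

# np.unique sorts and de-duplicates, so the two results should agree.
print(np.array_equal(np.unique(with_duplicates), unique_common))  # True
```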