Remove elements from one array if present in another array, keep duplicates - NumPy / Python
Problem description
I have two arrays, A (length 3.8 million) and B (length 20k). For a minimal example, let's take this case:
import numpy as np
A = np.array([1,1,2,3,3,3,4,5,6,7,8,8])
B = np.array([1,2,8])
Now I want the resulting array to be:
C = np.array([3,3,3,4,5,6,7])
That is, if any value in B is found in A, remove it from A; otherwise keep it.
I would like to know if there is a way to do this without a for loop, because the array is long and looping over it takes a long time.
Recommended answer
Using searchsorted
With a sorted B, we can use searchsorted -
A[B[np.searchsorted(B,A)] != A]
According to the docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements in v were inserted before those indices, the order of a would be preserved. So, if we compute idx = searchsorted(B,A) and index into B with it, B[idx], we get, for every element in A, the element of B at its insertion position. Comparing this mapped version against A tells us, for every element in A, whether there is a match in B. Finally, we index into A with the resulting mask to select the non-matching elements.
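The steps described above can be traced on the minimal example from the question. This is only an illustrative sketch: the intermediate names mapped and keep are introduced here for readability and are not part of the original answer, and it assumes B is sorted and that no value in A exceeds the largest value in B.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])  # assumed to be sorted already

# For each element of A, the position in B where it would be inserted
# (leftmost position, which is the default side='left').
idx = np.searchsorted(B, A)

# The element of B at that position; it equals the A element only on a match.
mapped = B[idx]

# True where the element of A has no match in B.
keep = mapped != A

C = A[keep]
print(idx)     # [0 0 1 2 2 2 2 2 2 2 2 2]
print(mapped)  # [1 1 2 8 8 8 8 8 8 8 8 8]
print(C)       # [3 3 3 4 5 6 7]
```

Duplicates in A survive because the mask is computed per element, not per unique value.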
Generic case (B is not sorted):
If B is not already sorted, which is a prerequisite, sort it first and then use the proposed method.
Alternatively, we can use the sorter argument with searchsorted -
sidx = B.argsort()
out = A[B[sidx[np.searchsorted(B,A,sorter=sidx)]] != A]
More generic case (A has values higher than the ones in B):
sidx = B.argsort()
idx = np.searchsorted(B,A,sorter=sidx)
idx[idx==len(B)] = 0
out = A[B[sidx[idx]] != A]
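As a rough illustration of why the index reset matters, the snippet can be wrapped in a small function and run on an unsorted B together with an A that contains values above max(B). The function name remove_present is hypothetical, and the sketch assumes B is non-empty. For a value larger than everything in B, searchsorted returns len(B), which would be out of bounds; pointing it at position 0 is safe because such a value cannot equal the smallest element of B.

```python
import numpy as np

def remove_present(A, B):
    """Return the elements of A that do not occur in B (hypothetical helper).

    Assumes B is non-empty; B does not need to be sorted.
    """
    sidx = B.argsort()                        # order that would sort B
    idx = np.searchsorted(B, A, sorter=sidx)  # insertion positions in sorted B
    idx[idx == len(B)] = 0                    # values above max(B): avoid out-of-bounds
    return A[B[sidx[idx]] != A]               # keep elements without a match

A = np.array([1, 9, 3, 8, 9])
B = np.array([8, 1, 2])  # unsorted on purpose
print(remove_present(A, B))  # [9 3 9]
```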
Using in1d/isin
We can also use np.in1d, which is pretty straightforward (the docs should help clarify): it looks for a match in B for every element in A, and we can then use boolean indexing with an inverted mask to select the non-matching ones -
A[~np.in1d(A,B)]
The same with isin -
A[~np.isin(A,B)]
With the invert flag -
A[np.in1d(A,B,invert=True)]
A[np.isin(A,B,invert=True)]
This solves the generic case, where B is not necessarily sorted.
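To check that the approaches agree, one possible sanity test compares the isin variants with the searchsorted variant on random data. The array sizes, value range, and seed below are arbitrary choices for illustration, not figures from the question. The sketch uses only np.isin because, as far as I know, NumPy 2.0 deprecates np.in1d in favour of it.

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed for reproducibility
A = rng.integers(0, 1000, size=100_000)    # stand-in for the long array
B = rng.integers(0, 1000, size=200)        # unsorted, may contain duplicates

# isin-based versions: inverted mask, or the invert flag.
out_isin = A[~np.isin(A, B)]
out_invert = A[np.isin(A, B, invert=True)]

# searchsorted-based version for unsorted B with out-of-range handling.
sidx = B.argsort()
idx = np.searchsorted(B, A, sorter=sidx)
idx[idx == len(B)] = 0
out_sorted = A[B[sidx[idx]] != A]

print(np.array_equal(out_isin, out_invert))  # True
print(np.array_equal(out_isin, out_sorted))  # True
```

Which variant is faster depends on the array sizes and the NumPy version, so timing both on the real data is the safer way to choose.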