Remove elements from one array if present in another array, keep duplicates - NumPy / Python


Problem description

I have two arrays, A (length 3.8 million) and B (length 20k). For a minimal example, let's take this case:

A = np.array([1,1,2,3,3,3,4,5,6,7,8,8])
B = np.array([1,2,8])

Now I want the resulting array to be:

C = np.array([3,3,3,4,5,6,7])

That is, if a value in A also appears in B, remove every occurrence of it from A; otherwise keep it, duplicates included.

I would like to know whether there is a way to do this without a for loop, because A is a long array and looping over it takes a long time.

Recommended answer

Using searchsorted

With sorted B, we can use searchsorted -

A[B[np.searchsorted(B,A)] != A]

According to the NumPy docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements in v were inserted before those indices, the order of a would be preserved. So, if idx = searchsorted(B,A) and we index into B with it, B[idx] gives, for every element in A, the smallest element of B that is greater than or equal to it. Comparing this mapped version against A therefore tells us, for every element in A, whether there is a match in B. Finally, we index into A with that mask to select the non-matching elements. Note that this one-liner assumes no value in A exceeds the largest value in B; otherwise idx can equal len(B) and B[idx] goes out of bounds.
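To make the mapping step concrete, here is a small sketch that breaks the one-liner into its intermediate arrays, using the example arrays from the question. The variable names mapped and mask are introduced here only for illustration.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])  # already sorted, as this variant requires

idx = np.searchsorted(B, A)  # insertion position of each A value in B
mapped = B[idx]              # smallest B value >= the corresponding A value
mask = mapped != A           # True where the A value has no match in B
C = A[mask]

print(idx.tolist())     # [0, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(mapped.tolist())  # [1, 1, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8]
print(C.tolist())       # [3, 3, 3, 4, 5, 6, 7]
```

Every value of A from 3 to 7 maps to 8, the next larger element of B, so the comparison flags it as unmatched and it is kept.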

Generic case (B is not sorted) :

If B is not already sorted, as the method requires, sort it first and then use the approach above.

Alternatively, we can pass the sorter argument to searchsorted -

sidx = B.argsort()
out = A[B[sidx[np.searchsorted(B,A,sorter=sidx)]] != A]

More generic case (A has values higher than the largest one in B) :

sidx = B.argsort()
idx = np.searchsorted(B,A,sorter=sidx)
idx[idx==len(B)] = 0
out = A[B[sidx[idx]] != A]
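The reason for resetting idx to 0 may not be obvious. For values of A above max(B), searchsorted returns len(B), which is not a valid index into B. Resetting those positions to 0 is safe because such a value is larger than every element of B, so it cannot equal the smallest element of B and is still kept. The sketch below wraps the steps in a helper; the name remove_present is hypothetical, and the helper assumes B is non-empty.

```python
import numpy as np

def remove_present(A, B):
    """Return the elements of A whose values do not occur in B.

    Hypothetical helper for illustration; assumes B is non-empty.
    """
    sidx = B.argsort()                        # order that would sort B
    idx = np.searchsorted(B, A, sorter=sidx)  # positions in the sorted view of B
    idx[idx == len(B)] = 0                    # values above max(B): reset to a valid index
    return A[B[sidx[idx]] != A]

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 9, 10])
B = np.array([8, 1, 2])  # unsorted, and A has values above max(B)

out = remove_present(A, B)
print(out.tolist())  # [3, 3, 3, 4, 5, 6, 7, 9, 10]
```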


Using in1d/isin

We can also use np.in1d, which is pretty straightforward (the NumPy docs should help clarify): it looks for a match in B for every element in A, and we can then use boolean indexing with the inverted mask to select the non-matching elements -

A[~np.in1d(A,B)]

The same with isin -

A[~np.isin(A,B)]

With the invert flag -

A[np.in1d(A,B,invert=True)]

A[np.isin(A,B,invert=True)]

This handles the generic case, where B is not necessarily sorted.
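As a quick check, the following sketch runs the isin variant on the example from the question, with B deliberately left unsorted. It uses np.isin rather than np.in1d because the NumPy docs recommend isin for new code; the variable name mask is introduced here only for illustration.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([8, 1, 2])  # the order of B does not matter here

mask = np.isin(A, B)  # True where the A value occurs in B
C = A[~mask]          # keep only the elements without a match

print(mask.astype(int).tolist())  # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(C.tolist())                 # [3, 3, 3, 4, 5, 6, 7]

# The invert flag yields the same selection without negating the mask.
same = np.array_equal(C, A[np.isin(A, B, invert=True)])
print(same)  # True
```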
