从一个数组中删除另一个数组中的元素 [英] Removing elements from an array that are in another array

查看:68
本文介绍了从一个数组中删除另一个数组中的元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有这些二维数组 A 和 B.

Say I have these 2D arrays A and B.

如何从 A 中删除 B 中的元素.(集合论中的补:A-B)

How can I remove elements from A that are in B. (Complement in set theory: A-B)

A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]])
B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]])
#output = [[1,1,2], [1,1,3]]

<小时>

更准确地说,我想做这样的事情.


To be more precise, I would like to do something like this.

data = some numpy array
label = some numpy array
A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]]
B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]]
out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]]

推荐答案

基于这个解决方案查找行索引numpy 数组中的几个值,这是一个基于 NumPy 的解决方案,内存占用更少,在处理大型数组时可能会很有用 -

Based on this solution to Find the row indexes of several values in a numpy array, here's a NumPy based solution with less memory footprint and could be beneficial when working with large arrays -

dims = np.maximum(B.max(0),A.max(0))+1
out = A[~np.in1d(np.ravel_multi_index(A.T,dims),np.ravel_multi_index(B.T,dims))]

样品运行 -

In [38]: A
Out[38]: 
array([[1, 1, 1],
       [1, 1, 2],
       [1, 1, 3],
       [1, 1, 4]])

In [39]: B
Out[39]: 
array([[0, 0, 0],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 4]])

In [40]: out
Out[40]: 
array([[1, 1, 2],
       [1, 1, 3]])

大型阵列的运行时测试 -

Runtime test on large arrays -

In [107]: def in1d_approach(A,B):
     ...:     dims = np.maximum(B.max(0),A.max(0))+1
     ...:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
     ...:                     np.ravel_multi_index(B.T,dims))]
     ...: 

In [108]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(1000,3))
     ...: A = np.random.randint(0,9,(100,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

使用基于 broadcasting 的解决方案的时间 -

Timings with broadcasting based solutions -

In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100 loops, best of 3: 4.64 ms per loop # @Kasramvd's soln

In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100 loops, best of 3: 3.66 ms per loop

基于内存占用更少的解决方案计时 -

Timing with less memory footprint based solution -

In [111]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 231 µs per loop

进一步提升性能

in1d_approach 通过将每一行视为一个索引元组来减少每一行.我们可以通过使用 np.dot 引入矩阵乘法来更有效地做同样的事情,就像这样 -

in1d_approach reduces each row by considering each row as an indexing tuple. We can do the same a bit more efficiently by introducing matrix-multiplication with np.dot, like so -

def in1d_dot_approach(A,B):
    cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
    return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]

让我们在更大的数组上测试它 -

Let's test it against the previous on much larger arrays -

In [251]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(10000,3))
     ...: A = np.random.randint(0,9,(1000,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

In [252]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 1.28 ms per loop

In [253]: %timeit in1d_dot_approach(A, B)
1000 loops, best of 3: 1.2 ms per loop

这篇关于从一个数组中删除另一个数组中的元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆