Check if two 3D numpy arrays contain overlapping 2D arrays


Problem description

I have two very large numpy arrays, which are both 3D. I need to find an efficient way to check if they are overlapping, because turning them both into sets first takes too long. I tried to use another solution I found here for this same problem but for 2D arrays, but I didn't manage to make it work for 3D. Here is the solution for 2D:

nrows, ncols = A.shape
# View each row as one structured element so that rows are compared as a whole
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

(where A and B are the 2D arrays). I only need the number of overlapping 2D arrays, not which specific ones overlap.

Solution

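We could leverage views using a helper function that I have used across a few Q&As. Each 2D subarray is flattened and viewed as a single element, so the 3D problem reduces to a 1D membership test. To get the presence of subarrays, we could use np.isin on the views, or a more laborious route with np.searchsorted.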

Approach #1 : Using np.isin -

# https://stackoverflow.com/a/45313353/ @Divakar
import numpy as np

def view1D(a, b): # a, b are 2D arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # One void element spans the raw bytes of a whole row
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def isin_nd(a,b):
    # a,b are the 3D input arrays to give us "isin-like" functionality across them
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    return np.isin(A,B)
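
To make the view trick more concrete, here is a minimal sketch of what happens to the subarrays before the membership test. The array shape and the int64 dtype are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np

# Toy input (assumed for illustration): 3 subarrays of shape (2, 4), int64.
a = np.arange(24, dtype=np.int64).reshape(3, 2, 4)

# Flatten each 2D subarray into one row, as isin_nd does with reshape.
flat = np.ascontiguousarray(a.reshape(a.shape[0], -1))   # shape (3, 8)

# A void dtype wide enough to hold all bytes of one row.
void_dt = np.dtype((np.void, flat.dtype.itemsize * flat.shape[1]))

# Viewing with that dtype collapses each row into a single opaque element.
v = flat.view(void_dt).ravel()

print(v.shape)            # one element per original subarray
print(v.dtype.itemsize)   # bytes per element: 8 values * 8 bytes each
```

No data is copied by the view itself; the same bytes are just reinterpreted, which is why the inputs have to be contiguous and share a dtype and subarray shape.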

Approach #2 : We could also leverage np.searchsorted upon the views -

def isin_nd_searchsorted(a,b):
    # a,b are the 3D input arrays
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    sidx = A.argsort()
    sorted_index = np.searchsorted(A,B,sorter=sidx)
    sorted_index[sorted_index==len(A)] = len(A)-1
    idx = sidx[sorted_index]
    return A[idx] == B
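
The index juggling above may be easier to follow on plain integers. The sketch below repeats the same steps on small hand-picked arrays (the values are assumptions for illustration); the void views behave the same way, only with whole subarrays as elements.

```python
import numpy as np

A = np.array([30, 10, 20])       # unsorted "haystack"
B = np.array([20, 5, 40, 10])    # values to look up

sidx = A.argsort()                          # order that sorts A: [1, 2, 0]
pos = np.searchsorted(A, B, sorter=sidx)    # insertion points in sorted A: [1, 0, 3, 0]

# A value larger than everything in A gets position len(A), which would be
# out of bounds, so it is clipped to the last valid position.
pos[pos == len(A)] = len(A) - 1             # [1, 0, 2, 0]

idx = sidx[pos]                             # map back to positions in the unsorted A
mask = A[idx] == B                          # True where the candidate really matches

print(mask)   # [ True False False  True]
```

Note that the mask has one entry per element of B, because np.searchsorted returns one insertion point per looked-up value.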

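So, both solutions give us a boolean presence mask. isin_nd(a,b) has one entry per subarray of a and marks those that occur in b, while isin_nd_searchsorted(a,b) has one entry per subarray of b and marks those that occur in a. As long as neither input contains repeated subarrays, the two sums agree. Hence, to get our desired count, it would be - isin_nd(a,b).sum() or isin_nd_searchsorted(a,b).sum().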
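
For cross-checking the count on small inputs, a plain-Python reference can help. The sketch below is not from the original answer: count_common_subarrays is a hypothetical helper that hashes the raw bytes of each subarray. It will likely be slower than the view-based functions on large inputs, and, like them, it compares bytes rather than values.

```python
import numpy as np

def count_common_subarrays(a, b):
    """Count the slices a[i] that also occur somewhere in b (hypothetical helper)."""
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    if a.dtype != b.dtype or a.shape[1:] != b.shape[1:]:
        raise ValueError("a and b need the same dtype and subarray shape")
    # The raw bytes of each 2D slice serve as a hashable key.
    seen = {sub.tobytes() for sub in b}
    return sum(sub.tobytes() in seen for sub in a)

# Deterministic toy data: b shares nothing with a except three planted copies.
a = np.arange(10 * 4 * 5).reshape(10, 4, 5)
b = a[:7] + 1000
b[1] = a[4]
b[3] = a[2]
b[6] = a[0]

count = count_common_subarrays(a, b)
print(count)   # 3
```

Comparing this count with isin_nd(a,b).sum() on a few small cases is a cheap sanity check before running the faster versions on the full arrays.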

Sample run -

In [71]: # Setup with 3 common "subarrays"
    ...: np.random.seed(0)
    ...: a = np.random.randint(0,9,(10,4,5))
    ...: b = np.random.randint(0,9,(7,4,5))
    ...: 
    ...: b[1] = a[4]
    ...: b[3] = a[2]
    ...: b[6] = a[0]

In [72]: isin_nd(a,b).sum()
Out[72]: 3

In [73]: isin_nd_searchsorted(a,b).sum()
Out[73]: 3

Timings on large arrays -

In [74]: # Setup
    ...: np.random.seed(0)
    ...: a = np.random.randint(0,9,(100,100,100))
    ...: b = np.random.randint(0,9,(100,100,100))
    ...: idxa = np.random.choice(range(len(a)), len(a)//2, replace=False)
    ...: idxb = np.random.choice(range(len(b)), len(b)//2, replace=False)
    ...: a[idxa] = b[idxb]

# Verify output
In [82]: np.allclose(isin_nd(a,b),isin_nd_searchsorted(a,b))
Out[82]: True

In [75]: %timeit isin_nd(a,b).sum()
10 loops, best of 3: 31.2 ms per loop

In [76]: %timeit isin_nd_searchsorted(a,b).sum()
100 loops, best of 3: 1.98 ms per loop
