Match rows of two 2D arrays and get a row indices map using numpy

Question

Suppose you have two 2D arrays A and B, and you want to find where each row of A occurs in B. How do you do this most efficiently using numpy?

For example:

import numpy as np

a = np.array([[1,2,3],
              [4,5,6],
              [9,10,11]])

b = np.array([[4,5,6],
              [4,3,2],
              [1,2,3],
              [4,8,9]])
# desired result, as pairs [row index in a, row index in b]:
# row 0 of a is at row index 2 of b, row 1 of a is at row index 0 of b
map = [[0,2], [1,0]]

I know how to check if a row of A is in B using in1d (test for membership in a 2d numpy array), but this does not yield the indices map.

The purpose of this map is to (eventually) merge the two arrays together based on some columns.
Of course one could do this row by row, but that is very inefficient, since my arrays have shape (50 million, 20).

An alternative would be to use the pandas merge function, but I'd like to do this using numpy only.

Recommended answer

Approach #1

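Here's one based on views. It uses np.argwhere (docs) to return the indices of the elements that meet a condition, in this case row equality. Note that the comparison A[:,None] == B builds a boolean array of shape (len(A), len(B)), so this is only practical for small inputs -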

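def view1D(a, b): # a, b are 2D arrays with the same dtype and number of columns
    # View each row as a single void element, so whole rows compare bytewise
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def argwhere_nd(a,b):
    A,B = view1D(a,b)
    # Each output row is [row index in a, row index in b]
    return np.argwhere(A[:,None] == B)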
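
As a cross-check for small inputs, the same pairwise idea can be sketched without the void view by comparing the 2D arrays directly with broadcasting. This is an illustrative variant rather than the answer's code; the names eq and pairs are chosen here for the example, and the intermediate array has shape (len(a), len(b), ncols), so it is unsuitable for large arrays:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [9, 10, 11]])
b = np.array([[4, 5, 6],
              [4, 3, 2],
              [1, 2, 3],
              [4, 8, 9]])

# Compare every row of a with every row of b, column by column,
# then keep only the (i, j) pairs where all columns agree.
eq = (a[:, None, :] == b[None, :, :]).all(axis=-1)   # shape (len(a), len(b))

# Each row of pairs is [row index in a, row index in b].
pairs = np.argwhere(eq)
print(pairs.tolist())   # [[0, 2], [1, 0]]
```

This reproduces the map from the question and can serve as a reference when testing the faster approaches on small samples.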

Approach #2

Here's another that avoids the pairwise comparison. Its cost is dominated by sorting, so it is roughly O(n log n) rather than quadratic, and hence performs much better, especially on large arrays -

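def argwhere_nd_searchsorted(a,b):
    A,B = view1D(a,b)
    sidxB = B.argsort()              # order that sorts the rows of b
    mask = np.isin(A,B)              # which rows of a occur in b
    cm = A[mask]
    idx0 = np.flatnonzero(mask)
    # Locate each matching row in sorted b, then map back to b's original order
    idx1 = sidxB[np.searchsorted(B,cm, sorter=sidxB)]
    return idx0, idx1 # idx0 : indices in A, idx1 : indices in B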
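
The interplay of argsort, np.isin and np.searchsorted with sorter= may be easier to follow on plain 1D integer keys. The sketch below mirrors the steps of the function above on made-up keys (keys_b and queries are hypothetical stand-ins for the row views B and A):

```python
import numpy as np

keys_b = np.array([40, 10, 30, 20])   # stand-in for the rows of b
queries = np.array([30, 10, 99])      # stand-in for the rows of a

order = keys_b.argsort()              # indices that would sort keys_b
mask = np.isin(queries, keys_b)       # which queries occur in keys_b

# searchsorted works on the sorted view of keys_b (via sorter=order), so its
# result is a position in sorted order; order[...] maps it back to an index
# into the original keys_b.
pos = np.searchsorted(keys_b, queries[mask], sorter=order)
idx_q = np.flatnonzero(mask)          # indices into queries
idx_b = order[pos]                    # matching indices into keys_b

print(idx_q.tolist(), idx_b.tolist())  # [0, 1] [2, 1]
```

The np.isin filter matters: searchsorted returns an insertion position even for values that are absent, so unfiltered queries such as 99 would produce an index that is out of range or does not correspond to a match.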

Approach #3

Another one using argsort(), again with a cost dominated by the sort (roughly O(n log n)) -

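def argwhere_nd_argsort(a,b):
    # Note: assumes no duplicate rows within a and none within b
    A,B = view1D(a,b)
    c = np.r_[A,B]
    # Stable sort, so among equal rows the one from a comes before the one from b
    idx = np.argsort(c,kind='mergesort')
    cs = c[idx]
    m0 = cs[:-1] == cs[1:]           # adjacent equal rows are a match
    return idx[:-1][m0],idx[1:][m0]-len(A)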
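
To see why the stable sort matters, here is a sketch of the same concatenate-and-sort idea on plain 1D integer keys. The keys are made up for illustration, and each array is assumed to contain no duplicates:

```python
import numpy as np

keys_a = np.array([30, 10, 99])
keys_b = np.array([40, 10, 30, 20])

c = np.concatenate([keys_a, keys_b])

# A stable sort keeps equal keys in their original order, so for a key present
# in both arrays the entry from keys_a lands directly before the one from keys_b.
idx = np.argsort(c, kind="stable")
cs = c[idx]
m0 = cs[:-1] == cs[1:]                # True where a key is followed by its twin

idx_a = idx[:-1][m0]                  # position in keys_a
idx_b = idx[1:][m0] - len(keys_a)     # position in keys_b (undo the offset)

print(idx_a.tolist(), idx_b.tolist())  # [1, 0] [1, 2]
```

The pairs come out ordered by key value rather than by index in keys_a, so an extra argsort on idx_a is needed if the order of the first array should be preserved.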

Sample runs with the same inputs as earlier -

In [650]: argwhere_nd_searchsorted(a,b)
Out[650]: (array([0, 1]), array([2, 0]))

In [651]: argwhere_nd_argsort(a,b)
Out[651]: (array([0, 1]), array([2, 0]))
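
Since the stated goal is to merge the two arrays, here is a minimal sketch of how such index pairs might be used afterwards. The pairs are copied from the sample run above, and b_extra is a hypothetical payload column invented for the example (it is not part of the question):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [9, 10, 11]])
b = np.array([[4, 5, 6],
              [4, 3, 2],
              [1, 2, 3],
              [4, 8, 9]])

# Index pairs as returned in the sample run: a[idx0[k]] equals b[idx1[k]].
idx0 = np.array([0, 1])
idx1 = np.array([2, 0])

# Hypothetical extra column stored alongside b, one value per row of b.
b_extra = np.array([[60], [20], [30], [90]])

# Inner-join style merge: matched rows of a, followed by b's extra column.
merged = np.hstack([a[idx0], b_extra[idx1]])
print(merged.tolist())   # [[1, 2, 3, 30], [4, 5, 6, 60]]
```

Rows of a without a match (here row 2) are dropped, which corresponds to an inner join; keeping them would require filling the missing columns with a placeholder value.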
