Numpy to get the exact arguments of duplicated elements in a 2D array

Problem description

I have two 2D arrays a and b. I want to find the exact indices of the rows of a in b. I followed the solution proposed here.

The problem is that my arrays contain duplicates as you can see here:

import numpy as np

# The shape of b is (50, 2)
b = np.array([[ 0,  1],[ 2,  3],[ 4,  5],[ 6,  7], [ 0,  1],
             [10, 11], [12, 13], [14, 15], [16, 17], [10, 11],
             [20, 21], [22, 23], [24, 25], [26, 27], [20, 21],
             [30, 31], [32, 33], [34, 35], [36, 37], [30, 31],
             [40, 41], [42, 43], [44, 45], [46, 47], [40, 41],
             [50, 51], [52, 53], [54, 55], [56, 57], [50, 51],
             [60, 61], [62, 63], [64, 65], [66, 67], [60, 61],
             [70, 71], [72, 73], [74, 75], [76, 77], [70, 71],
             [80, 81], [82, 83], [84, 85], [86, 87], [80, 81],
             [90, 91], [92, 93], [94, 95], [96, 97], [90, 91]])

# The shape of a is (20,2)
a = np.array([[ 0,  1],[ 2,  3], [ 4,  5],[ 6,  7],[ 0,  1],
       [50, 51],[52, 53], [54, 55], [56, 57], [50, 51],
       [20, 21], [22, 23], [24, 25], [26, 27], [20, 21],
       [70, 71], [72, 73], [74, 75], [76, 77], [70, 71]])

Now when I try something like this:

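# Approach 2 from the linked answer
def view1D(a, b):  # a, b: 2D arrays with the same dtype and number of columns
    # View each row as a single opaque np.void scalar so that whole rows
    # can be sorted, searched and compared as 1D elements
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def argwhere_nd_searchsorted(a, b):
    A, B = view1D(a, b)
    sidxB = B.argsort()          # order that sorts the rows of b
    mask = np.isin(A, B)         # rows of a that occur somewhere in b
    cm = A[mask]
    idx0 = np.flatnonzero(mask)
    # searchsorted returns one position per query, so every copy of a
    # duplicated row of a is mapped to the same row of b
    idx1 = sidxB[np.searchsorted(B, cm, sorter=sidxB)]
    return idx0, idx1  # idx0: indices in A, idx1: indices in B

args0, args1 = argwhere_nd_searchsorted(a, b)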

Result:

#args0
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

#args1
array([ 0,
        1,
        2,
        3,
        0,  # this should be 4
       25,
       26,
       27,
       28,
       25,  # this should be 29
       10,
       11,
       12,
       13,
       10,  # this should be 14
       39,  # this should be 35
       36,
       37,
       38,
       39])

# If we check:
np.equal(b[args1], a).all()  # This returns True

As you can see, the problem is that the highlighted indices in args1 are repeated. My expected result is shown in the commented lines.
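To see why the duplicates collapse, here is a minimal 1D sketch of the argsort + searchsorted lookup used in argwhere_nd_searchsorted, on toy values that are not from the question. np.searchsorted returns a single insertion point (the leftmost, by default) for each query, so every query with the same value lands on the same position of B.

```python
import numpy as np

B = np.array([7, 3, 9, 3])        # the value 3 appears at positions 1 and 3
A = np.array([3, 9, 3])           # two queries for the same value 3

sidx = B.argsort(kind="stable")   # order that sorts B: [1, 3, 0, 2]
pos = np.searchsorted(B, A, sorter=sidx)  # leftmost slot in sorted order
matched = sidx[pos]               # map back to positions in B

print(matched)                    # [1 2 1]
```

Both queries for 3 are mapped to index 1, and index 3 (the second 3 in B) is never returned. Note also that B.argsort() in the question uses the default, non-stable sort, so which copy of a duplicated row comes first in sorted order is not guaranteed; that is likely why the [70, 71] row was mapped to 39 rather than 35.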

Any help is appreciated.

Recommended answer

We could add one more column of IDs that tells duplicate rows apart (the k-th occurrence of a row gets ID k) and then use the same steps. We will use pandas to get those IDs, as it's just easier that way. Hence, simply do:

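import numpy as np
import pandas as pd

def assign_duplbl(a):
    # Number the occurrences of each distinct row: the first copy of a row
    # gets 1, the second copy gets 2, and so on
    df = pd.DataFrame(a)
    df['num'] = 1
    return df.groupby(list(range(a.shape[1]))).cumsum().values

# Append the occurrence number as an extra column, so duplicated rows become
# distinct and the k-th copy in a can only match the k-th copy in b
a1 = np.hstack((a, assign_duplbl(a)))
b1 = np.hstack((b, assign_duplbl(b)))
args0, args1 = argwhere_nd_searchsorted(a1, b1)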
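If pandas is not available, the same idea (append an occurrence number, then look the rows up) can be sketched with NumPy alone. This is a substitute technique, not the answer's code: it swaps the void view for np.unique(..., axis=0, return_inverse=True) to turn rows into integer IDs, and the helper names occurrence_ids, match_rows_with_duplicates and make_blocks are hypothetical. Occurrence numbers start at 0 here rather than 1 as in assign_duplbl, which does not matter as long as a and b are numbered the same way.

```python
import numpy as np

def occurrence_ids(keys):
    """0 for the first copy of each key, 1 for the second copy, and so on."""
    n = len(keys)
    order = np.argsort(keys, kind="stable")      # stable: earlier copies first
    ks = keys[order]
    is_start = np.r_[True, ks[1:] != ks[:-1]]    # first element of each run
    run_start = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    occ = np.empty(n, dtype=np.intp)
    occ[order] = np.arange(n) - run_start        # position within its run
    return occ

def match_rows_with_duplicates(a, b):
    # Give every distinct row (across a and b) an integer ID
    _, inv = np.unique(np.vstack((a, b)), axis=0, return_inverse=True)
    inv = inv.reshape(-1)                        # 1D regardless of NumPy version
    ka, kb = inv[:len(a)], inv[len(a):]
    oa, ob = occurrence_ids(ka), occurrence_ids(kb)
    # Combine (row ID, occurrence number) into one integer key, so the
    # k-th copy of a row in a can only match the k-th copy in b
    m = max(oa.max(), ob.max()) + 1
    key_a, key_b = ka * m + oa, kb * m + ob
    sidx = np.argsort(key_b)
    pos = np.searchsorted(key_b, key_a, sorter=sidx)
    pos = np.minimum(pos, len(key_b) - 1)        # keep indices in range
    hit = key_b[sidx[pos]] == key_a              # rows of a that exist in b
    return np.flatnonzero(hit), sidx[pos[hit]]

def make_blocks(starts):
    # Rebuilds the question's data: 5 rows per block, the 5th repeats the 1st
    return np.vstack([[[s, s + 1], [s + 2, s + 3], [s + 4, s + 5],
                       [s + 6, s + 7], [s, s + 1]] for s in starts])

b = make_blocks(range(0, 100, 10))   # shape (50, 2)
a = make_blocks([0, 50, 20, 70])     # shape (20, 2)
args0, args1 = match_rows_with_duplicates(a, b)
print(args1)
# [ 0  1  2  3  4 25 26 27 28 29 10 11 12 13 14 35 36 37 38 39]
```

On the question's data this reproduces the expected indices from the commented lines, including 4, 29, 14 and 35 for the repeated rows.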
