NumPy: getting the exact indices of duplicated elements in a 2D array
Question
I have two 2D arrays, a and b. I want to find the exact indices of the rows of a in b. I followed the solution proposed here.
The problem is that my arrays contain duplicate rows, as you can see here:
import numpy as np

# The shape of b is (50, 2)
b = np.array([[ 0, 1],[ 2, 3],[ 4, 5],[ 6, 7], [ 0, 1],
[10, 11], [12, 13], [14, 15], [16, 17], [10, 11],
[20, 21], [22, 23], [24, 25], [26, 27], [20, 21],
[30, 31], [32, 33], [34, 35], [36, 37], [30, 31],
[40, 41], [42, 43], [44, 45], [46, 47], [40, 41],
[50, 51], [52, 53], [54, 55], [56, 57], [50, 51],
[60, 61], [62, 63], [64, 65], [66, 67], [60, 61],
[70, 71], [72, 73], [74, 75], [76, 77], [70, 71],
[80, 81], [82, 83], [84, 85], [86, 87], [80, 81],
[90, 91], [92, 93], [94, 95], [96, 97], [90, 91]])
# The shape of a is (20,2)
a = np.array([[ 0, 1],[ 2, 3], [ 4, 5],[ 6, 7],[ 0, 1],
[50, 51],[52, 53], [54, 55], [56, 57], [50, 51],
[20, 21], [22, 23], [24, 25], [26, 27], [20, 21],
[70, 71], [72, 73], [74, 75], [76, 77], [70, 71]])
Now when I try something like this:
# See approach 2 in the link above
def view1D(a, b):  # a, b are 2D arrays with the same dtype and column count
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # View each row as a single opaque (void) element
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def argwhere_nd_searchsorted(a, b):
    A, B = view1D(a, b)
    sidxB = B.argsort()
    mask = np.isin(A, B)
    cm = A[mask]
    idx0 = np.flatnonzero(mask)
    idx1 = sidxB[np.searchsorted(B, cm, sorter=sidxB)]
    return idx0, idx1  # idx0: indices in A, idx1: indices in B

args0, args1 = argwhere_nd_searchsorted(a, b)
Result:
#args0
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19])
#args1
array([ 0,
1,
2,
3,
0, # this should be 4
25,
26,
27,
28,
25, # this should be 29
10,
11,
12,
13,
10,# this should be 14
39,# this should be 35
36,
37,
38,
39])
# if we check
np.equal(b[args1],a).all() # This returns True
As you can see, the indices marked in args1 are repeated: each duplicated row of a is mapped to the same row of b rather than to its own occurrence. My expected result is shown in the comments.
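The collapse can be reproduced on a tiny 1D case. The following is only an illustrative sketch with made-up values (keys, queries and the other names are not from the code above): np.searchsorted with the default side='left' returns the leftmost position of a value in the sorted order, so every copy of a duplicated value resolves to the same index.

```python
import numpy as np

# Hypothetical 1D stand-ins for the rows of b and a
keys = np.array([10, 20, 30, 10])    # 10 appears at positions 0 and 3
queries = np.array([10, 20, 10])     # both 10s should ideally map to 0 and 3

order = np.argsort(keys, kind="stable")            # sorted order of keys
pos = np.searchsorted(keys, queries, sorter=order)  # leftmost match in sorted order
found = order[pos]                                  # back to positions in keys

print(found)  # both queries for 10 land on the same position
```

With the default (non-stable) argsort used in the question, which of the duplicate positions comes first in the sorted order is not guaranteed, which would explain why 39 shows up where 35 was expected.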
Any help is appreciated.
Recommended answer
We could add one more column of IDs that numbers the duplicates of each row (the first occurrence gets 1, the second gets 2, and so on) and then use the same steps. With that extra column, the k-th occurrence of a row in a only matches the k-th occurrence of the same row in b. We will use pandas to get those IDs, since it is easier that way:
import pandas as pd

def assign_duplbl(a):
    df = pd.DataFrame(a)
    df['num'] = 1
    # Cumulative count within each group of identical rows
    return df.groupby(list(range(a.shape[1]))).cumsum().values

a1 = np.hstack((a, assign_duplbl(a)))
b1 = np.hstack((b, assign_duplbl(b)))
args0, args1 = argwhere_nd_searchsorted(a1, b1)
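If pandas is not available, the same idea can be sketched with NumPy alone. This is not part of the answer above: occurrence_ids and match_rows are hypothetical helper names, the inputs are assumed to be non-empty 2D integer arrays, and the final lookup uses a plain Python dict for clarity rather than speed.

```python
import numpy as np

def occurrence_ids(rows):
    """1-based running count per row: the k-th copy of a row gets k."""
    rows = np.asarray(rows)
    # inverse[i] is the group number of rows[i] among the unique rows
    _, inverse = np.unique(rows, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # NumPy versions differ on this shape
    # Stable sort puts identical rows together and keeps their original order
    order = np.argsort(inverse, kind="stable")
    grouped = inverse[order]
    is_start = np.r_[True, grouped[1:] != grouped[:-1]]
    starts = np.flatnonzero(is_start)
    # Sorted position where each element's group begins
    group_start = starts[np.cumsum(is_start) - 1]
    ids = np.empty(len(rows), dtype=np.intp)
    ids[order] = np.arange(len(rows)) - group_start + 1
    return ids

def match_rows(a, b):
    """Pair the k-th occurrence of a row in a with its k-th occurrence in b."""
    a1 = np.column_stack((a, occurrence_ids(a)))
    b1 = np.column_stack((b, occurrence_ids(b)))
    position_in_b = {tuple(row): i for i, row in enumerate(b1.tolist())}
    pairs = [(i, position_in_b[tuple(row)])
             for i, row in enumerate(a1.tolist())
             if tuple(row) in position_in_b]
    idx_a = np.array([p[0] for p in pairs], dtype=np.intp)
    idx_b = np.array([p[1] for p in pairs], dtype=np.intp)
    return idx_a, idx_b

# Small made-up example: [0, 1] appears twice in both arrays
a_small = np.array([[0, 1], [2, 3], [0, 1]])
b_small = np.array([[0, 1], [2, 3], [4, 5], [0, 1]])
idx_a, idx_b = match_rows(a_small, b_small)
print(idx_a, idx_b)
```

Rows of a whose k-th occurrence has no counterpart in b are simply skipped, which mirrors the mask step in argwhere_nd_searchsorted.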