Find indices of common values in two arrays


Question

I'm using Python 2.7. I have two arrays, A and B. To find the indices of the elements in A that are present in B, I can do

A_inds = np.in1d(A,B)

I also want the indices of the elements in B that are present in A, i.e. the positions in B of the same overlapping elements found with the code above.

Currently I run the same line again with the arguments swapped:

B_inds = np.in1d(B,A)

but this extra calculation seems like it should be unnecessary. Is there a more computationally efficient way of obtaining both A_inds and B_inds?

I am open to using either list or array methods.

Recommended answer

np.unique and np.searchsorted can be used together to solve this -

import numpy as np

def unq_searchsorted(A,B):
    # Get unique elements of A and B and the indices based on the uniqueness
    unqA,idx1 = np.unique(A,return_inverse=True)
    unqB,idx2 = np.unique(B,return_inverse=True)

    # Create mask equivalent to np.in1d(A,B) and np.in1d(B,A) for unique elements
    mask1 = (np.searchsorted(unqB,unqA,'right') - np.searchsorted(unqB,unqA,'left'))==1
    mask2 = (np.searchsorted(unqA,unqB,'right') - np.searchsorted(unqA,unqB,'left'))==1

    # Map back to all non-unique indices to get equivalent of np.in1d(A,B), 
    # np.in1d(B,A) results for non-unique elements
    return mask1[idx1],mask2[idx2]
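As a small worked illustration of the function above, the sketch below runs it on two short arrays that contain a duplicate. The sample values and the names A_mask, B_mask, A_idx and B_idx are made up for this example. Note that the function returns boolean masks (like np.in1d does); np.flatnonzero is one way to turn those masks into integer indices if that is what is needed.

```python
import numpy as np

def unq_searchsorted(A, B):
    # Same logic as the answer's function, repeated so this snippet runs on its own
    unqA, idx1 = np.unique(A, return_inverse=True)
    unqB, idx2 = np.unique(B, return_inverse=True)
    mask1 = (np.searchsorted(unqB, unqA, 'right') - np.searchsorted(unqB, unqA, 'left')) == 1
    mask2 = (np.searchsorted(unqA, unqB, 'right') - np.searchsorted(unqA, unqB, 'left')) == 1
    return mask1[idx1], mask2[idx2]

# Hypothetical sample data: the value 1 appears twice in A
A = np.array([3, 1, 4, 1, 5])
B = np.array([5, 9, 1, 6])

A_mask, B_mask = unq_searchsorted(A, B)

# Convert the boolean masks to integer positions
A_idx = np.flatnonzero(A_mask)   # positions in A whose value occurs in B
B_idx = np.flatnonzero(B_mask)   # positions in B whose value occurs in A

print(A_mask, A_idx)
print(B_mask, B_idx)
```

Both occurrences of 1 in A are flagged, because the mask computed on the unique values is mapped back through the inverse indices returned by np.unique.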

Runtime test and verification of results -

In [233]: def org_app(A,B):
     ...:     return np.in1d(A,B), np.in1d(B,A)
     ...: 

In [234]: A = np.random.randint(0,10000,(10000))
     ...: B = np.random.randint(0,10000,(10000))
     ...: 

In [235]: np.allclose(org_app(A,B)[0],unq_searchsorted(A,B)[0])
Out[235]: True

In [236]: np.allclose(org_app(A,B)[1],unq_searchsorted(A,B)[1])
Out[236]: True

In [237]: %timeit org_app(A,B)
100 loops, best of 3: 7.69 ms per loop

In [238]: %timeit unq_searchsorted(A,B)
100 loops, best of 3: 5.56 ms per loop


If the two input arrays are already sorted and unique, the np.unique calls can be skipped and the speedup is larger (see the timings below). The solution function then simplifies to -

def unq_searchsorted_v1(A,B):
    out1 = (np.searchsorted(B,A,'right') - np.searchsorted(B,A,'left'))==1
    out2 = (np.searchsorted(A,B,'right') - np.searchsorted(A,B,'left'))==1  
    return out1,out2
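To see why the difference between the 'right' and 'left' insertion points works as a membership test, here is a minimal sketch on two tiny arrays that are assumed to be sorted and free of duplicates. The sample values and the intermediate names left, right and count_in_B are illustrative only.

```python
import numpy as np

# Hypothetical inputs: both sorted, no duplicates
A = np.array([1, 3, 4, 5])
B = np.array([1, 5, 6, 9])

# For each value in A: first position in B where it could be inserted
# before any equal entries ('left') and after any equal entries ('right')
left = np.searchsorted(B, A, 'left')
right = np.searchsorted(B, A, 'right')

# right - left is the number of entries in B equal to the value;
# with unique B this is 1 when the value is present and 0 otherwise
count_in_B = right - left
out1 = count_in_B == 1

print(left, right, count_in_B, out1)
```

If B could contain duplicates, the count could exceed 1 and the `== 1` comparison would miss those values, which is why this shortcut relies on the inputs being unique.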

Follow-up runtime test -

In [275]: A = np.random.randint(0,100000,(20000))
     ...: B = np.random.randint(0,100000,(20000))
     ...: A = np.unique(A)
     ...: B = np.unique(B)
     ...: 

In [276]: np.allclose(org_app(A,B)[0],unq_searchsorted_v1(A,B)[0])
Out[276]: True

In [277]: np.allclose(org_app(A,B)[1],unq_searchsorted_v1(A,B)[1])
Out[277]: True

In [278]: %timeit org_app(A,B)
100 loops, best of 3: 8.83 ms per loop

In [279]: %timeit unq_searchsorted_v1(A,B)
100 loops, best of 3: 4.94 ms per loop
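As a side note, and not part of the answer above: NumPy 1.15 and later offer np.intersect1d with return_indices=True, which returns the common values together with integer indices into both arrays in one call. The sketch below assumes that is available; it was not timed here, and it behaves differently from the masks above because it reports only the first occurrence of each common value. The sample data and the names common, A_first and B_first are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: the value 1 appears twice in A
A = np.array([3, 1, 4, 1, 5])
B = np.array([5, 9, 1, 6])

# common: sorted unique values present in both arrays
# A_first / B_first: index of the first occurrence of each common value
common, A_first, B_first = np.intersect1d(A, B, return_indices=True)

print(common, A_first, B_first)
```

Because duplicates are collapsed to their first occurrence, this is a closer fit when the inputs are already unique or when one index per shared value is enough.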
