Indices of intersection between arrays

Question

Is there a fast way to compare every element of an array against every element in a list of unique identifiers?

Using a for loop to loop through each of the unique values works but is way too slow to be usable. I have been searching for a vectorized solution but have not been successful. Any help would be greatly appreciated!

import numpy as np

arrStart = []
startRavel = startInforce['pol_id'].ravel()
# One full scan of startRavel per unique policy
for policy in unique_policies:
    arrStart.append(np.argwhere(startRavel == policy))

Sample input:

startRavel = [1,2,2,2,3,3]

unique_policies = [1,2,3]

Sample output:

arrStart = [[0], [1,2,3],[4,5]]

The new array would have the same length as the unique values array but each element would be a list of all of the rows that match that unique value in the large array.
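
For reference, the loop from the question can be made runnable on the sample data. This is only a baseline sketch: np.flatnonzero stands in for np.argwhere so that each group is a flat index array, and the DataFrame column is replaced by a plain array.

```python
import numpy as np

startRavel = np.array([1, 2, 2, 2, 3, 3])
unique_policies = np.array([1, 2, 3])

# One comparison pass over the whole array per unique value,
# so the cost grows with len(unique_policies) * len(startRavel)
arrStart = [np.flatnonzero(startRavel == policy) for policy in unique_policies]

print([a.tolist() for a in arrStart])  # [[0], [1, 2, 3], [4, 5]]
```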

Recommended answer

Here's a vectorized solution:

import numpy as np
startRavel = np.array([1,2,2,2,3,3])
unique_policies = np.array([1,2,3])

Sort startRavel using np.argsort.

ix = np.argsort(startRavel)
s_startRavel = startRavel[ix]
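
To see what the sorting step produces on unsorted data, here is a small sketch with made-up values. It passes kind="stable", which the answer does not use; this is an assumption added here so that positions within each group of equal values stay in ascending order, something the default sort kind does not guarantee.

```python
import numpy as np

# Illustrative unsorted values (not from the original post)
values = np.array([3, 1, 3, 2, 1, 3])

# A stable sort keeps equal elements in their original relative order
order = np.argsort(values, kind="stable")
sorted_values = values[order]

print(order.tolist())          # [1, 4, 3, 0, 2, 5]
print(sorted_values.tolist())  # [1, 1, 2, 3, 3, 3]
```

Each run of equal values in sorted_values lines up with a run of original positions in order, which is what the later steps rely on.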

Use np.searchsorted to find the indices at which unique_policies should be inserted into the sorted startRavel to maintain order:

s_ix = np.searchsorted(s_startRavel, unique_policies)
# array([0, 1, 4])
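
The call above uses the default side="left", which returns where each group starts. As a hedged aside, calling np.searchsorted a second time with side="right" gives the end of each group, and the difference gives the number of matches. The query value 4 below is an illustrative addition to show a value that does not occur.

```python
import numpy as np

sorted_values = np.array([1, 2, 2, 2, 3, 3])
queries = np.array([1, 2, 3, 4])  # 4 does not occur in sorted_values

left = np.searchsorted(sorted_values, queries, side="left")    # start of each run
right = np.searchsorted(sorted_values, queries, side="right")  # one past the end
counts = right - left

print(left.tolist())    # [0, 1, 4, 6]
print(right.tolist())   # [1, 4, 6, 6]
print(counts.tolist())  # [1, 3, 2, 0]
```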

And then use np.split to split the array using the obtained indices. np.argsort is used again on s_ix to deal with non-sorted inputs:

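ix_r = np.argsort(s_ix)
ixs = np.split(ix, s_ix[ix_r][1:])
# dtype=object allows ragged sub-arrays on recent NumPy versions;
# argsort(ix_r) inverts the permutation so groups follow the order of unique_policies
np.array(ixs, dtype=object)[np.argsort(ix_r)]
# [array([0]), array([1, 2, 3]), array([4, 5])]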
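
The reason the first boundary is dropped with [1:] can be shown in isolation. In this small sketch (the array positions is illustrative), np.split cuts before each listed index, so a leading boundary of 0 would produce an extra empty piece at the front.

```python
import numpy as np

positions = np.arange(6)

# Cutting at 1 and 4 yields three pieces
pieces = np.split(positions, [1, 4])
print([p.tolist() for p in pieces])  # [[0], [1, 2, 3], [4, 5]]

# Keeping the leading 0 adds an empty first piece
with_zero = np.split(positions, [0, 1, 4])
print(len(with_zero), with_zero[0].size)  # 4 0
```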


General solution:

Let's wrap it all up in a function:

def ix_intersection(x, y):
    """
    Finds the indices where each unique
    value in x is found in y.
    Both x and y must be numpy arrays.

    Parameters
    ----------
    x: np.array
       Must contain unique values.
       Values in x are assumed to be in y.
    y: np.array
       Every value in y is assumed to be in x.

    Returns
    -------
    Array of arrays. Each array contains the indices where a
    value in x is found in y
    """
    ix_y = np.argsort(y)
    s = np.searchsorted(y[ix_y], x)
    ix_r = np.argsort(s)
    ixs = np.split(ix_y, s[ix_r][1:])
    # dtype=object allows ragged sub-arrays on recent NumPy versions;
    # argsort(ix_r) inverts the permutation so groups follow the order of x
    return np.array(ixs, dtype=object)[np.argsort(ix_r)]


Other examples

Let's try the following arrays:

startRavel = np.array([1,3,3,2,2,2])
unique_policies = np.array([1,2,3])

ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([1, 2])])

Another example, this time with non-sorted inputs:

startRavel = np.array([1,3,3,2,2,2,5])
unique_policies = np.array([1,2,5,3])

ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([6]), array([1, 2])])
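
As a possible variant (not the answer's method), the groups can also be sliced out directly from left and right boundaries instead of using np.split. The sketch below assumes a hypothetical helper name, group_indices, returns a plain list of arrays, and tolerates values of x that never occur in y by returning empty arrays for them. It is checked against the brute-force comparison on random data.

```python
import numpy as np

def group_indices(x, y):
    """Hypothetical helper: for each value in x, the positions where it occurs in y."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Stable sort keeps positions ascending within each run of equal values
    order = np.argsort(y, kind="stable")
    sorted_y = y[order]
    left = np.searchsorted(sorted_y, x, side="left")
    right = np.searchsorted(sorted_y, x, side="right")
    # One slice per query value; an absent value gives an empty slice
    return [order[lo:hi] for lo, hi in zip(left, right)]

rng = np.random.default_rng(0)
y = rng.integers(0, 50, size=1000)
x = rng.permutation(60)  # values 50..59 never occur in y

result = group_indices(x, y)
expected = [np.flatnonzero(y == v) for v in x]
print(all(np.array_equal(r, e) for r, e in zip(result, expected)))  # True
```

The final list comprehension still loops over x in Python, but the sorting and searching, which dominate the cost for a large y, stay vectorized.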
