Indices of intersection between arrays
Problem description
Is there a fast way to compare every element of an array against every element in a list of unique identifiers?
Using a for loop to loop through each of the unique values works but is way too slow to be usable. I have been searching for a vectorized solution but have not been successful. Any help would be greatly appreciated!
import numpy as np

arrStart = []
startRavel = startInforce['pol_id'].ravel()
# One full scan of startRavel per unique policy
for policy in unique_policies:
    arrStart.append(np.argwhere(startRavel == policy))
Sample input:
startRavel = [1,2,2,2,3,3]
unique_policies = [1,2,3]
Sample output:
arrStart = [[0], [1,2,3],[4,5]]
The new array would have the same length as the unique values array but each element would be a list of all of the rows that match that unique value in the large array.
Recommended answer
Here's a vectorized solution:
import numpy as np
startRavel = np.array([1,2,2,2,3,3])
unique_policies = np.array([1,2,3])
Sort startRavel using np.argsort:
ix = np.argsort(startRavel)
s_startRavel = startRavel[ix]
Use np.searchsorted to find the indices at which the values of unique_policies should be inserted into the sorted startRavel to maintain order:
s_ix = np.searchsorted(s_startRavel, unique_policies)
# array([0, 1, 4])
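To make the role of np.searchsorted more concrete, here is a minimal sketch that computes both the left and right insertion points for each identifier; their difference is the number of matches. The names sorted_vals, left, right and counts are illustrative and do not come from the original answer.

```python
import numpy as np

sorted_vals = np.array([1, 2, 2, 2, 3, 3])   # startRavel after sorting
unique_policies = np.array([1, 2, 3])

# side="left" (the default) gives the first position of each value
left = np.searchsorted(sorted_vals, unique_policies, side="left")
# side="right" gives one past the last position of each value
right = np.searchsorted(sorted_vals, unique_policies, side="right")
counts = right - left

print(left)    # [0 1 4]
print(right)   # [1 4 6]
print(counts)  # [1 3 2]
```

The left positions are the same values as s_ix above, which is why they can serve as split points for the sorted index array.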
Then use np.split to split the sorted index array at the obtained positions. np.argsort is applied again, this time to s_ix, to handle non-sorted inputs:
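ix_r = np.argsort(s_ix)
ixs = np.split(ix, s_ix[ix_r][1:])
# np.split returns the groups in sorted-value order; index with the inverse
# permutation of ix_r to restore the order of unique_policies.
# dtype=object is required because the groups have different lengths.
np.array(ixs, dtype=object)[np.argsort(ix_r)]
# array([array([0]), array([1, 2, 3]), array([4, 5])], dtype=object)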
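When the identifiers are not given in sorted order, the groups produced by np.split come out in sorted-value order and have to be mapped back to the input order. The sketch below illustrates that mapping on made-up values, relying on the fact that np.argsort applied to a permutation yields its inverse; the names inv, groups_sorted and groups_in_input_order are hypothetical.

```python
import numpy as np

# Insertion points for identifiers given in the order [3, 1, 2] (illustrative values)
s_ix = np.array([4, 0, 1])
ix_r = np.argsort(s_ix)   # order that sorts s_ix: [1, 2, 0]
inv = np.argsort(ix_r)    # inverse permutation:   [2, 0, 1]

# Stand-ins for the arrays np.split would return, in sorted-value order
groups_sorted = ["group of 1", "group of 2", "group of 3"]
groups_in_input_order = [groups_sorted[i] for i in inv]

print(groups_in_input_order)  # ['group of 3', 'group of 1', 'group of 2']
```

Indexing with ix_r itself gives the same result only when ix_r is its own inverse, for example when it merely swaps two positions.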
General solution:
Let's wrap it all up in a function:
def ix_intersection(x, y):
    """
    Finds the indices where each unique
    value in x is found in y.
    Both x and y must be numpy arrays.

    Parameters
    ----------
    x: np.array
        Must contain unique values.
        Values in x are assumed to be in y,
        and every value in y is assumed to be in x.
    y: np.array

    Returns
    -------
    Array of arrays. Each array contains the indices where a
    value in x is found in y
    """
    ix_y = np.argsort(y)
    s = np.searchsorted(y[ix_y], x)
    ix_r = np.argsort(s)
    ixs = np.split(ix_y, s[ix_r][1:])
    # ixs is in sorted-value order; the inverse permutation of ix_r
    # restores the order of x. dtype=object allows groups of unequal length.
    return np.array(ixs, dtype=object)[np.argsort(ix_r)]
Other examples
Let's try it with the following arrays:
startRavel = np.array([1,3,3,2,2,2])
unique_policies = np.array([1,2,3])
ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([1, 2])], dtype=object)
Another example, this time with non-sorted inputs:
startRavel = np.array([1,3,3,2,2,2,5])
unique_policies = np.array([1,2,5,3])
ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([6]), array([1, 2])], dtype=object)
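As a sanity check, a sort-based approach can be compared against the loop from the question. The sketch below is a variation rather than the answer's exact method: it slices the sorted order between left and right insertion points instead of calling np.split, which also tolerates identifiers that are absent from the data. The function names indices_per_value_loop and indices_per_value_sorted are hypothetical.

```python
import numpy as np

def indices_per_value_loop(values, ids):
    # Baseline from the question: one full scan of `values` per identifier
    return [np.flatnonzero(values == v) for v in ids]

def indices_per_value_sorted(values, ids):
    # Sort once (stable, so indices within a group stay ascending),
    # then cut the sorted order at each identifier's boundaries
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    left = np.searchsorted(sorted_vals, ids, side="left")
    right = np.searchsorted(sorted_vals, ids, side="right")
    return [order[a:b] for a, b in zip(left, right)]

values = np.array([1, 3, 3, 2, 2, 2, 5])
ids = np.array([3, 1, 5, 2])   # deliberately not sorted

fast = indices_per_value_sorted(values, ids)
slow = indices_per_value_loop(values, ids)
assert all(np.array_equal(f, s) for f, s in zip(fast, slow))

print([f.tolist() for f in fast])  # [[1, 2], [0], [6], [3, 4, 5]]
```

Because each group is sliced directly for its own identifier, the result already follows the order of ids and no reordering step is needed.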