How to get lists of indices to unique values efficiently?
Question
Is there a built-in method that would help me achieve the following efficiently: given an array, I need a list of arrays, each holding the indices at which one distinct unique value of the array occurs?
If f is the desired function,
b = f(a)
and
u, idxs = unique(a, return_inverse=True)
then
b[i] == where(idxs == i)[0]
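As a point of reference, the relation above can be written out directly. This is only a naive baseline sketch, not an efficient solution: it scans the whole inverse array once per unique value. The function name `indices_by_value_naive` is hypothetical, and the input is assumed to be one-dimensional.

```python
import numpy as np

def indices_by_value_naive(a):
    """Naive baseline: one full np.where scan per unique value.

    Assumes `a` is one-dimensional. Cost grows with
    (number of unique values) * (length of a).
    """
    a = np.asarray(a)
    unq, inv = np.unique(a, return_inverse=True)
    # For each unique value i, collect the positions where it occurs.
    groups = [np.where(inv == i)[0] for i in range(len(unq))]
    return unq, groups
```

With many unique values this becomes slow, which is the motivation for looking for a vectorized approach.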
I am aware that pandas.Series.groupby() can do this, but it may not be efficient to create a dict when there are over 10^5 unique integers.
Recommended answer
If you have numpy >= 1.9, you can do:
>>> import numpy as np
>>> a = np.random.randint(5, size=10)
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
>>> np.split(np.argsort(unq_inv), np.cumsum(unq_cnt[:-1]))
[array([0]), array([9]), array([1, 4, 8]), array([7]), array([2, 3, 5, 6])]
>>> unq
array([0, 1, 2, 3, 4])
In earlier versions, you can get the counts with one extra call:
>>> unq_cnt = np.bincount(unq_inv)
Also, if you want to make sure that the indices for each value are sorted, you will need a stable sort, e.g. np.argsort(unq_inv, kind='mergesort'). The default sort kind is not guaranteed to be stable.
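The pieces above (inverse indices, counts, a stable argsort, and a split) can be combined into one helper. The following is a minimal sketch under a few assumptions: the name `indices_by_value` is hypothetical, the input is flattened to one dimension, and `np.bincount` is used for the counts so that it does not depend on `return_counts`.

```python
import numpy as np

def indices_by_value(a):
    """Return (unique values, list of index arrays), one array per value.

    Indices within each group are in ascending order because the
    argsort is stable.
    """
    a = np.asarray(a).ravel()
    unq, inv = np.unique(a, return_inverse=True)
    counts = np.bincount(inv)                 # occurrences of each unique value
    order = np.argsort(inv, kind="mergesort")  # stable: ties keep original order
    # Cut the sorted index array at the boundaries between groups.
    groups = np.split(order, np.cumsum(counts[:-1]))
    return unq, groups
```

A call such as `unq, groups = indices_by_value(a)` then gives `groups[i]` as the positions in `a` where `unq[i]` occurs.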
Thinking about what you seem to be after, which I think is minimizing calls to an expensive function, I don't think you need to do what you are asking. Say your function were squaring; you could simply do:
>>> unq, unq_inv = np.unique(a, return_inverse=True)
>>> f_unq = unq**2
>>> f_a = f_unq[unq_inv]
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> f_a
array([ 0, 4, 16, 16, 4, 16, 16, 9, 4, 1])
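The same idea extends to a function that is not vectorized: evaluate it once per unique value, then broadcast the results back through the inverse indices. This is a sketch only; `apply_per_unique` is a hypothetical helper name, and it assumes `func` returns a scalar for each input value.

```python
import numpy as np

def apply_per_unique(func, a):
    """Call `func` once per distinct value of `a`, then map results back.

    Assumes `func` takes one value and returns a scalar.
    """
    a = np.asarray(a)
    unq, inv = np.unique(a, return_inverse=True)
    # One call per unique value instead of one call per element.
    results = np.asarray([func(v) for v in unq])
    # Index with the inverse to restore the original layout.
    return results[inv.reshape(a.shape)]
```

For the array above, `apply_per_unique(lambda v: v ** 2, a)` calls the function five times rather than ten.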