How to get lists of indices to unique values efficiently?

Question

Is there a built-in method that would help me achieve the following efficiently: given an array, I need a list of arrays, each with indices to a different unique value of the array?

If f is the desired function,

b = f(a)

u, idxs = unique(a, return_inverse=True)

then

b[i] == where(idxs==i)[0]
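
To make the specification above concrete, here is a naive reference version written directly from it. It is only a sketch for checking faster approaches: the function name `indices_by_value_naive` is made up for illustration, and it scans the whole array once per unique value, which is exactly the cost the question wants to avoid.

```python
import numpy as np

def indices_by_value_naive(a):
    """Reference version (hypothetical helper): one np.where scan per unique value."""
    a = np.asarray(a)
    # idxs[j] is the position of a[j] inside the sorted unique array u
    u, idxs = np.unique(a, return_inverse=True)
    # b[i] holds the positions in a where a == u[i]
    b = [np.where(idxs == i)[0] for i in range(len(u))]
    return u, b

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
u, b = indices_by_value_naive(a)
```

With n elements and k unique values this does roughly n * k comparisons, so it is useful as a correctness check rather than as a solution.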

I am aware that pandas.Series.groupby() can do this, but it may not be efficient to create a dict when there are over 10^5 unique integers.

Recommended answer

If you have numpy >= 1.9 you can do:

>>> a = np.random.randint(5, size=10)
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
>>> np.split(np.argsort(unq_inv), np.cumsum(unq_cnt[:-1]))
[array([0]), array([9]), array([1, 4, 8]), array([7]), array([2, 3, 5, 6])]
>>> unq
array([0, 1, 2, 3, 4])

In earlier versions, you can get the counts with an extra call:

>>> unq_cnt = np.bincount(unq_inv)

Also, if you want to make sure that the indices for each value are sorted, I think you will need to use a stable sort, e.g. np.argsort(unq_inv, kind='mergesort'), because the default sort kind (quicksort) is not guaranteed to preserve the order of equal elements.
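
Putting the pieces of this answer together, the following is a minimal sketch of a reusable function, assuming numpy >= 1.9 for `return_counts`. The name `indices_by_unique_value` is hypothetical, and the input is flattened first as a simplifying assumption so that the returned indices refer to the flattened array.

```python
import numpy as np

def indices_by_unique_value(a):
    """Return (unq, groups) where groups[i] lists the indices of unq[i] in a.

    Hypothetical helper; assumes numpy >= 1.9 (return_counts).
    """
    a = np.asarray(a).ravel()  # assumption: work on the flattened input
    unq, unq_inv, unq_cnt = np.unique(
        a, return_inverse=True, return_counts=True
    )
    # A stable sort keeps equal keys in their original order, so the
    # indices inside each group come out in ascending order.
    order = np.argsort(unq_inv, kind='mergesort')
    # Cut the sorted index array at the cumulative group boundaries.
    groups = np.split(order, np.cumsum(unq_cnt[:-1]))
    return unq, groups

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
unq, groups = indices_by_unique_value(a)
```

The work is one sort inside np.unique plus one argsort, so the cost does not grow with the number of unique values the way a per-value np.where scan does.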

Thinking about what you seem to be after, which I think is minimizing calls to an expensive function, I don't think you need to do what you are asking. Say your function was squaring; you could simply do:

>>> unq, unq_inv = np.unique(a, return_inverse=True)
>>> f_unq = unq**2
>>> f_a = f_unq[unq_inv]
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> f_a
array([ 0,  4, 16, 16,  4, 16, 16,  9,  4,  1])
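
The same idea can be wrapped in a small helper, sketched here under the assumption that the expensive function is vectorized (it accepts an array and returns an array of the same length). Both `apply_per_unique` and `slow_square` are hypothetical names used only for illustration; `slow_square` records how many values it was asked to process.

```python
import numpy as np

def apply_per_unique(func, a):
    """Evaluate func once on the unique values of a, then broadcast back.

    Hypothetical helper; assumes func is vectorized over a 1-D array.
    """
    a = np.asarray(a)
    unq, unq_inv = np.unique(a, return_inverse=True)
    f_unq = func(unq)  # one evaluation per distinct value
    # reshape restores the input's shape regardless of how the installed
    # numpy version shapes the inverse for multi-dimensional input
    return f_unq[unq_inv.reshape(a.shape)]

calls = []

def slow_square(x):
    """Stand-in for an expensive function; logs the size of each call."""
    calls.append(len(x))
    return x ** 2

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
f_a = apply_per_unique(slow_square, a)
```

Here `slow_square` sees only the 5 unique values rather than all 10 elements, and no per-value index lists are needed.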
