How to get lists of indices to unique values efficiently?

Views: 107 | Published: 2020/5/18 21:47:42 | Tags: python, numpy, pandas

Question

Is there a built-in method that would help me achieve the following efficiently: given an array, I need a list of arrays, each holding the indices of a different unique value of the array?

If f is the desired function,

b = f(a)

and

u, idxs = unique(a, return_inverse=True)

then

b[i] == where(idxs == i)[0]

I am aware that pandas.Series.groupby() can do this, but it may not be efficient to create a dict when there are over 10^5 unique integers.

Recommended answer

If you have numpy >= 1.9, you can do:

>>> import numpy as np
>>> a = np.random.randint(5, size=10)
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
>>> np.split(np.argsort(unq_inv), np.cumsum(unq_cnt[:-1]))
[array([0]), array([9]), array([1, 4, 8]), array([7]), array([2, 3, 5, 6])]
>>> unq
array([0, 1, 2, 3, 4])
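
The session above can be wrapped into a small reusable function. This is only a sketch: the name `indices_by_unique_value` is made up for illustration, and the input is assumed to be a 1-D array. It uses the default sort, so the indices inside each group are not guaranteed to be in ascending order, which is why the check at the end compares them after sorting.

```python
import numpy as np

def indices_by_unique_value(a):
    """Return (unique values, list of index arrays), one index array per unique value."""
    a = np.asarray(a)
    # unq_inv[k] is the position of a[k] within unq; unq_cnt[i] counts occurrences of unq[i]
    unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
    # Sorting the inverse brings indices of equal values next to each other
    order = np.argsort(unq_inv)
    # Cut the sorted indices at the cumulative counts (the last cut point is not needed)
    groups = np.split(order, np.cumsum(unq_cnt[:-1]))
    return unq, groups

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
unq, groups = indices_by_unique_value(a)

# Each group should hold exactly the positions where a equals the matching unique value
same_members = all(
    sorted(g.tolist()) == np.where(a == v)[0].tolist()
    for v, g in zip(unq, groups)
)
print(same_members)
```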

In earlier versions, you can get the counts with one extra step:

>>> unq_cnt = np.bincount(unq_inv)
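
As a quick illustration of why this works: `unq_inv` holds, for every element of `a`, the position of its value in `unq`, so counting how often each position occurs gives the per-value counts. A minimal sketch, reusing the example array from above as fixed data:

```python
import numpy as np

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])

# Without return_counts, take only the inverse mapping
unq, unq_inv = np.unique(a, return_inverse=True)

# bincount tallies how many times each position 0..len(unq)-1 appears in unq_inv
unq_cnt = np.bincount(unq_inv)
print(unq_cnt)
```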

Also, if you want to make sure that the indices for each value come out sorted, I think you will need to use a stable sort, e.g. np.argsort(unq_inv, kind='mergesort')
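
A sketch of the stable-sort variant follows; the helper name `sorted_index_groups` is hypothetical and a 1-D input is assumed. Because a stable sort keeps equal keys in their original order, each group should come out ascending and match what `np.where` returns for that value.

```python
import numpy as np

def sorted_index_groups(a):
    """Like the split/argsort recipe, but with a stable sort so each group is ascending."""
    unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
    # mergesort is stable: ties in unq_inv keep their original (ascending) index order
    order = np.argsort(unq_inv, kind="mergesort")
    return unq, np.split(order, np.cumsum(unq_cnt[:-1]))

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
unq, groups = sorted_index_groups(a)

# Every group is strictly increasing, and equals np.where for its value
is_ascending = all(np.all(np.diff(g) > 0) for g in groups)
matches_where = all(
    np.array_equal(g, np.where(a == v)[0]) for v, g in zip(unq, groups)
)
print(is_ascending, matches_where)
```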

Thinking about what you seem to be after, which I think is minimizing calls to an expensive function, I don't think you need to do what you are asking. Say your function were squaring; you could simply do:

>>> unq, unq_inv = np.unique(a, return_inverse=True)
>>> f_unq = unq**2
>>> f_a = f_unq[unq_inv]
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> f_a
array([ 0,  4, 16, 16,  4, 16, 16,  9,  4,  1])
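
The same idea can be sketched for a function that is not vectorized. The names `apply_per_unique` and `slow_square` are invented for this example, and `slow_square` merely stands in for whatever expensive function is actually involved; the call log is there only to show that the function runs once per unique value rather than once per element.

```python
import numpy as np

def apply_per_unique(func, a):
    """Evaluate func once per unique value of a, then broadcast results back to a's shape."""
    a = np.asarray(a)
    unq, unq_inv = np.unique(a.ravel(), return_inverse=True)
    # One call per unique value instead of one call per element
    f_unq = np.array([func(x) for x in unq])
    # Index the per-unique results with the inverse mapping to rebuild the full result
    return f_unq[unq_inv].reshape(a.shape)

calls = []  # records every argument the stand-in function is called with

def slow_square(x):
    calls.append(x)
    return x ** 2

a = np.array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
f_a = apply_per_unique(slow_square, a)
print(f_a, len(calls))
```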
