Is there a simpler and faster way to get an index dict that contains the indexes of the same elements in a list or a NumPy array?

Views: 58. Published: 2020/5/18 22:12:34. Tags: python arrays numpy indexing


Problem description

Description:

I have a large array of simple integers (positive and not large) such as 1, 2, and so on. For example: [1, 1, 2, 2, 1, 2]. I want a dict in which each distinct value from the list is a key, and the list of indexes where that value occurs is the corresponding dict value.

Question:

Is there a simpler and faster way to get the expected result in Python? (The array can be a list or a NumPy array.)

Code:

a = [1, 1, 2, 2, 1, 2]
results = indexes_of_same_elements(a)
print(results)

Expected result:

{1:[0, 1, 4], 2:[2, 3, 5]}
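
For reference, the kind of straightforward loop the question is looking to improve on might look like the sketch below. The original post does not show a baseline, so this implementation of `indexes_of_same_elements` is an assumption made purely for illustration; it uses only the standard library and works for any iterable of hashable values.

```python
from collections import defaultdict

def indexes_of_same_elements(values):
    """Map each distinct value to the list of positions where it occurs."""
    groups = defaultdict(list)
    for position, value in enumerate(values):
        groups[value].append(position)
    # Return a plain dict so the result prints like the expected output.
    return dict(groups)

a = [1, 1, 2, 2, 1, 2]
results = indexes_of_same_elements(a)
print(results)  # {1: [0, 1, 4], 2: [2, 3, 5]}
```

This runs in a single pass over the input, but the loop executes in the Python interpreter, which is why a vectorized approach can be attractive for large arrays.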

Recommended answer

We can exploit the fact that the elements are "simple" (i.e. nonnegative and not too large?) integers.

The trick is to construct a sparse matrix with exactly one stored element per row (row i has its entry in column a[i]) and then to convert it to a column-wise (CSC) representation. This is typically faster than argsort because the conversion is O(M + N + nnz) for an MxN sparse matrix with nnz stored entries, whereas argsort is O(n log n).
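
To see why a row-to-column conversion can be linear, here is a pure-Python sketch of the underlying idea, which is essentially a counting sort. It is an illustration only and not SciPy's actual implementation; the function name `group_indices_by_counting` and the `num_keys` parameter are made up for this example, and the input is assumed to contain only integers in `range(num_keys)`.

```python
def group_indices_by_counting(values, num_keys):
    """Group positions by value for integers in range(num_keys)."""
    # Pass 1: count how often each key occurs (the length of each "column").
    counts = [0] * num_keys
    for v in values:
        counts[v] += 1

    # Prefix sums give the start offset of each key's block,
    # playing the role of the CSC `indptr` array.
    indptr = [0] * (num_keys + 1)
    for key in range(num_keys):
        indptr[key + 1] = indptr[key] + counts[key]

    # Pass 2: write each position into the next free slot of its key's block.
    next_slot = indptr[:-1].copy()
    indices = [0] * len(values)
    for position, v in enumerate(values):
        indices[next_slot[v]] = position
        next_slot[v] += 1

    # Slice out the non-empty blocks, one per key that actually occurs.
    return {key: indices[indptr[key]:indptr[key + 1]]
            for key in range(num_keys) if counts[key]}

a = [1, 1, 2, 2, 1, 2]
print(group_indices_by_counting(a, num_keys=3))  # {1: [0, 1, 4], 2: [2, 3, 5]}
```

Two passes over the data plus one pass over the keys is all that is needed, and no comparisons between elements are made. Because positions are visited in increasing order, each group comes out sorted.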

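import numpy as np
from scipy import sparse

# Both functions read the module-level integer array `a`.

def use_sprsm():
    # CSR matrix with one entry per row: row i has its entry in column a[i].
    x = sparse.csr_matrix((a, a, np.arange(a.size+1))).tocsc()
    # Non-empty columns correspond to the values that occur in `a`.
    idx, = np.where(x.indptr[:-1] != x.indptr[1:])
    # x.indices lists row numbers column by column; split at the column starts.
    return {i: rows for i, rows in zip(idx, np.split(x.indices, x.indptr[idx[1:]]))}

# for comparison

def use_asort():
    idx = np.argsort(a)
    el, c = np.unique(a, return_counts=True)
    return dict(zip(el, np.split(idx, c.cumsum()[:-1])))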

Sample run:

>>> from timeit import timeit
>>> a = np.random.randint(0, 100, (10_000,))
>>> # sanity check, note that `use_sprsm` returns sorted indices
>>> for k, v in use_asort().items():
...     assert np.array_equal(np.sort(v), use_sprsm()[k])
... 
>>> timeit(use_asort, number=1000)
0.8930604780325666
>>> timeit(use_sprsm, number=1000)
0.38419671391602606
