Increasing value of top k elements in sparse matrix
Question
I am trying to find an efficient way to increase the top k values of a sparse matrix by some constant. The code I am currently using is quite slow for very large matrices:
from scipy.sparse import csr_matrix

a = csr_matrix((2, 2))  # just some sample data
a[1, 1] = 3.
a[0, 1] = 2.

y = a.tocoo()
idx = y.data.argsort()[::-1][:1]  # k is 1
for i, j in zip(y.row[idx], y.col[idx]):
    a[i, j] += 1
Actually the sorting seems to be fast; the problem lies in my final loop, where I increase the values by indexing via the sorted indices. I hope someone has an idea of how to speed this up.
Answer
You could probably speed things up quite a lot by directly modifying a.data rather than iterating over row/column indices and modifying individual elements:
idx = a.data.argsort()[::-1][:1]  # k is 1
a.data[idx] += 1
This also saves converting from CSR to COO.
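Putting this together with the question's toy example, the loop-free version looks like this (a minimal sketch; the element-wise assignments while building the matrix still trigger a SparseEfficiencyWarning on a CSR matrix, just as in the original):

```python
from scipy.sparse import csr_matrix

# The sample data from the question, built the same way.
a = csr_matrix((2, 2))
a[1, 1] = 3.
a[0, 1] = 2.

k = 1
# Indices of the k largest stored values, largest first.
idx = a.data.argsort()[::-1][:k]
# Increment in place -- no COO conversion, no Python-level loop.
a.data[idx] += 1

print(a.toarray())
# [[0. 2.]
#  [0. 4.]]
```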
As @WarrenWeckesser rightly points out, since you're only interested in the indices of the k largest elements and you don't care about their order, you can use argpartition rather than argsort. This can be quite a lot faster when a.data is large.
For example:
import numpy as np
from scipy import sparse

# a random sparse array with 1 million non-zero elements
a = sparse.rand(10000, 10000, density=0.01, format='csr')

# find the indices of the 100 largest non-zero elements
k = 100

# using argsort:
%timeit a.data.argsort()[-k:]
# 10 loops, best of 3: 135 ms per loop

# using argpartition:
%timeit a.data.argpartition(-k)[-k:]
# 100 loops, best of 3: 13 ms per loop

# test correctness:
np.all(a.data[a.data.argsort()[-k:]] ==
       np.sort(a.data[a.data.argpartition(-k)[-k:]]))
# True
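For the actual update, argpartition combines with the in-place a.data trick directly; the order of the top-k indices is irrelevant when all you do is add a constant. A sketch on a smaller random matrix (the shape and k here are arbitrary choices, not from the question):

```python
import numpy as np
from scipy import sparse

# Random sparse matrix; stored values are uniform in (0, 1).
a = sparse.rand(1000, 1000, density=0.01, format='csr', random_state=0)

k = 100
top_before = np.sort(a.data)[-k:]    # the k largest values, sorted

idx = a.data.argpartition(-k)[-k:]   # unordered indices of the k largest
a.data[idx] += 1                     # bump them by a constant, in place

top_after = np.sort(a.data)[-k:]
# Each former top-k value is now exactly 1 larger.
assert np.allclose(top_after, top_before + 1)
```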