Finding the top n values in a row of a scipy sparse matrix

Question

I have a scipy sparse matrix in CSR format. It is 72665x72665, so converting it to a dense matrix to operate on it is impractical (the dense representation would take roughly 40 GB). The matrix is symmetric and has about 82 million non-zero entries (~1.5%).

For each row, I want to get the indices of the N largest values. If this were a numpy array, I would use np.argpartition like so:

    for row in matrix:
        top_n_idx = np.argpartition(row,-n)[-n:]
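
For reference, here is a small runnable version of the dense idiom above. The array values and the names `dense` and `top_n_per_row` are made up for illustration. Note that np.argpartition returns the indices of the n largest entries in no particular order.

```python
import numpy as np

# Made-up dense data, for illustration only.
dense = np.array([[5, 1, 9, 3],
                  [2, 8, 4, 7]])
n = 2

top_n_per_row = []
for row in dense:
    # argpartition moves the n largest entries into the last n positions;
    # their order among themselves is not guaranteed.
    top_n_idx = np.argpartition(row, -n)[-n:]
    top_n_per_row.append(top_n_idx)
```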

Is there something similar I can do for a sparse matrix?

Recommended answer

This improves on @Paul Panzer's solution: it now handles the case where a row has fewer than n stored values.

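import numpy as np

def top_n_idx_sparse(matrix, n):
    '''Return the column indices of the top n stored values in each row of a CSR matrix'''
    top_n_idx = []
    # indptr[i]:indptr[i+1] delimits row i's slice of matrix.data and matrix.indices
    for le, ri in zip(matrix.indptr[:-1], matrix.indptr[1:]):
        # a row may hold fewer than n stored values
        n_row_pick = min(n, ri - le)
        # argpartition gives positions within the row slice; offset by le to index matrix.indices
        top_n_idx.append(matrix.indices[le + np.argpartition(matrix.data[le:ri], -n_row_pick)[-n_row_pick:]])
    return top_n_idx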
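
A minimal usage sketch on a tiny made-up matrix, to show what the function returns. The function is repeated so the block runs on its own; the 3x5 matrix and the names `dense`, `m`, `result` and `top_cols` are assumptions for illustration, not part of the original answer.

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_n_idx_sparse(matrix, n):
    """Same logic as the answer's function, repeated so this block is self-contained."""
    top_n_idx = []
    for le, ri in zip(matrix.indptr[:-1], matrix.indptr[1:]):
        n_row_pick = min(n, ri - le)
        top_n_idx.append(
            matrix.indices[le + np.argpartition(matrix.data[le:ri], -n_row_pick)[-n_row_pick:]]
        )
    return top_n_idx

# Made-up data: row 0 has three stored values, row 1 has one, row 2 has four.
dense = np.array([[0, 4, 0, 9, 2],
                  [0, 0, 7, 0, 0],
                  [3, 8, 0, 1, 6]])
m = csr_matrix(dense)

result = top_n_idx_sparse(m, 2)

# Each entry is an array of column indices; sort them here only to make
# the result easy to compare, since argpartition does not order the top n.
top_cols = [sorted(idx.tolist()) for idx in result]
```

Two things to keep in mind. Only explicitly stored values are considered, so in a row whose stored values are negative the implicit zeros are ignored even though they would be larger. Also, n is assumed to be at least 1: with n = 0 the slice `[-0:]` selects every stored entry in the row instead of none.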
