沿轴0重复Scipy CSR稀疏矩阵 [英] Repeat a scipy csr sparse matrix along axis 0

查看:82
本文介绍了沿轴0重复Scipy CSR稀疏矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想重复一个稀疏的csr稀疏矩阵的行,但是当我尝试调用numpy的repeat方法时,它只是将稀疏矩阵像一个对象一样对待,并且只会将其作为ndarray中的一个对象重复.我仔细阅读了文档,但找不到任何实用程序来重复scipy csr稀疏矩阵的行.

I wanted to repeat the rows of a scipy csr sparse matrix, but when I tried to call numpy's repeat method, it simply treats the sparse matrix like an object, and would only repeat it as an object in an ndarray. I looked through the documentation, but I couldn't find any utility to repeats the rows of a scipy csr sparse matrix.

我编写了以下对内部数据进行操作的代码,

I wrote the following code that operates on the internal data, which seems to work

def csr_repeat(csr, repeats):
    if isinstance(repeats, int):
        repeats = np.repeat(repeats, csr.shape[0])
    repeats = np.asarray(repeats)
    rnnz = np.diff(csr.indptr)
    ndata = rnnz.dot(repeats)
    if ndata == 0:
        return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
                                 dtype=csr.dtype)
    indmap = np.ones(ndata, dtype=np.int)
    indmap[0] = 0
    rnnz_ = np.repeat(rnnz, repeats)
    indptr_ = rnnz_.cumsum()
    mask = indptr_ < ndata
    indmap -= np.int_(np.bincount(indptr_[mask],
                                  weights=rnnz_[mask],
                                  minlength=ndata))
    jumps = (rnnz * repeats).cumsum()
    mask = jumps < ndata
    indmap += np.int_(np.bincount(jumps[mask],
                                  weights=rnnz[mask],
                                  minlength=ndata))
    indmap = indmap.cumsum()
    return sparse.csr_matrix((csr.data[indmap],
                              csr.indices[indmap],
                              np.r_[0, indptr_]),
                             shape=(np.sum(repeats), csr.shape[1]))

并保持合理的效率,但我宁可不要猴子给全班补习.有更好的方法吗?

and be reasonably efficient, but I'd rather not monkey patch the class. Is there a better way to do this?

当我重新审视这个问题时,我想知道为什么我首先发布它.我想做的与重复矩阵几乎所有的事情都将与原始矩阵更容易完成,然后再应用重复.我的假设是,与任何可能的答案相比,重复帖子始终是解决此问题的更好方法.

As I revisit this question, I wonder why I posted it in the first place. Almost everything I could think to do with the repeated matrix would be easier to do with the original matrix, and then apply the repetition afterwards. My assumption is that post repetition will always be the better way to approach this problem than any of the potential answers.

推荐答案

from scipy.sparse import csr_matrix
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) * sparse_row

这篇关于沿轴0重复Scipy CSR稀疏矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆