Scipy Sparse Cumsum

Problem description

Suppose I have a scipy.sparse.csr_matrix representing the values below:

[[0 0 1 2 0 3 0 4]
 [1 0 0 2 0 3 4 0]]

I want to calculate the cumulative sum of the non-zero values in each row, in place, which would change the array to:

[[0 0 1 3 0 6 0 10]
 [1 0 0 3 0 6 10 0]]

The actual values are not 1, 2, 3, ...

The number of non-zero values in each row is unlikely to be the same.

How can I do this quickly?

Current program:

import scipy.sparse
import numpy as np

# sparse data
a = scipy.sparse.csr_matrix(
    [[0,0,1,2,0,3,0,4],
     [1,0,0,2,0,3,4,0]], 
    dtype=int)

# method
indptr = a.indptr
data = a.data
for i in range(a.shape[0]):
    st = indptr[i]
    en = indptr[i + 1]
    np.cumsum(data[st:en], out=data[st:en])

# print result
print(a.todense())

Result:

[[ 0  0  1  3  0  6  0 10]
 [ 1  0  0  3  0  6 10  0]]

Recommended answer

How about doing this instead:

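import numpy as np

a = np.array([[0,0,1,2,0,3,0,4],
              [1,0,0,2,0,3,4,0]], dtype=int)

b = a.copy()
b[b > 0] = 1               # 0/1 mask of the positive entries
z = np.cumsum(a, axis=1)   # dense cumulative sum along each row
print(z*b)                 # keep the running sum only where the mask is 1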

This yields:

array([[ 0,  0,  1,  3,  0,  6,  0, 10],
       [ 1,  0,  0,  3,  0,  6, 10,  0]])
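One caveat with the dense approach: the mask `b > 0` only marks positive entries, so it presumably needs adjusting if the real data can be negative. A minimal sketch of a variant that masks on `!= 0` instead is shown here; `masked_cumsum` is a hypothetical helper name, not something from the original answer.

```python
import numpy as np


def masked_cumsum(dense):
    """Hypothetical helper: row-wise cumsum, kept only where the input is non-zero."""
    dense = np.asarray(dense)
    # np.where keeps the running sum at non-zero positions and writes 0 elsewhere
    return np.where(dense != 0, np.cumsum(dense, axis=1), 0)


# Example with negative values, which a `> 0` mask would drop
a = np.array([[0, 0, 1, -2, 0, 3, 0, 4],
              [1, 0, 0, 2, 0, -3, 4, 0]])
result = masked_cumsum(a)
print(result)
```

Note that a running sum can itself be zero at a non-zero position, so the output pattern of zeros is not guaranteed to match the input pattern.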

Using sparse

def sparse(a):
    a = scipy.sparse.csr_matrix(a)

    indptr = a.indptr
    data = a.data
    for i in range(a.shape[0]):
        st = indptr[i]
        en = indptr[i + 1]
        np.cumsum(data[st:en], out=data[st:en])


In[1]: %timeit sparse(a)
10000 loops, best of 3: 167 µs per loop

Using multiplication

def mult(a):
    b = a.copy()
    b[b > 0] = 1
    z = np.cumsum(a, axis=1)
    z * b

In[2]: %timeit mult(a)
100000 loops, best of 3: 5.93 µs per loop
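These timings should be read with care: `sparse(a)` rebuilds the `csr_matrix` from a dense array on every call, and the test matrix is only 2 x 8, so the comparison says little about large, genuinely sparse inputs where a dense copy may not fit in memory. For that case, one possible way to drop the Python loop while staying sparse is to take a single cumulative sum over all of `data` and subtract, for each row, the total accumulated before that row starts. The sketch below is an assumption-laden illustration, not the answer's method: `csr_row_cumsum_inplace` is a hypothetical name, and with floating-point data the subtraction can lose precision compared with summing each row separately.

```python
import numpy as np
import scipy.sparse


def csr_row_cumsum_inplace(m):
    """Hypothetical helper: per-row cumulative sum of the stored values of a CSR matrix."""
    m.sort_indices()                      # make storage order match column order
    data, indptr = m.data, m.indptr
    total = np.cumsum(data)               # running sum across all rows at once
    # padded[k] is the sum of data[:k], so padded[indptr[i]] is what precedes row i
    padded = np.zeros(total.size + 1, dtype=total.dtype)
    padded[1:] = total
    before = padded[indptr[:-1]]
    counts = np.diff(indptr)              # number of stored entries per row
    data[:] = total - np.repeat(before, counts)
    return m


a = scipy.sparse.csr_matrix(
    [[0, 0, 1, 2, 0, 3, 0, 4],
     [1, 0, 0, 2, 0, 3, 4, 0]],
    dtype=int)
csr_row_cumsum_inplace(a)
print(a.toarray())
```

Timing this against the loop on matrices of realistic size and density would be the way to tell whether it actually helps.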
