How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?
Question
Suppose I have a 2d sparse array. In my real use case, both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:
>>> import numpy as np
>>> import scipy.sparse as ssp
>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])
Now suppose I have a dense 1d array of size 3 (or 50000 in my real-life case) whose components are all non-zero:
>>> d = np.ones(3) * 3
>>> d
array([ 3., 3., 3.])
I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow np.matrix semantics: the '*' operator is overloaded to behave like a matrix multiplication instead of an elementwise multiplication:
>>> a * d
array([ 0., -3., 0., 0., 6.])
One solution would be to switch 'a' to array semantics for the '*' operator, which would give the expected result:
>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])
But I cannot do that, since the call to toarray() would materialize the dense version of 'a', which does not fit in memory (and the result would be dense too):
>>> ssp.issparse(a.toarray())
False
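For a rough sense of scale, here is a back-of-the-envelope estimate of the dense footprint. It assumes float64 entries (the default dtype of lil_matrix) and uses the example sizes of 20000 rows and 50000 columns mentioned above; the variable names are only illustrative.

```python
import numpy as np

# Example sizes from the question; the real data may differ.
n_rows, n_cols = 20000, 50000

# Bytes needed to store every entry densely as float64 (8 bytes each).
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize

print(dense_bytes / 1e9, "GB")  # 8.0 GB
```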
Any idea how to compute this while keeping only sparse data structures and without having to write an inefficient Python loop over the columns of 'a'?
Recommended answer
I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.
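You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently. Multiplying 'a' on the right by a diagonal matrix scales column j of 'a' by the j-th diagonal entry, which is exactly what broadcasting a 1d array across the rows does.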
>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])
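For readers on a more recent SciPy, the same idea can be sketched as a standalone script. This is not part of the original reply: it assumes a SciPy version that provides ssp.diags and supports the @ operator, and it also shows the sparse multiply() method as an alternative that, in recent releases, broadcasts a dense 1d array and returns a sparse result. The names scaled_by_diag and scaled_by_multiply are only illustrative.

```python
import numpy as np
import scipy.sparse as ssp

# Rebuild the small example from the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
a = a.tocsr()  # CSR is better suited to arithmetic than LIL

d = np.ones(3) * 3

# Option 1: right-multiply by a sparse diagonal matrix built from d.
# Column j of `a` is scaled by d[j]; the result stays sparse.
scaled_by_diag = a @ ssp.diags(d)

# Option 2 (assumes a recent SciPy): elementwise multiply with broadcasting.
scaled_by_multiply = a.multiply(d)

print(ssp.issparse(scaled_by_diag))
print(scaled_by_diag.toarray())
```

Only the tiny example result is converted with toarray() here, for display; on the real 20000 x 50000 data the product should be kept sparse.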
Hope that helps!