Python中2种稀疏矩阵的一种特殊的逐行乘法 [英] Special kind of row-by-row multiplication of 2 sparse matrices in Python
问题描述
我正在寻找的是:一种在Python中实现恰好是scipy稀疏的矩阵的特殊乘法运算的方法( Kronecker乘法或逐点乘法,并且scipy.sparse中似乎没有任何内置支持.
What I'm looking for: a way to implement in Python a special multiplication operation for matrices that happen to be in scipy sparse (csr) format. This is a special kind of multiplication, not matrix multiplication nor Kronecker multiplication nor Hadamard aka pointwise multiplication, and does not seem to have any built-in support in scipy.sparse.
所需的操作:输出的每一行应包含两个输入矩阵中相应行的元素的每个乘积的结果.因此,从两个大小相同的矩阵开始,每个矩阵的维度 m 为 n ,结果的维度为 m 为 n ^ 2 .
The desired operation: Each row of the output should contain the results of every product of the elements of the corresponding rows in the two input matrices. So starting with two identically sized matrices, each with dimensions m by n, the result should have dimensions m by n^2.
它看起来像这样:
Python代码:
import scipy.sparse
A = scipy.sparse.csr_matrix(np.array([[1,2],[3,4]]))
B = scipy.sparse.csr_matrix(np.array([[0,5],[6,7]]))
# C is the desired product of A and B. It should look like:
C = scipy.sparse.csr_matrix(np.array([[0,5,0,10],[18,21,24,28]]))
执行此操作的一种好方法或有效方法是什么?我尝试过在stackoverflow以及其他地方进行此操作,到目前为止还算不上运气.到目前为止,听起来我最好的选择是在for循环中逐行进行操作,但是由于我的输入矩阵有几百万行和几千列,而大多数都是稀疏的,这听起来太可怕了.
What would be a nice or efficient way to do this? I've tried looking here on stackoverflow as well as elsewhere, with no luck so far. So far it sounds like my best bet is to do a row by row operation in a for loop, but that sounds horrendous seeing as my input matrices have a few million rows and few thousand columns, mostly sparse.
推荐答案
在您的示例中,C
是kron
In [4]: A=np.array([[1,2],[3,4]])
In [5]: B=np.array([[0,5],[6,7]])
In [6]: np.kron(A,B)
Out[6]:
array([[ 0, 5, 0, 10],
[ 6, 7, 12, 14],
[ 0, 15, 0, 20],
[18, 21, 24, 28]])
In [7]: np.kron(A,B)[[0,3],:]
Out[7]:
array([[ 0, 5, 0, 10],
[18, 21, 24, 28]])
kron
包含与np.outer
相同的值,但它们的顺序不同.
kron
contains the same values as np.outer
, but they are in a different order.
对于大型密集阵列,einsum
可能会提供良好的速度:
For large dense arrays, einsum
might provide good speed:
np.einsum('ij,ik->ijk',A,B).reshape(A.shape[0],-1)
sparse.kron
与np.kron
的作用相同:
As = sparse.csr_matrix(A); Bs ...
sparse.kron(As,Bs).tocsr()[[0,3],:].A
sparse.kron
是用Python编写的,因此,如果执行不必要的计算,您可能会对其进行修改.
sparse.kron
is written in Python, so you probably could modify it if it is doing unnecessary calculations.
一个迭代解决方案似乎是:
An iterative solution appears to be:
sparse.vstack([sparse.kron(a,b) for a,b in zip(As,Bs)]).A
要迭代,我不希望它比缩减完整的kron
更快.但是,如果不深入研究sparse.kron
的逻辑,那可能是我所能做的最好的事情.
Being iterative I don't expect it to be faster than paring down the full kron
. But short of digging into the logic of sparse.kron
it is probably the best I can do.
vstack
使用bmat
,因此计算公式为:
vstack
uses bmat
, so the calculation is:
sparse.bmat([[sparse.kron(a,b)] for a,b in zip(As,Bs)])
但是bmat
相当复杂,因此进一步简化它并不容易.
But bmat
is rather complex, so it won't be easy to simplify this further.
np.einsum
解决方案无法轻松扩展为稀疏-没有sparse.einsum
,中间产品是3d,稀疏无法处理.
The np.einsum
solution can't be easily extended to sparse - there isn't a sparse.einsum
, and the intermediate product is 3d, which sparse does not handle.
sparse.kron
使用coo
格式,这不适用于处理行.但是本着该函数的精神,我设计了一个对csr
格式矩阵的行进行迭代的函数.像kron
和bmat
一样,我正在构造data
,row
,col
数组,并从这些数组构造一个coo_matrix
.进而可以将其转换为其他格式.
sparse.kron
uses coo
format, which is no good for working with the rows. But working in the spirit of that function, I've worked out a function that iterates on the rows of csr
format matrices. Like kron
and bmat
I'm constructing the data
, row
, col
arrays, and constructing a coo_matrix
from those. That in turn can be converted to other formats.
def test_iter(A, B):
m,n1 = A.shape
n2 = B.shape[1]
Cshape = (m, n1*n2)
data = np.empty((m,),dtype=object)
col = np.empty((m,),dtype=object)
row = np.empty((m,),dtype=object)
for i,(a,b) in enumerate(zip(A, B)):
data[i] = np.outer(a.data, b.data).flatten()
#col1 = a.indices * np.arange(1,a.nnz+1) # wrong when a isn't dense
col1 = a.indices * n2 # correction
col[i] = (col1[:,None]+b.indices).flatten()
row[i] = np.full((a.nnz*b.nnz,), i)
data = np.concatenate(data)
col = np.concatenate(col)
row = np.concatenate(row)
return sparse.coo_matrix((data,(row,col)),shape=Cshape)
使用这些较小的2x2矩阵和较大的矩阵(例如A1=sparse.rand(1000,2000).tocsr()
),这比使用bmat
的版本快约3倍.对于足够大的矩阵,它比密集的einsum
版本(它可能存在内存错误)要好.
With these small 2x2 matrices, as well as larger ones (e.g. A1=sparse.rand(1000,2000).tocsr()
), this is about 3x faster than the version using bmat
. For large enough matrices it is better than dense einsum
version (which can have memory errors).
这篇关于Python中2种稀疏矩阵的一种特殊的逐行乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!