Python中2种稀疏矩阵的一种特殊的逐行乘法 [英] Special kind of row-by-row multiplication of 2 sparse matrices in Python

查看:356
本文介绍了Python中2种稀疏矩阵的一种特殊的逐行乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找的是:一种在Python中实现恰好是scipy稀疏的矩阵的特殊乘法运算的方法( Kronecker乘法

What I'm looking for: a way to implement in Python a special multiplication operation for matrices that happen to be in scipy sparse (csr) format. This is a special kind of multiplication, not matrix multiplication nor Kronecker multiplication nor Hadamard aka pointwise multiplication, and does not seem to have any built-in support in scipy.sparse.

所需的操作:输出的每一行应包含两个输入矩阵中相应行的元素的每个乘积的结果.因此,从两个大小相同的矩阵开始,每个矩阵的维度 m n ,结果的维度为 m n ^ 2 .

The desired operation: Each row of the output should contain the results of every product of the elements of the corresponding rows in the two input matrices. So starting with two identically sized matrices, each with dimensions m by n, the result should have dimensions m by n^2.

它看起来像这样:

Python代码:

import scipy.sparse
A = scipy.sparse.csr_matrix(np.array([[1,2],[3,4]]))
B = scipy.sparse.csr_matrix(np.array([[0,5],[6,7]]))

# C is the desired product of A and B. It should look like:
C = scipy.sparse.csr_matrix(np.array([[0,5,0,10],[18,21,24,28]]))

执行此操作的一种好方法或有效方法是什么?我尝试过在stackoverflow以及其他地方进行此操作,到目前为止还算不上运气.到目前为止,听起来我最好的选择是在for循环中逐行进行操作,但是由于我的输入矩阵有几百万行和几千列,而大多数都是稀疏的,这听起来太可怕了.

What would be a nice or efficient way to do this? I've tried looking here on stackoverflow as well as elsewhere, with no luck so far. So far it sounds like my best bet is to do a row by row operation in a for loop, but that sounds horrendous seeing as my input matrices have a few million rows and few thousand columns, mostly sparse.

推荐答案

在您的示例中,Ckron

In [4]: A=np.array([[1,2],[3,4]])
In [5]: B=np.array([[0,5],[6,7]])
In [6]: np.kron(A,B)
Out[6]: 
array([[ 0,  5,  0, 10],
       [ 6,  7, 12, 14],
       [ 0, 15,  0, 20],
       [18, 21, 24, 28]])
In [7]: np.kron(A,B)[[0,3],:]
Out[7]: 
array([[ 0,  5,  0, 10],
       [18, 21, 24, 28]])

kron包含与np.outer相同的值,但它们的顺序不同.

kron contains the same values as np.outer, but they are in a different order.

对于大型密集阵列,einsum可能会提供良好的速度:

For large dense arrays, einsum might provide good speed:

np.einsum('ij,ik->ijk',A,B).reshape(A.shape[0],-1)

sparse.kronnp.kron的作用相同:

As = sparse.csr_matrix(A); Bs ...
sparse.kron(As,Bs).tocsr()[[0,3],:].A

sparse.kron是用Python编写的,因此,如果执行不必要的计算,您可能会对其进行修改.

sparse.kron is written in Python, so you probably could modify it if it is doing unnecessary calculations.

一个迭代解决方案似乎是:

An iterative solution appears to be:

sparse.vstack([sparse.kron(a,b) for a,b in zip(As,Bs)]).A

要迭代,我不希望它比缩减完整的kron更快.但是,如果不深入研究sparse.kron的逻辑,那可能是我所能做的最好的事情.

Being iterative I don't expect it to be faster than paring down the full kron. But short of digging into the logic of sparse.kron it is probably the best I can do.

vstack使用bmat,因此计算公式为:

vstack uses bmat, so the calculation is:

sparse.bmat([[sparse.kron(a,b)] for a,b in zip(As,Bs)])

但是bmat相当复杂,因此进一步简化它并不容易.

But bmat is rather complex, so it won't be easy to simplify this further.

np.einsum解决方案无法轻松扩展为稀疏-没有sparse.einsum,中间产品是3d,稀疏无法处理.

The np.einsum solution can't be easily extended to sparse - there isn't a sparse.einsum, and the intermediate product is 3d, which sparse does not handle.

sparse.kron使用coo格式,这不适用于处理行.但是本着该函数的精神,我设计了一个对csr格式矩阵的行进行迭代的函数.像kronbmat一样,我正在构造datarowcol数组,并从这些数组构造一个coo_matrix.进而可以将其转换为其他格式.

sparse.kron uses coo format, which is no good for working with the rows. But working in the spirit of that function, I've worked out a function that iterates on the rows of csr format matrices. Like kron and bmat I'm constructing the data, row, col arrays, and constructing a coo_matrix from those. That in turn can be converted to other formats.

def test_iter(A, B):
    m,n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = np.empty((m,),dtype=object)
    col =  np.empty((m,),dtype=object)
    row =  np.empty((m,),dtype=object)
    for i,(a,b) in enumerate(zip(A, B)):
        data[i] = np.outer(a.data, b.data).flatten()
        #col1 = a.indices * np.arange(1,a.nnz+1) # wrong when a isn't dense
        col1 = a.indices * n2   # correction
        col[i] = (col1[:,None]+b.indices).flatten()
        row[i] = np.full((a.nnz*b.nnz,), i)
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data,(row,col)),shape=Cshape)

使用这些较小的2x2矩阵和较大的矩阵(例如A1=sparse.rand(1000,2000).tocsr()),这比使用bmat的版本快约3倍.对于足够大的矩阵,它比密集的einsum版本(它可能存在内存错误)要好.

With these small 2x2 matrices, as well as larger ones (e.g. A1=sparse.rand(1000,2000).tocsr()), this is about 3x faster than the version using bmat. For large enough matrices it is better than dense einsum version (which can have memory errors).

这篇关于Python中2种稀疏矩阵的一种特殊的逐行乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆