How do I create an interaction sparse matrix?


Question

Suppose I have two sparse matrices:

from scipy.sparse import random
from scipy import stats

S0 = random(5000,100, density=0.01)
S1 = random(5000,100,density=0.01)

I want to create a sparse matrix S2 of shape (5000, 100*100). (In my real application, the 5000 is actually 20 million.) Each row of S2 holds the pairwise interactions between the corresponding 100-dimensional rows of S0 and S1.

S2 =  some_kind_of_tensor_multiplication(S0 ,S1 )

To illustrate: S2[i, j] = S0[i, k0] * S1[i, k1], where k0 and k1 each run over [0, 99] and every (k0, k1) pair gets its own column j (for example j = k0 * 100 + k1), so row i has length 10000. I could not find an efficient way to do this. Could anyone help?
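To make the index mapping concrete, here is a minimal dense sketch on tiny arrays. It assumes the column convention j = k0 * n1 + k1, where n1 is the number of columns of the second matrix; the helper name rowwise_outer_dense is hypothetical and only meant as a reference for checking sparse implementations on small inputs.

```python
import numpy as np

def rowwise_outer_dense(a, b):
    """Row-wise outer product of two dense 2-D arrays (hypothetical helper)."""
    m, n0 = a.shape
    n1 = b.shape[1]
    # out[i, k0, k1] = a[i, k0] * b[i, k1]; reshape flattens to j = k0 * n1 + k1
    return np.einsum("ik,il->ikl", a, b).reshape(m, n0 * n1)

a = np.array([[1.0, 2.0], [0.0, 3.0]])
b = np.array([[4.0, 0.0, 5.0], [6.0, 7.0, 0.0]])
s2 = rowwise_outer_dense(a, b)
print(s2)
```

This dense form is only practical for small inputs; at 20 million rows it would need far too much memory, which is why the sparse construction matters.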

A naive approach looks like this, but I expect it to be very inefficient:

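from scipy import sparse

S0c, S1c = S0.tocsc(), S1.tocsc()  # CSC supports efficient column slicing
result = []
for i in range(S0.shape[1]):
    for j in range(S1.shape[1]):
        # elementwise product of column i of S0 and column j of S1
        result.append(S0c[:, i].multiply(S1c[:, j]))
result = sparse.hstack(result).tocsr()  # shape (m, n0*n1)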

Following a similar question (a special kind of row-by-row multiplication of two sparse matrices in Python), I tried:

import numpy as np

from scipy.sparse import random
from scipy import stats
from scipy import sparse

S0 = random(20000000,100, density=0.01).tocsr()
S1 = random(20000000,100,density=0.01).tocsr()


def test_iter(A, B):
    m,n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = np.empty((m,),dtype=object)
    col =  np.empty((m,),dtype=object)
    row =  np.empty((m,),dtype=object)
    for i,(a,b) in enumerate(zip(A, B)):
        data[i] = np.outer(a.data, b.data).flatten()
        #col1 = a.indices * np.arange(1,a.nnz+1) # wrong when a isn't dense
        col1 = a.indices * n2   # correction
        col[i] = (col1[:,None]+b.indices).flatten()
        row[i] = np.full((a.nnz*b.nnz,), i)
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data,(row,col)),shape=Cshape)

Timing it:

%%time
S_result = test_iter(S0,S1)

This takes a wall time of 53 min 8 s. Is there a faster approach? Thanks.

Recommended answer

Here's a rewrite that works directly with the CSR indptr array. It saves time by slicing the data and indices arrays directly, rather than constructing a whole new one-row CSR matrix for every row:

def test_iter2(A, B): 
    m,n1 = A.shape 
    n2 = B.shape[1] 
    Cshape = (m, n1*n2) 
    data = [] 
    col =  [] 
    row =  [] 
    for i in range(A.shape[0]): 
        slc1 = slice(A.indptr[i],A.indptr[i+1]) 
        data1 = A.data[slc1]; ind1 = A.indices[slc1] 
        slc2 = slice(B.indptr[i],B.indptr[i+1])  
        data2 = B.data[slc2]; ind2 = B.indices[slc2]  
        data.append(np.outer(data1, data2).ravel()) 
        col.append(((ind1*n2)[:,None]+ind2).ravel()) 
        row.append(np.full(len(data1)*len(data2), i)) 
    data = np.concatenate(data) 
    col = np.concatenate(col) 
    row = np.concatenate(row) 
    return sparse.coo_matrix((data,(row,col)),shape=Cshape) 
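Since row i of the result stores len(data1) * len(data2) entries, the total size of the output can be estimated from the two indptr arrays before running the loop. The following is a small sketch of that calculation; the helper name result_nnz_per_row is hypothetical, and the inputs are toy matrices chosen for illustration.

```python
import numpy as np
from scipy import sparse

def result_nnz_per_row(A, B):
    """Stored entries per row of the row-wise outer product (hypothetical helper)."""
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    # np.diff(indptr) is the number of stored entries in each CSR row
    return np.diff(A.indptr).astype(np.int64) * np.diff(B.indptr)

A = sparse.csr_matrix(np.array([[1.0, 2.0, 0.0], [0.0, 0.0, 0.0], [3.0, 0.0, 4.0]]))
B = sparse.csr_matrix(np.array([[5.0, 0.0], [6.0, 7.0], [8.0, 9.0]]))
counts = result_nnz_per_row(A, B)
total = int(counts.sum())
print(counts, total)
```

Multiplying the total by the bytes needed for one value plus its row and column index gives a rough lower bound on the memory the COO result will need.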

With a smaller test case, this is roughly six times faster:

In [536]: S0=sparse.random(200,200, 0.01, format='csr')                                                   
In [537]: S1=sparse.random(200,200, 0.01, format='csr')                                                   
In [538]: timeit test_iter(S0,S1)                                                                         
42.8 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [539]: timeit test_iter2(S0,S1)                                                                        
6.94 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
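The remaining cost in test_iter2 is the Python-level loop over rows. One possible way to remove it, sketched here as an assumption rather than as part of the answer above, is to build all index pairs at once with np.repeat. The function name rowwise_outer_vectorized is hypothetical, it has not been benchmarked here, and it allocates several temporary arrays as long as the output, so memory use may be the limiting factor at 20 million rows.

```python
import numpy as np
from scipy import sparse

def rowwise_outer_vectorized(A, B):
    """Loop-free row-wise outer product of two sparse matrices (hypothetical sketch)."""
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    m, n1 = A.shape
    n2 = B.shape[1]
    nnz_a = np.diff(A.indptr).astype(np.int64)  # stored entries per row of A
    nnz_b = np.diff(B.indptr).astype(np.int64)  # stored entries per row of B
    rows_a = np.repeat(np.arange(m), nnz_a)     # row index of each stored entry of A
    reps = nnz_b[rows_a]                        # how many B entries each A entry pairs with
    total = int(reps.sum())
    a_idx = np.repeat(np.arange(A.nnz), reps)   # position in A.data for each output entry
    starts = np.cumsum(reps) - reps             # first output slot of each A entry
    within = np.arange(total) - np.repeat(starts, reps)  # offset inside the B row
    row = rows_a[a_idx]
    b_idx = B.indptr[row] + within              # position in B.data for each output entry
    data = A.data[a_idx] * B.data[b_idx]
    # same column convention as test_iter2: col = k0 * n2 + k1
    col = A.indices[a_idx].astype(np.int64) * n2 + B.indices[b_idx]
    return sparse.coo_matrix((data, (row, col)), shape=(m, n1 * n2))

A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]))
B = sparse.csr_matrix(np.array([[0.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))
C = rowwise_outer_vectorized(A, B)
# dense reference for the same column convention
expected = np.einsum("ik,il->ikl", A.toarray(), B.toarray()).reshape(3, 6)
print(np.array_equal(C.toarray(), expected))
```

If the temporaries are too large, the same function could be applied to blocks of rows and the partial results stacked with sparse.vstack.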
