Multiplying NumPy/SciPy Sparse and Dense Matrices Efficiently

Views: 138 | Published: 2020/5/18 20:58:22 | Tags: python, performance, numpy, scipy, sparse-matrix


Question

I'm working to implement the following equation:

X = (Y.T * Y + Y.T * C * Y)^-1

Y is an (n x f) matrix and C is an (n x n) diagonal matrix; n is about 300k and f will vary between 100 and 200. As part of an optimization process this equation will be evaluated almost 100 million times, so it has to be computed really fast.

Y is initialized randomly, and C is a very sparse matrix: only a few of the 300k entries on its diagonal are non-zero. Since NumPy's diagonal functions create dense matrices, I created C as a sparse csr matrix. But when trying to compute the first part of the equation:

r = dot(C, Y)

the computer crashes due to memory limits. I then decided to try converting Y to a csr_matrix and performing the same operation:

r = dot(C, Ysparse)

and this approach took 1.38 ms. But this solution is somewhat "tricky", since I'm using a sparse matrix to store a dense one, and I wonder how efficient it really is.

So my question is: is there some way of multiplying the sparse C and the dense Y without having to turn Y into a sparse matrix, while also improving performance? If C could somehow be represented as a dense diagonal without consuming tons of memory, maybe that would lead to very efficient performance, but I don't know if this is possible.

Thanks for your help!

Recommended Answer

The dot product runs into memory issues when computing r = dot(C, Y) because numpy's dot function has no native support for sparse matrices. NumPy treats the sparse matrix C as a generic Python object rather than as a numeric array. If you inspect this on a small scale, you can see the problem first hand:

>>> from numpy import dot, array
>>> from scipy import sparse
>>> Y = array([[1,2],[3,4]])
>>> C = sparse.csr_matrix(array([[1,0], [0,2]]))
>>> dot(C,Y)
array([[  (0, 0)    1
  (1, 1)    2,   (0, 0) 2
  (1, 1)    4],
  [  (0, 0) 3
  (1, 1)    6,   (0, 0) 4
  (1, 1)    8]], dtype=object)

Clearly the above is not the result you are interested in. What you want to do instead is compute the product using scipy's sparse.csr_matrix.dot function:

r = sparse.csr_matrix.dot(C, Y)

or, more compactly:

r = C.dot(Y)
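
As a rough illustration of how this might fit the full equation from the question, here is a minimal sketch. The sizes, the random data, and the names c, nonzero_rows, Y_nz and YtCY are assumptions made up for the example and do not come from the original post. It builds C with scipy.sparse.diags, multiplies it by a dense Y with C.dot(Y), and also shows that, because C is diagonal, the same product can be obtained by scaling the rows of Y, so Y.T * C * Y only needs the rows of Y where the diagonal is non-zero.

```python
import numpy as np
from scipy import sparse

# Small stand-in sizes (assumption: the question uses n ~ 300k, f ~ 100-200).
n, f = 1000, 8
rng = np.random.default_rng(0)
Y = rng.standard_normal((n, f))

# Diagonal of C: only a few entries are non-zero (made-up values).
c = np.zeros(n)
nonzero_rows = rng.choice(n, size=10, replace=False)
c[nonzero_rows] = rng.uniform(1.0, 5.0, size=10)

# Sparse diagonal matrix; no dense (n x n) array is ever created.
C = sparse.diags(c, format="csr")

# Sparse-times-dense product through the sparse matrix's own dot method.
r = C.dot(Y)                      # dense array of shape (n, f)

# Equivalent without any matrix object: scale row i of Y by c[i].
r_scaled = c[:, None] * Y

# Y.T * C * Y only involves the rows of Y where c is non-zero.
Y_nz = Y[nonzero_rows]
YtCY = Y_nz.T @ (c[nonzero_rows, None] * Y_nz)

# The full expression from the question, on the small (f x f) matrix.
X = np.linalg.inv(Y.T @ Y + YtCY)
```

Whether the row-subset version is actually faster than C.dot(Y) at the question's scale would need to be measured; the sketch only shows that the two give the same result.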
