NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?
First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.
The formula portion at issue is where a matrix is added to a scalar.
A = V + x
Where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated. x is a float.
The operation with NumPy will (if I'm not mistaken) add x to each field in V. Please let me know if I'm completely off base, and x will only be added to non-zero values in V.
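For reference, a quick check of the dense behaviour on a small stand-in array (the 3 x 3 values here are made up for illustration, not taken from the real data):

```python
import numpy as np

# Small dense stand-in for V (the real matrix would be 60,000 x 60,000).
V = np.array([[0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0],
              [5.0, 0.0, 0.0]])
x = 2.5

# Broadcasting adds x to every element, including the zeros.
A = V + x
print(A)
```

So with a dense NumPy array, every field receives x, and the result has no zeros left unless some entry happened to equal -x.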
With SciPy, not all sparse matrices support the same features, like scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but in practice it looks like it allocates every matrix entry, effectively turning my sparse dok_matrix into a dense matrix with more overhead. (not good)
The other matrix types (CSR, CSC, LIL) don't support scalar addition.
I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types, as they all seem to support matrix addition. However, I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being a fully populated matrix as well.
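To put rough numbers on that memory cost, here is a back-of-the-envelope sketch. The float64 dtype, the 0.1% density, and the 32-bit CSR index arrays are all assumptions chosen for illustration; the real figures depend on the actual data.

```python
import numpy as np

n = 60_000
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per float64 element

# A fully populated n x n matrix stores every entry.
dense_bytes = n * n * itemsize
print(f"dense: {dense_bytes / 1e9:.1f} GB")

# Hypothetical density: 1 stored entry per 1000 cells (an assumption).
nnz = n * n // 1000

# CSR layout: one value and one column index per stored entry,
# plus n + 1 row pointers (32-bit indices assumed here).
csr_bytes = nnz * itemsize + nnz * 4 + (n + 1) * 4
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")
```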
There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.
Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're using DOK:
>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
        [ 0., 0., 0., 0., 0.],
        [ 0., 0., 0., 10., 0.],
        [ 0., 0., 0., 0., 0.],
        [ 0., 20., 0., 0., 0.]])
Then you could update:
>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 119., 0., 0., 0.]])
Not particularly performant, but is O(nonzero).
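The session above comes from a Python 2 era setup, where `S.values()` returns a list and `dok_matrix` behaves like a plain dict. As far as I know, newer SciPy releases no longer allow `update()` on a DOK matrix, so here is a hedged sketch of the same O(nonzero) idea using only item assignment. It assumes that `items()` and indexed assignment are available on `dok_matrix`:

```python
from scipy.sparse import dok_matrix

S = dok_matrix((5, 5))
S[2, 3] = 10
S[4, 1] = 20

# Snapshot the stored entries first so the matrix is not
# mutated while it is being iterated over.
for (i, j), value in list(S.items()):
    S[i, j] = value + 99

print(S.toarray())
```

Only the two stored entries are visited, so the matrix stays sparse.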
OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:
>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119., 109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1109., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 1119., 0., 0., 0.]])
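One thing worth spelling out: shifting `data` only touches the stored entries, so it is not the same operation as the dense `V + x`. A small sketch of the difference, using a made-up 3 x 3 matrix and CSR as the format:

```python
import numpy as np
from scipy.sparse import csr_matrix

V = csr_matrix(np.array([[0.0, 0.0, 3.0],
                         [0.0, 0.0, 0.0],
                         [5.0, 0.0, 0.0]]))
x = 2.5

# Sparse-friendly version: only the stored (non-zero) entries are shifted.
shifted = V.copy()
shifted.data += x

# Dense semantics: every cell is shifted, zeros included.
dense_sum = V.toarray() + x

print(shifted.toarray())
print(dense_sum)
```

Whether the first result is acceptable depends on what the formula actually needs. If the zeros are supposed to become x as well, the result is inherently dense.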
Note that you're probably going to want to add an additional
>>> C.eliminate_zeros()
to handle the possibility that you've added a negative number and so there's now a 0 which is actually being recorded. By itself, that should work fine, but the next time you did the C.data += some_number trick, it would add some_number to that zero you introduced.
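A minimal sketch of that pitfall, with made-up values chosen so that one stored entry cancels to exactly zero:

```python
import numpy as np
from scipy.sparse import coo_matrix

values = np.array([10.0, 20.0])
rows = np.array([2, 4])
cols = np.array([3, 1])
C = coo_matrix((values, (rows, cols)), shape=(5, 5))

# Adding -10 turns the first stored value into an explicit 0.
C.data += -10
stored_before = C.nnz   # the explicit zero is still counted as stored

# Drop explicitly stored zeros so later shifts skip that cell.
C.eliminate_zeros()
stored_after = C.nnz

# The next shift now only affects the remaining stored entry.
C.data += 5
print(stored_before, stored_after)
print(C.toarray())
```

Without the `eliminate_zeros()` call, the final `C.data += 5` would also have changed the cell at (2, 3).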