NumPy matrix to SciPy sparse matrix: What is the safest way to add a scalar?

Problem description

First off, I'm no mathematician. I admit that. Yet I still need to understand how SciPy's sparse matrices work arithmetically in order to switch from a dense NumPy matrix to a SciPy sparse matrix in an application I have to work on. The issue is memory usage. A large dense matrix will consume tons of memory.

The formula portion at issue is where a matrix is added to a scalar.

A = V + x

Where V is a square matrix (it's large, say 60,000 x 60,000) and sparsely populated. x is a float.
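To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. The 0.1% density and the 4-byte index size are assumptions chosen for illustration; the question does not state either.

```python
# Rough memory estimate for a 60,000 x 60,000 float64 matrix.
n = 60_000
bytes_per_float64 = 8

# Dense storage: every element is stored.
dense_bytes = n * n * bytes_per_float64

# Sparse (CSR) storage, ASSUMING 0.1% of entries are nonzero
# and ASSUMING 4-byte (int32) indices.
nnz = n * n // 1000
csr_bytes = nnz * (bytes_per_float64 + 4) + (n + 1) * 4

print(f"dense: {dense_bytes / 1e9:.1f} GB")
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")
```

Under those assumptions the dense matrix needs about 28.8 GB, while the CSR form needs on the order of tens of megabytes.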

The operation with NumPy will (if I'm not mistaken) add x to every element of V. Please let me know if I'm completely off base and x is only added to the non-zero values in V.
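A small check of the dense behaviour, using a made-up 3 x 3 array: NumPy broadcasts the scalar over every element, including the zeros.

```python
import numpy as np

# Tiny stand-in for V: mostly zeros, one nonzero entry.
V = np.zeros((3, 3))
V[1, 2] = 10.0
x = 0.5

# Scalar addition is broadcast to every element, zeros included.
A = V + x

print(A)
print(np.count_nonzero(A))  # all 9 entries are now nonzero
```

So on a dense array the result of V + x is fully populated whenever x is nonzero.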

With SciPy, not all sparse matrix types support the same features, such as scalar addition. dok_matrix (Dictionary of Keys) supports scalar addition, but in practice it appears to allocate every matrix entry, effectively turning my sparse dok_matrix into a dense matrix with more overhead. (Not good.)

The other matrix types (CSR, CSC, LIL) don't support scalar addition.

I could try constructing a full matrix with the scalar value x, then adding that to V. I would have no problems with matrix types, as they all seem to support matrix addition. However, I would have to eat up a lot of memory to construct x as a matrix, and the result of the addition could end up being a fully populated matrix as well.

There must be an alternative way to do this that doesn't require allocating 100% of a sparse matrix.
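One possible direction, which is not taken from the answer below and only applies if A is later used in products rather than inspected element by element, is to never materialize A at all. Since A = V + x·J, where J is the all-ones matrix, a matrix-vector product can be computed as A v = V v + x·sum(v). The sketch below assumes that use case and uses a made-up 3 x 3 example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small stand-in for the large sparse V.
V = csr_matrix(np.array([[0.0, 2.0, 0.0],
                         [0.0, 0.0, 0.0],
                         [3.0, 0.0, 0.0]]))
x = 0.5
v = np.array([1.0, 2.0, 3.0])

# (V + x * ones) @ v  ==  V @ v + x * sum(v); A is never built.
lazy = V @ v + x * v.sum()

# Dense reference, feasible only because this example is tiny.
dense = (V.toarray() + x) @ v

print(np.allclose(lazy, dense))
```

Whether this helps depends entirely on what the application does with A afterwards.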

I'm willing to accept that large amounts of memory are needed, but I thought I would seek some advice first. Thanks.

Solution

Admittedly sparse matrices aren't really in my wheelhouse, but ISTM the best way forward depends on the matrix type. If you're DOK:

>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5,5))
>>> S[2,3] = 10; S[4,1] = 20
>>> S.todense()
matrix([[  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,   0.,   0.,  10.,   0.],
        [  0.,   0.,   0.,   0.,   0.],
        [  0.,  20.,   0.,   0.,   0.]])

Then you could update:

>>> S.update(zip(S.keys(), np.array(S.values()) + 99))
>>> S
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Dictionary Of Keys format>
>>> S.todense()
matrix([[   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,    0.,    0.,  109.,    0.],
        [   0.,    0.,    0.,    0.,    0.],
        [   0.,  119.,    0.,    0.,    0.]])

Not particularly performant, but it is O(number of nonzeros).
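The session above looks like it comes from Python 2 and an older SciPy. On Python 3, S.values() returns a view rather than a list, and recent SciPy releases may reject update on a DOK matrix, so the one-liner may not run as written. A minimal sketch of the same idea using only item assignment, which should still touch only the stored entries:

```python
import numpy as np
from scipy.sparse import dok_matrix

S = dok_matrix((5, 5))
S[2, 3] = 10
S[4, 1] = 20

# Snapshot the stored (key, value) pairs first so the matrix is not
# modified while it is being iterated, then add 99 to each one.
for key, value in list(S.items()):
    S[key] = value + 99

print(S.toarray())
```

This is still O(number of nonzeros), with Python-level loop overhead.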

OTOH, if you have something like COO, CSC, or CSR, you can modify the data attribute directly:

>>> C = S.tocoo()
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
>>> C.data
array([ 119.,  109.])
>>> C.data += 1000
>>> C
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
>>> C.todense()
matrix([[    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,  1109.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,  1119.,     0.,     0.,     0.]])
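The same trick written as a self-contained script for CSR, as a sketch with the example values from above. Only the stored entries change; the implicit zeros and the sparsity pattern are untouched.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build the same 5x5 example with two stored entries.
dense = np.zeros((5, 5))
dense[2, 3] = 10
dense[4, 1] = 20
C = csr_matrix(dense)

# In-place addition on the array of stored values only.
C.data += 1000

result = C.toarray()
print(C.nnz)
print(result)
```

Note that this is "add x to the nonzeros", which is not the same operation as the dense V + x from the question.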

Note that you're probably going to want to add an additional

>>> C.eliminate_zeros()

to handle the possibility that you've added a negative number and so there's now a 0 which is actually being stored. By itself, that should work fine, but the next time you did the C.data += some_number trick, it would add some_number to that zero you introduced.
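A short sketch of that pitfall, using made-up values: subtracting 10 turns one stored entry into an explicit zero, which still counts as stored until eliminate_zeros() is called.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.zeros((5, 5))
dense[2, 3] = 10
dense[4, 1] = 20
C = csr_matrix(dense)

# Adding a negative number makes the entry at (2, 3) an explicit zero.
C.data -= 10
nnz_before = C.nnz   # the explicit zero is still a stored entry

# Drop explicitly stored zeros from the structure.
C.eliminate_zeros()
nnz_after = C.nnz

# A later in-place addition now leaves (2, 3) at zero.
C.data += 5
result = C.toarray()
print(nnz_before, nnz_after)
print(result)
```

Without the eliminate_zeros() call, the final addition would have set the entry at (2, 3) to 5.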
