Creating a scipy.lil_matrix using a Python generator efficiently

Question

I have a generator that yields one-dimensional numpy.arrays, all of the same length. I would like a sparse matrix containing that data. Rows are generated in the same order I'd like to have them in the final matrix. A csr matrix is preferable to a lil matrix, but I assume the latter will be easier to build in the scenario I'm describing.

Assuming row_gen is a generator yielding numpy.array rows, the following code works as expected.

import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

# Works, but list() materializes every dense row at once.
matrix = scipy.sparse.lil_matrix(list(row_gen()))

Because the list will essentially ruin any advantages of the generator, I'd like the following to have the same end result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory:

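import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

# Passing the generator itself fails with the TypeError shown next.
matrix = scipy.sparse.lil_matrix(row_gen())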

However, it raises the following exception when run:

TypeError: no supported conversion for types: (dtype('O'),)

I also noticed the trace includes the following:

File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
  A = csr_matrix(A, dtype=dtype).tolil()

This makes me think that scipy.sparse.lil_matrix ends up creating a csr matrix and only then converts it to a lil matrix. In that case I would rather just create a csr matrix to begin with.

To recap, my question is: what is the most efficient way to create a scipy.sparse matrix from a Python generator, or from one-dimensional numpy arrays?

Recommended answer

Let's look at the code for sparse.lil_matrix. It checks the first argument:

if isspmatrix(arg1):    # is it already a sparse matrix
    ...
elif isinstance(arg1,tuple):    # is it the shape tuple
    if isshape(arg1):
        if shape is not None:
            raise ValueError('invalid use of shape parameter')
        M, N = arg1
        self.shape = (M,N)
        self.rows = np.empty((M,), dtype=object)
        self.data = np.empty((M,), dtype=object)
        for i in range(M):
            self.rows[i] = []
            self.data[i] = []
    else:
        raise TypeError('unrecognized lil_matrix constructor usage')
else:
    # assume A is dense
    try:
        A = np.asmatrix(arg1)
    except TypeError:
        raise TypeError('unsupported matrix type')
    else:
        from .csr import csr_matrix
        A = csr_matrix(A, dtype=dtype).tolil()

        self.shape = A.shape
        self.dtype = A.dtype
        self.rows = A.rows
        self.data = A.data

As the documentation says, you can construct it from another sparse matrix, from a shape, or from a dense array. The dense array constructor first makes a csr matrix, and then converts it to lil.

The shape version constructs an empty lil with data like:

In [161]: M=sparse.lil_matrix((3,5),dtype=int)
In [163]: M.data
Out[163]: array([[], [], []], dtype=object)
In [164]: M.rows
Out[164]: array([[], [], []], dtype=object)

It should be obvious that passing a generator isn't going to work: it isn't a dense array.

But having created a lil matrix, you can fill in elements with a regular array assignment:

In [167]: M[0,:]=[1,0,2,0,0]
In [168]: M[1,:]=[0,0,2,0,0]
In [169]: M[2,3:]=[1,1]
In [170]: M.data
Out[170]: array([[1, 2], [2], [1, 1]], dtype=object)
In [171]: M.rows
Out[171]: array([[0, 2], [2], [3, 4]], dtype=object)
In [172]: M.A
Out[172]: 
array([[1, 0, 2, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 1, 1]])
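
Applied to the generator from the question, the row assignment above might look like the sketch below. It assumes the number of rows and columns is known before iterating, because the shape constructor needs both; `lil_from_rows`, `n_rows` and `n_cols` are names made up for this illustration.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Stand-in for the question's generator of equal-length dense rows.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows(rows, n_rows, n_cols, dtype=int):
    # Hypothetical helper: the shape has to be known up front.
    M = sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        # Only one dense row is alive at a time.
        M[i, :] = row
    return M

M = lil_from_rows(row_gen(), n_rows=3, n_cols=3)
M_csr = M.tocsr()  # convert once, after all rows are filled in
```

Converting to csr once at the end keeps the incremental work in the format that is designed for it.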

You can also assign values to the sublists directly (I think this is faster, but a little more dangerous):

In [173]: M.data[1]=[1,2,3]
In [174]: M.rows[1]=[0,2,4]
In [176]: M.A
Out[176]: 
array([[1, 0, 2, 0, 0],
       [1, 0, 2, 0, 3],
       [0, 0, 0, 1, 1]])
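
For generator input, the direct sublist assignment could be sketched roughly as follows. This is my own illustration, not code from scipy: `lil_from_rows_direct` is a hypothetical helper, it again assumes the shape is known in advance, and it relies on each `rows[i]` list holding sorted column indices that line up with `data[i]`. Nothing checks that for you, which is the "dangerous" part.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Stand-in for the question's generator of equal-length dense rows.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows_direct(rows, n_rows, n_cols, dtype=int):
    # Hypothetical helper that bypasses __setitem__ and writes the lists itself.
    M = sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)       # ascending column indices of nonzeros
        M.rows[i] = cols.tolist()        # column indices for row i
        M.data[i] = row[cols].tolist()   # matching values, same order
    return M

M = lil_from_rows_direct(row_gen(), n_rows=3, n_cols=3)
```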

Another incremental approach is to construct the 3 arrays or lists of coo format, and then make a coo or csr from those.
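
A minimal sketch of that coo approach, assuming the generator yields at least one row (`csr_from_rows` is a name I made up). Unlike the lil version, it does not need the number of rows ahead of time; it only accumulates the nonzero entries.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Stand-in for the question's generator of equal-length dense rows.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows):
    # Hypothetical helper: collect coo triplets (row, col, value) for nonzeros.
    row_idx, col_idx, values = [], [], []
    n_rows = n_cols = 0
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)
        row_idx.extend([i] * len(cols))
        col_idx.extend(cols.tolist())
        values.extend(row[cols].tolist())
        n_rows, n_cols = i + 1, len(row)
    coo = sparse.coo_matrix((values, (row_idx, col_idx)),
                            shape=(n_rows, n_cols))
    return coo.tocsr()

matrix = csr_from_rows(row_gen())
```

Plain Python lists are used here for simplicity; for very large inputs, collecting per-row numpy arrays and concatenating them at the end would likely use less memory.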

sparse.bmat is another option, and its code is a good example of building the coo inputs. I'll let you look at that yourself.
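
One way to read the bmat suggestion (my interpretation, not necessarily what was intended) is to convert the rows in small dense chunks and stack the resulting sparse blocks vertically. In the sketch below, `csr_from_chunks` and `chunk_size` are hypothetical, and the generator is assumed to yield at least one row.

```python
import itertools
import numpy as np
from scipy import sparse

def row_gen():
    # Stand-in for the question's generator of equal-length dense rows.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_chunks(rows, chunk_size=1000):
    # Hypothetical helper: at most chunk_size dense rows are held at once.
    rows = iter(rows)
    blocks = []
    while True:
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            break
        # Each grid row of bmat's input holds a single sparse block.
        blocks.append([sparse.csr_matrix(np.vstack(chunk))])
    return sparse.bmat(blocks, format='csr')

matrix = csr_from_chunks(row_gen(), chunk_size=2)
```

Only the sparse blocks accumulate in memory, so the peak dense footprint is one chunk.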
