How to efficiently split scipy sparse and numpy arrays into smaller N unequal chunks?


Question

After checking the documentation and this question, here is what I have:

>>>print(X.shape) 
(2399, 39999)

>>>print(type(X))
<class 'scipy.sparse.csr.csr_matrix'>

>>>print(X.toarray())

[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ..., 
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]

Then:

new_array = np.split(X,3)

Out:

ValueError: array split does not result in an equal division

Then I tried:

new_array = np.hsplit(X,3)

Out:

ValueError: bad axis1 argument to swapaxes

So, how can I split the array into N chunks of unequal size?

Recommended answer

Make a sparse matrix:

In [62]: M=(sparse.rand(10,3,.3,'csr')*10).astype(int)
In [63]: M
Out[63]: 
<10x3 sparse matrix of type '<class 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>
In [64]: M.A
Out[64]: 
array([[0, 7, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 5],
       [0, 0, 2],
       [0, 0, 6],
       [0, 4, 4],
       [7, 1, 0],
       [0, 0, 2]])

The dense equivalent is easy to split. np.array_split handles unequal chunks, but you can also spell out the split points explicitly, as illustrated in the other answer.

In [65]: np.array_split(M.A, 3)
Out[65]: 
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]]), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]])]
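As a minimal sketch of both options on a dense array: the stand-in array `A` below is an assumption for illustration, not the matrix from the question. `np.array_split` picks the chunk sizes itself, while `np.split` with a list of indices lets you name the boundaries.

```python
import numpy as np

# Dense stand-in with 10 rows, the same row count as M.A above (illustrative data).
A = np.arange(30).reshape(10, 3)

# Ask for 3 chunks: 10 rows do not divide evenly, so the leading chunk gets the extra row.
auto_chunks = np.array_split(A, 3)
auto_sizes = [c.shape[0] for c in auto_chunks]

# Spell out the split points instead: rows [0:4], [4:7], [7:10].
manual_chunks = np.split(A, [4, 7])
manual_sizes = [c.shape[0] for c in manual_chunks]

# The same rule applied to the 2399 rows from the question.
question_sizes = [len(c) for c in np.array_split(np.arange(2399), 3)]

print(auto_sizes, manual_sizes, question_sizes)
```

Passing a list of indices is the only form in which np.split accepts unequal chunks; with an integer it insists on an equal division, which is where the ValueError in the question comes from.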

In general, numpy functions cannot work directly on sparse matrices, because sparse matrices are not a subclass of np.ndarray. Unless a function delegates the action to the object's own method, it probably won't work. Often the function starts with np.asarray(M), which is not the same as M.toarray() (try it yourself).
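A quick way to try the np.asarray(M) versus M.toarray() comparison is the sketch below. The small matrix is made up for illustration, and the behaviour shown is what I expect from the scipy.sparse matrix classes I know of: numpy wraps the sparse object as a single Python object instead of converting it.

```python
import numpy as np
from scipy import sparse

# Tiny illustrative matrix (assumption, not the data from the question).
M = sparse.csr_matrix(np.array([[0, 7, 0], [0, 0, 5]]))

wrapped = np.asarray(M)   # numpy stores the sparse object itself in a 0-d object array
dense = M.toarray()       # scipy converts the matrix to a real 2-D ndarray

print(wrapped.shape, wrapped.dtype)
print(dense.shape, dense.dtype)
```

Because the wrapped result has no rows or columns to speak of, functions such as np.split and np.hsplit have nothing sensible to slice, which is consistent with the errors in the question.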

But split is nothing more than slicing along the desired axis. I can produce the same 4,3,3 split with:

In [143]: alist = [M[0:4,:], M[4:7,:], M[7:10]]
In [144]: alist
Out[144]: 
[<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in Compressed Sparse Row format>,
 <3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 3 stored elements in Compressed Sparse Row format>,
 <3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>]
In [145]: [m.A for m in alist]
Out[145]: 
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]], dtype=int32), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]], dtype=int32), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]], dtype=int32)]

The rest is administrative detail.

I should add that sparse slices are never views. They are new sparse matrices with their own data attribute.
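One way to check the copy behaviour yourself is to overwrite the stored values of a slice and confirm that the parent matrix is untouched. This is only a sketch on a made-up matrix, and it assumes plain row slicing of a csr_matrix.

```python
import numpy as np
from scipy import sparse

# Illustrative 4x3 matrix (assumption, not the data from the question).
M = sparse.csr_matrix(np.array([[0, 7, 0], [0, 0, 0], [0, 0, 5], [0, 4, 4]]))
before = M.toarray().copy()

top = M[0:2, :]       # row slice: a new csr_matrix
top.data[:] = 99      # overwrite the stored values of the slice only

# If the slice were a view, the 7 in M would now read 99.
unchanged = np.array_equal(M.toarray(), before)
print(unchanged)
```

A practical consequence is that each chunk costs memory proportional to its stored elements, so splitting a large matrix roughly duplicates its data while the original is still alive.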

With the split indexes in a list, we can construct the split list with a simple iteration:

In [146]: idx = [0,4,7,10]
In [149]: alist = []
In [150]: for i in range(len(idx)-1):
     ...:     alist.append(M[idx[i]:idx[i+1]])   

I haven't worked out the details of how to construct idx, though an obvious starting point is the 10, i.e. M.shape[0].
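One possible way to construct idx, sketched under the assumption that you want the same chunk sizes np.array_split would choose (the first `n_rows % n_chunks` chunks get one extra row). The helper names `split_points` and `split_sparse_rows` are hypothetical, not part of numpy or scipy.

```python
from itertools import accumulate

import numpy as np
from scipy import sparse


def split_points(n_rows, n_chunks):
    """Row boundaries for n_chunks near-equal chunks (hypothetical helper).

    Follows the np.array_split convention: the first n_rows % n_chunks
    chunks are one row longer than the rest.
    """
    base, extra = divmod(n_rows, n_chunks)
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    return [0] + list(accumulate(sizes))


def split_sparse_rows(M, n_chunks):
    """Slice a sparse matrix into n_chunks row blocks (hypothetical helper)."""
    idx = split_points(M.shape[0], n_chunks)
    return [M[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]


# Random 10x3 CSR matrix, the same shape as M in the transcript above.
M = sparse.random(10, 3, density=0.3, format="csr", random_state=0)

idx = split_points(M.shape[0], 3)
chunks = split_sparse_rows(M, 3)
sizes = [c.shape[0] for c in chunks]
print(idx, sizes)
```

Any other list of increasing boundaries that starts at 0 and ends at M.shape[0] works the same way, so the chunks can be as unequal as you like. For a column-wise split, slice with M[:, a:b] instead.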

For even splits (where the chunk size fits):

In [160]: [M[i:i+5,:] for i in range(0,M.shape[0],5)]
Out[160]: 
[<5x3 sparse matrix of type '<class 'numpy.int32'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <5x3 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in Compressed Sparse Row format>]
