How to efficiently split scipy sparse and numpy arrays into smaller N unequal chunks?
Question
After checking the documentation and ...
>>> print(X.shape)
(2399, 39999)
>>> print(type(X))
<class 'scipy.sparse.csr.csr_matrix'>
>>> print(X.toarray())
[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ...,
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]
Then:
new_array = np.split(X,3)
Out:
ValueError: array split does not result in an equal division
Then I tried:
new_array = np.hsplit(X,3)
Out:
ValueError: bad axis1 argument to swapaxes
So, how can I split the array into N unequal-sized chunks?
Recommended answer
Make a sparse matrix:
In [62]: M=(sparse.rand(10,3,.3,'csr')*10).astype(int)
In [63]: M
Out[63]:
<10x3 sparse matrix of type '<class 'numpy.int32'>'
with 9 stored elements in Compressed Sparse Row format>
In [64]: M.A
Out[64]:
array([[0, 7, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 5],
       [0, 0, 2],
       [0, 0, 6],
       [0, 4, 4],
       [7, 1, 0],
       [0, 0, 2]])
The dense equivalent is easily split. array_split handles unequal chunks, but you can also spell out the split explicitly, as illustrated in the other answer.
In [65]: np.array_split(M.A, 3)
Out[65]:
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]]), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]])]
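To spell out the dense split by hand, np.split also accepts a list of row indices at which to cut. The following is a minimal sketch; the array `dense` is a made-up stand-in for M.A, not the matrix from the session above.

```python
import numpy as np

# Hypothetical 10x3 dense array standing in for M.A.
dense = np.arange(30).reshape(10, 3)

# array_split picks the chunk sizes itself when 10 rows do not divide by 3.
auto_chunks = np.array_split(dense, 3)

# np.split with explicit cut points: rows 0-3, rows 4-6, rows 7-9.
manual_chunks = np.split(dense, [4, 7])

print([c.shape for c in auto_chunks])
print([c.shape for c in manual_chunks])
```

Passing cut points instead of a chunk count is what makes arbitrary unequal sizes possible in the dense case.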
In general, numpy functions cannot work directly on sparse matrices, because sparse matrices are not a subclass of the numpy array. Unless the function delegates the action to the object's own method, it probably won't work. Often the function starts with np.asarray(M), which is not the same as M.toarray() (try it yourself).
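For anyone who wants to try it, here is a small sketch of the difference. The 3x3 matrix is made up for illustration. As far as I know, np.asarray does not densify a scipy sparse matrix but wraps the sparse object itself in an object-dtype array, which is why numpy functions that start this way fail later on.

```python
import numpy as np
from scipy import sparse

# Small hypothetical matrix, used only for this illustration.
M = sparse.csr_matrix(np.array([[0, 7, 0], [0, 0, 5], [7, 1, 0]]))

wrapped = np.asarray(M)  # wraps the sparse object; the values are not expanded
dense = M.toarray()      # a real 2-D ndarray holding the values

print(wrapped.dtype, wrapped.shape)
print(dense.dtype, dense.shape)
```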
But split is nothing more than slicing along the desired axis. I can produce the same 4,3,3 split with:
In [143]: alist = [M[0:4,:], M[4:7,:], M[7:10]]
In [144]: alist
Out[144]:
[<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in Compressed Sparse Row format>,
<3x3 sparse matrix of type '<class 'numpy.int32'>'
with 3 stored elements in Compressed Sparse Row format>,
<3x3 sparse matrix of type '<class 'numpy.int32'>'
with 5 stored elements in Compressed Sparse Row format>]
In [145]: [m.A for m in alist]
Out[145]:
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]], dtype=int32), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]], dtype=int32), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]], dtype=int32)]
The rest is administrative detail.
I should add that sparse slices are never views. They are new sparse matrices with their own data attribute.
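A quick way to check this is to modify a slice and look at the original. This is only a sketch on a made-up 3x3 matrix; `block` is a name chosen for the example.

```python
import numpy as np
from scipy import sparse

# Small hypothetical matrix, used only for this illustration.
M = sparse.csr_matrix(np.array([[0, 7, 0], [0, 0, 5], [7, 1, 0]]))

block = M[0:2, :]    # row slice: a new csr matrix, not a view
block.data[:] = 99   # overwrite the stored values of the slice only

print(M.toarray())                            # the original values are untouched
print(np.shares_memory(block.data, M.data))   # the data buffers are separate
```

The practical consequence is that each chunk costs extra memory for its own data, so splitting a very large matrix into chunks roughly duplicates its stored elements.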
With the split indexes in a list, we can construct the split list with a simple iteration:
In [146]: idx = [0,4,7,10]
In [149]: alist = []
In [150]: for i in range(len(idx)-1):
     ...:     alist.append(M[idx[i]:idx[i+1]])
I haven't worked out the details of how to construct idx, though an obvious starting point is the 10, that is, M.shape[0].
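One possible way to build idx, sketched here as an assumption rather than something worked out in the answer, is to copy the sizing rule np.array_split uses for dense arrays: the first `n_rows % n_chunks` chunks get one extra row. The helper names `row_boundaries` and `split_sparse_rows` are hypothetical, and the test matrix is made up.

```python
import numpy as np
from scipy import sparse

def row_boundaries(n_rows, n_chunks):
    """Return cut points such as [0, 4, 7, 10] for 10 rows in 3 chunks."""
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks are one row longer than the rest.
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    return [0] + np.cumsum(sizes).tolist()

def split_sparse_rows(M, n_chunks):
    """Slice a sparse matrix into n_chunks row blocks (each block is a copy)."""
    idx = row_boundaries(M.shape[0], n_chunks)
    return [M[idx[i]:idx[i + 1], :] for i in range(n_chunks)]

# Hypothetical 10x3 matrix with some zeros, used only for this illustration.
M = sparse.csr_matrix(np.arange(30).reshape(10, 3) % 4)
chunks = split_sparse_rows(M, 3)
print(row_boundaries(10, 3))
print([c.shape for c in chunks])
```

For chunks of arbitrary sizes, the same slicing loop works with any increasing list of cut points that starts at 0 and ends at M.shape[0].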
For even splits (where the chunk size fits):
In [160]: [M[i:i+5,:] for i in range(0,M.shape[0],5)]
Out[160]:
[<5x3 sparse matrix of type '<class 'numpy.int32'>'
with 2 stored elements in Compressed Sparse Row format>,
<5x3 sparse matrix of type '<class 'numpy.int32'>'
with 7 stored elements in Compressed Sparse Row format>]