Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix
Question
I noticed that Pandas now supports sparse matrices and arrays. Currently, I create DataFrame()s like this:
return DataFrame(matrix.toarray(), columns=features, index=observations)
Is there a way to create a SparseDataFrame() from a scipy.sparse.csc_matrix() or csr_matrix()? Converting to dense format uses far too much RAM. Thanks!
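To put rough numbers on the memory concern, here is a minimal sketch that compares the bytes held by a CSC matrix with the bytes a dense float64 array of the same shape would need. The shape, density, and seed are hypothetical values chosen only for illustration; they do not come from the question.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Hypothetical size and density, chosen only for illustration.
n_rows, n_cols, density = 2_000, 500, 0.01
m = sparse_random(n_rows, n_cols, density=density, format="csc",
                  dtype=np.float64, random_state=0)

# Bytes held by the three arrays that back a CSC matrix.
sparse_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

# Bytes a dense float64 array of the same shape would need
# (computed from the shape, so nothing dense is allocated here).
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize

print(f"sparse: {sparse_bytes} bytes, dense: {dense_bytes} bytes")
```

The gap grows with the share of zeros, which is why calling matrix.toarray() on a large, mostly empty matrix is so expensive.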
Recommended answer
A direct conversion is not supported at the moment. Contributions are welcome!
Try this. It should be OK on memory, as a SparseSeries is much like a csc_matrix (for 1 column) and is pretty space efficient.
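In [34]: import numpy as np; import pandas as pd
In [35]: from scipy.sparse import csc_matrix
In [36]: row = np.array([0,2,2,0,1,2])
In [37]: col = np.array([0,0,1,2,2,2])
In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')
In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )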
In [40]: m
Out[40]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Column format>
In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                              for i in np.arange(m.shape[0]) ])
Out[46]:
   0  1  2
0  1  0  4
1  0  0  5
2  2  3  6
In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                                   for i in np.arange(m.shape[0]) ])
In [48]: type(df)
Out[48]: pandas.sparse.frame.SparseDataFrame
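The session above relies on SparseDataFrame and SparseSeries, which were deprecated in pandas 0.25 and removed in pandas 1.0. As a hedged sketch for newer versions, assuming pandas 0.25 or later is installed, the DataFrame.sparse.from_spmatrix constructor builds a DataFrame with sparse columns directly from a SciPy matrix. The labels below are hypothetical stand-ins for the observations and features variables in the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# The same 3x3 example matrix: six stored values in CSC format.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical labels standing in for `observations` and `features`.
observations = ["obs0", "obs1", "obs2"]
features = ["f0", "f1", "f2"]

# Build a DataFrame whose columns are sparse arrays, without calling toarray().
df = pd.DataFrame.sparse.from_spmatrix(m, index=observations, columns=features)

# Convert back to a SciPy COO matrix through the .sparse accessor.
back = df.sparse.to_coo()
```

Each column of df has a SparseDtype, so the frame stores only the non-zero entries, and df.sparse.to_coo() recovers a SciPy matrix when one is needed again.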