Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix


Question

I noticed that Pandas now supports sparse matrices and arrays. Currently, I create DataFrame()s like this:

return DataFrame(matrix.toarray(), columns=features, index=observations)

Is there a way to create a SparseDataFrame() from a scipy.sparse.csc_matrix() or csr_matrix()? Converting to dense format first uses far too much RAM. Thanks!
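To make the memory concern concrete, here is a rough sketch that compares the bytes held by a CSR matrix with the bytes that toarray() would allocate. The matrix shape and density are assumptions chosen purely for illustration; they are not taken from the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical data: 10,000 observations x 1,000 features, 1% non-zero.
matrix = sparse.random(10_000, 1_000, density=0.01, format="csr", dtype=np.float64)

# CSR keeps three arrays: the stored values, their column indices, and row pointers.
sparse_bytes = matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes

# toarray() allocates one value per cell, zero or not.
dense_bytes = matrix.shape[0] * matrix.shape[1] * matrix.dtype.itemsize

print("sparse bytes:", sparse_bytes)
print("dense bytes: ", dense_bytes)
```

At this density the dense array is many times larger than the sparse representation, and the gap widens as the matrix gets sparser.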

Recommended answer

A direct conversion is not supported at the moment. Contributions are welcome!

Try this. It should be fine on memory, because a SparseSeries is much like a single-column csc_matrix and is quite space efficient. Only one row is converted to a dense array at a time.

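In [34]: import numpy as np, pandas as pd

In [35]: from scipy.sparse import csc_matrix

In [36]: row = np.array([0,2,2,0,1,2])  # row indices consistent with the matrix printed below

In [37]: col = np.array([0,0,1,2,2,2])

In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')

In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )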

In [40]: m
Out[40]: 
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Column format>

In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                              for i in np.arange(m.shape[0]) ])
Out[46]: 
   0  1  2
0  1  0  4
1  0  0  5
2  2  3  6

In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                                   for i in np.arange(m.shape[0]) ])

In [48]: type(df)
Out[48]: pandas.sparse.frame.SparseDataFrame
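The session above relies on SparseDataFrame and SparseSeries, which were removed in pandas 1.0. On pandas 0.25 or later, a direct conversion is available through DataFrame.sparse.from_spmatrix. The following is a minimal sketch of that route rather than part of the original answer; the row indices are chosen to reproduce the 3x3 matrix shown above, and the observations and features labels are hypothetical stand-ins for the names used in the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# Rebuild the 3x3 example matrix (row indices assumed to match the output above).
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical labels standing in for the question's index and columns.
observations = ["obs0", "obs1", "obs2"]
features = ["f0", "f1", "f2"]

# Each column becomes a sparse-backed column; no dense copy of m is built.
df = pd.DataFrame.sparse.from_spmatrix(m, index=observations, columns=features)

print(df.dtypes)          # every column has a Sparse dtype
print(df.sparse.density)  # fraction of cells that are actually stored
```

The result is a regular DataFrame whose columns hold sparse arrays, so the usual indexing works while storage stays proportional to the number of stored values.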
