Pandas sparse DataFrame to sparse matrix, without generating a dense matrix in memory
Question
Is there a way to convert a pandas.SparseDataFrame to a scipy.sparse.csr_matrix without generating a dense matrix in memory?
scipy.sparse.csr_matrix(df.values) doesn't work, because df.values generates a dense matrix, which is then cast to csr_matrix.
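To make the problem concrete, here is a minimal sketch of the dense intermediate. It assumes pandas 1.0 or later, where SparseDataFrame no longer exists, so a regular DataFrame with sparse columns stands in for it; the frame, its size, and all variable names are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical frame: 1000 x 50, one nonzero per column, stored sparsely.
data = np.zeros((1000, 50))
data[0, :] = 1.0
sdf = pd.DataFrame(data).astype(pd.SparseDtype("float64", fill_value=0.0))

# .values materialises a full NumPy array before scipy ever sees it.
dense_values = sdf.values
via_dense = sparse.csr_matrix(dense_values)

print(type(dense_values).__name__, dense_values.shape)
print(dense_values.nbytes, "bytes held by the dense intermediate")
print(via_dense.nnz, "values actually stored in the result")
```

The result is sparse, but the full array had to exist in memory first, which is exactly what the question wants to avoid.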
Thanks in advance!
Recommended answer
The pandas docs describe an experimental conversion to scipy sparse, SparseSeries.to_coo:
http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
================
edit - this is a special function for a multiindex series, not a data frame. See the other answers for that. Note the difference in dates.
============
As of 0.20.0, there is an sdf.to_coo() and a multiindex ss.to_coo(). Since a sparse matrix is inherently 2d, it makes sense to require a multiindex for the (effectively) 1d data series, whereas a dataframe can already represent a table or 2d array.
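The two routes can be sketched as follows. This is not the 0.20.0 API named above: SparseDataFrame and SparseSeries were removed in pandas 1.0, so the sketch assumes a newer pandas and uses the `.sparse` accessors on a regular DataFrame and Series instead. The sample data and variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# DataFrame route: columns stored sparsely with fill value 0.
dense = pd.DataFrame({"a": [0, 1, 0, 0], "b": [0, 0, 2, 0], "c": [3, 0, 0, 0]})
sdf = dense.astype(pd.SparseDtype("int64", fill_value=0))

# The accessor assembles a COO matrix from the stored values only;
# COO to CSR is then a sparse-to-sparse conversion.
coo = sdf.sparse.to_coo()
csr = coo.tocsr()

# Series route: a two-level MultiIndex supplies the (row, column) coordinates.
idx = pd.MultiIndex.from_tuples(
    [(0, "c"), (1, "a"), (2, "b")], names=["row", "col"]
)
ss = pd.Series([3.0, 1.0, 2.0], index=idx).astype(pd.SparseDtype("float64"))
ss_coo, row_labels, col_labels = ss.sparse.to_coo(
    row_levels=["row"], column_levels=["col"], sort_labels=True
)

print(csr.shape, csr.nnz)
print(ss_coo.shape, ss_coo.nnz)
```

On a version that still has SparseDataFrame (0.20.0 up to its removal), sdf.to_coo().tocsr() plays the same role as the accessor call here.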
When I first responded to this question (June 2015), this sparse dataframe/series feature was experimental.