How to save a pandas dataframe in gzipped format directly?
Question
I have a pandas data frame called df.
I want to save this in a gzipped format. One way to do this is the following:
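import gzip
import pandas

# Write an uncompressed pickle first (df.save in old pandas; to_pickle in current releases).
df.to_pickle('filename.pickle')
# Then copy its bytes into a gzip-compressed file.
f_in = open('filename.pickle', 'rb')
f_out = gzip.open('filename.pickle.gz', 'wb')
f_out.writelines(f_in)
f_in.close()
f_out.close()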
However, this requires me to first create a file called filename.pickle. Is there a way to do this more directly, i.e., without creating filename.pickle?
When I want to load a dataframe that has been gzipped, I have to go through the same step of creating an uncompressed pickle file. For example, to read the file filename2.pickle.gz, which is a gzipped pandas dataframe, I know of the following method:
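# Decompress to a temporary uncompressed pickle, then load it.
f_in = gzip.open('filename2.pickle.gz', 'rb')
f_out = open('filename2.pickle', 'wb')  # plain file: the output must not be gzipped again
f_out.writelines(f_in)
f_in.close()
f_out.close()
df2 = pandas.read_pickle('filename2.pickle')  # pandas.load in old pandas versions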
Can this be done without creating filename2.pickle first?
Recommended answer
We plan to add better serialization with compression eventually. Stay tuned to pandas development.
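In the meantime, the intermediate file can be avoided by pickling straight into the file object that gzip.open returns. The following is only a minimal sketch under a few assumptions: save_gzipped and load_gzipped are hypothetical helper names (not part of pandas), and a plain dict stands in for df so the example runs without pandas installed. Any picklable object, including a data frame, can be passed the same way.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped(obj, path):
    """Pickle obj straight into a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "wb") as f_out:
        pickle.dump(obj, f_out, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzipped(path):
    """Read back an object written by save_gzipped (hypothetical helper)."""
    with gzip.open(path, "rb") as f_in:
        return pickle.load(f_in)


# Round-trip demo: a plain dict stands in for the data frame here.
data = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "filename.pickle.gz")
    save_gzipped(data, gz_path)
    restored = load_gzipped(gz_path)
    # Only the .gz file is ever written; no uncompressed pickle is created.
    files_created = os.listdir(tmp_dir)
```

With a data frame, the calls would be save_gzipped(df, 'filename.pickle.gz') and df2 = load_gzipped('filename2.pickle.gz'). Later pandas releases also accept a compression argument directly, as in df.to_pickle('filename.pickle.gz', compression='gzip') and pandas.read_pickle('filename2.pickle.gz', compression='gzip'), which makes the helpers unnecessary there.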