Convert Pandas DataFrame to & from In-Memory Feather


Question

Using the IO tools in pandas, it is possible to convert a DataFrame to an in-memory feather buffer:

import pandas as pd  
from io import BytesIO 

df = pd.DataFrame({'a': [1,2], 'b': [3.0,4.0]})  

buf = BytesIO()

df.to_feather(buf)

However, using the same buffer to convert back to a DataFrame

pd.read_feather(buf)

results in an error:

ArrowInvalid: Not a feather file

How can a DataFrame be converted to an in-memory feather representation and, correspondingly, back to a DataFrame?

Thank you in advance for your consideration and response.

Recommended answer

With pandas==0.25.2 this can be accomplished in the following way:

import pandas
import io
df = pandas.DataFrame(data={'a': [1, 2], 'b': [3.0, 4.0]})
buf = io.BytesIO()
df.to_feather(buf)
output = pandas.read_feather(buf)

Calling output.head(2) then returns:

    a    b
 0  1  3.0
 1  2  4.0


If your DataFrame has a non-default index, such as a MultiIndex, you may see an error like

ValueError: feather does not support serializing for the index; you can .reset_index() to make the index into column(s)

In that case, call .reset_index() before to_feather, and call .set_index([...]) after read_feather.

One last point: if you are handing the BytesIO to something else, you need to seek back to 0 after writing the feather bytes. For example:

buffer = io.BytesIO()
df.reset_index(drop=False).to_feather(buffer)
buffer.seek(0)
s3_client.put_object(Body=buffer, Bucket='bucket', Key='file')
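The reason for the seek can be seen with the standard library alone. The sketch below uses a placeholder byte string in place of real feather data, so it only illustrates how the BytesIO cursor behaves, not the feather format itself.

```python
import io

payload = b"feather bytes would go here"  # placeholder, not real feather data

buffer = io.BytesIO()
buffer.write(payload)

position_after_write = buffer.tell()  # cursor sits at the end of what was written
read_without_seek = buffer.read()     # nothing is left to read from that position

buffer.seek(0)                        # rewind to the start
read_after_seek = buffer.read()       # now the whole payload comes back
```

Any consumer that simply reads from the current position, as an upload call typically does, would see an empty stream unless the buffer is rewound first.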
