Read CSV file to Datalab from Google Cloud Storage and convert to pandas dataframe


Question

I am trying to read a CSV file saved in Google Cloud Storage (gs) into a dataframe for analysis.

I followed these steps without success:

mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
df = pd.read_csv(data_csv)

This doesn't work because data_csv is not a path, which is what pd.read_csv expects. I also tried:

%%gcs read --object $data_csv --variable data
#result: %gcs: error: unrecognized arguments: Cloud Storage Object gs://path/to/file.csv

How can I read my file for analysis?

Thanks.

Recommended answer

%%gcs returns a bytes object. To read it, use BytesIO from io (Python 3):

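import google.datalab.storage as storage  # assumed import for `storage`
import pandas as pd
from io import BytesIO

mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')

# In a new notebook cell: pass the object's gs:// URI, not the object itself
%%gcs read --object $data_csv.uri --variable data

# In the next cell: `data` holds the file's bytes
df = pd.read_csv(BytesIO(data), sep=';')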
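
To try the BytesIO step outside Datalab, here is a minimal sketch. It assumes the bytes that `%%gcs read` would store in `data` are replaced by a small hand-written sample; the column names and values are made up for illustration.

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the bytes that `%%gcs read ... --variable data` provides.
data = b"name;score\nalice;90\nbob;85\n"

# BytesIO wraps the raw bytes in a file-like object, which pd.read_csv accepts
# in place of a file path.
df = pd.read_csv(BytesIO(data), sep=";")
print(df)
```

The same call should work unchanged once `data` comes from Cloud Storage, since pd.read_csv only sees a file-like object either way.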

If your CSV file is comma-separated, there is no need to specify sep=',' because that is the default. Read more about the io library here: Core tools for working with streams.
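
The effect of the sep argument can be checked with a short sketch. The sample bytes below are hypothetical and only illustrate how the default comma separator behaves on comma-separated and semicolon-separated input.

```python
from io import BytesIO

import pandas as pd

# Hypothetical sample contents, one per delimiter style.
comma_data = b"name,score\nalice,90\n"
semicolon_data = b"name;score\nalice;90\n"

# Comma-separated input parses correctly with the default sep=','.
df_comma = pd.read_csv(BytesIO(comma_data))

# Semicolon-separated input read with the default separator collapses
# into a single column whose name is the whole header line.
df_wrong = pd.read_csv(BytesIO(semicolon_data))

# Passing sep=';' splits the columns as intended.
df_right = pd.read_csv(BytesIO(semicolon_data), sep=";")

print(list(df_comma.columns), list(df_wrong.columns), list(df_right.columns))
```

A dataframe with a single, oddly named column is usually a sign that the separator does not match the file.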
