Read a CSV file from Google Cloud Storage into Datalab and convert it to a pandas DataFrame
Problem description
I am trying to read a CSV file saved in Google Cloud Storage (gs://) into a pandas DataFrame for analysis.
I have followed these steps without success:
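import pandas as pd
import google.datalab.storage as storage
mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
df = pd.read_csv(data_csv)  # fails: data_csv is a storage Object, not a path or buffer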
This doesn't work because data_csv is a Cloud Storage object, not the file path or file-like object that pd.read_csv expects. I also tried:
%%gcs read --object $data_csv --variable data
#result: %gcs: error: unrecognized arguments: Cloud Storage Object gs://path/to/file.csv
How can I read my file for analysis?
Thanks
Recommended answer
The gcs read magic returns a bytes object. To read it with pandas, wrap it in BytesIO from the io module (Python 3):
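The bytes-to-DataFrame step can be tried outside Datalab as well. A minimal sketch, assuming `raw` stands in for the bytes that `%%gcs read ... --variable data` stores in `data`; the sample contents are made up for illustration:

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the bytes returned by the gcs read magic
raw = b"name,score\nalice,3\nbob,5\n"

# pd.read_csv accepts file-like objects, so wrap the raw bytes in a BytesIO buffer
df = pd.read_csv(BytesIO(raw))
print(df)
```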
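from io import BytesIO
import pandas as pd
import google.datalab.storage as storage
mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
uri = data_csv.uri  # $data_csv would expand to "Cloud Storage Object gs://...", not a URI
%gcs read --object $uri --variable data
df = pd.read_csv(BytesIO(data), sep=';')  # data holds the raw bytes; sep=';' only for semicolon files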
If your CSV file is comma-separated, you don't need to specify sep=',', since that is the default. Read more about the io module in the Python documentation: Core tools for working with streams.
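A small sketch contrasting the two delimiter cases, using made-up sample bytes: the default handles comma-separated data, while semicolon-separated data needs sep=';', otherwise each row ends up in a single column.

```python
from io import BytesIO

import pandas as pd

comma_bytes = b"a,b\n1,2\n"      # hypothetical comma-separated sample
semicolon_bytes = b"a;b\n1;2\n"  # hypothetical semicolon-separated sample

# Comma is the default delimiter, so no sep argument is needed here
df_comma = pd.read_csv(BytesIO(comma_bytes))

# Semicolon-delimited data must pass sep=';' explicitly
df_semicolon = pd.read_csv(BytesIO(semicolon_bytes), sep=";")

# Without sep=';', each line is parsed as a single column named "a;b"
df_wrong = pd.read_csv(BytesIO(semicolon_bytes))

print(df_comma.columns.tolist(), df_semicolon.columns.tolist(), df_wrong.columns.tolist())
```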