Write pandas dataframe as compressed CSV directly to Amazon s3 bucket?

Problem Description

I currently have a script that reads the existing version of a CSV saved to S3, combines it with the new rows in the pandas dataframe, and then writes the result directly back to S3.

    # Fetch the previous CSV contents from S3 (empty string if the object does not exist yet).
    try:
        csv_prev_content = str(s3_resource.Object('bucket-name', ticker_csv_file_name).get()['Body'].read(), 'utf8')
    except:
        csv_prev_content = ''

    # Append the new rows and overwrite the object on S3.
    csv_output = csv_prev_content + curr_df.to_csv(path_or_buf=None, header=False)
    s3_resource.Object('bucket-name', ticker_csv_file_name).put(Body=csv_output)

Is there a way I can do this with a gzip-compressed CSV? I want to read an existing .gz compressed CSV on S3 if there is one, concatenate it with the contents of the dataframe, and then overwrite the .gz with the new combined compressed CSV directly on S3, without having to make a local copy.

Recommended Answer

Here's a solution in Python 3.5.2 using Pandas 0.20.1.

The source DataFrame can be read from S3, a local CSV, or anything else.

    import boto3
    import gzip
    import pandas as pd
    from io import BytesIO, TextIOWrapper

    # Read the source DataFrame (here from S3, but it could come from anywhere).
    df = pd.read_csv('s3://ramey/test.csv')

    # Write the CSV into an in-memory gzip stream instead of a local file.
    gz_buffer = BytesIO()

    with gzip.GzipFile(mode='w', fileobj=gz_buffer) as gz_file:
        # TextIOWrapper bridges to_csv's text output to the binary gzip stream.
        df.to_csv(TextIOWrapper(gz_file, 'utf8'), index=False)

    # Upload the compressed bytes directly to S3.
    s3_resource = boto3.resource('s3')
    s3_object = s3_resource.Object('ramey', 'new-file.csv.gz')
    s3_object.put(Body=gz_buffer.getvalue())
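
The snippet above covers only the write side. A minimal sketch of the full read-append-overwrite cycle the question describes might look like the following; note that the bucket/key names simply reuse the answer's example and curr_df is a placeholder for the new rows, so this is an assumption-laden sketch rather than part of the original answer.

    import boto3
    import gzip
    import pandas as pd
    from botocore.exceptions import ClientError
    from io import BytesIO, TextIOWrapper

    s3_resource = boto3.resource('s3')
    bucket, key = 'ramey', 'new-file.csv.gz'  # hypothetical bucket/key, reusing the answer's example

    # Download and decompress the existing gzipped CSV, or start empty if the object is missing.
    try:
        raw = s3_resource.Object(bucket, key).get()['Body'].read()
        prev_df = pd.read_csv(BytesIO(gzip.decompress(raw)))
    except ClientError:
        prev_df = pd.DataFrame()

    # curr_df stands in for the new rows produced elsewhere in the script.
    curr_df = pd.DataFrame({'ticker': ['AAPL'], 'close': [170.0]})
    combined = pd.concat([prev_df, curr_df], ignore_index=True)

    # Re-compress the combined frame in memory and overwrite the S3 object (no local copy).
    gz_buffer = BytesIO()
    with gzip.GzipFile(mode='w', fileobj=gz_buffer) as gz_file:
        combined.to_csv(TextIOWrapper(gz_file, 'utf8'), index=False)
    s3_resource.Object(bucket, key).put(Body=gz_buffer.getvalue())

Everything stays in BytesIO buffers, so nothing is written to local disk. Catching ClientError here treats a missing object the same as an empty previous file; in a real script you would likely narrow that to the missing-key error rather than swallowing every S3 error.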
