How to import a text file on AWS S3 into pandas without writing to disk

Views: 33 | Published: 2021/11/27 10:47:20 | Tags: python pandas heroku amazon-s3 boto3

Question

I have a text file saved on S3 that is a tab-delimited table. I want to load it into pandas but cannot save it to disk first because I am running on a Heroku server. Here is what I have so far.

import io
import boto3
import os
import pandas as pd

os.environ["AWS_ACCESS_KEY_ID"] = "xxxxxxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxxxxxx"

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket="my_bucket",Key="filename.txt")
file = response["Body"]


pd.read_csv(file, header=14, delimiter="\t", low_memory=False)

The error is

OSError: Expected file path name or file-like object, got <class 'bytes'> type

How do I convert the response body into a format pandas will accept?

pd.read_csv(io.StringIO(file), header=14, delimiter="\t", low_memory=False)

returns

TypeError: initial_value must be str or None, not StreamingBody

pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)

returns

TypeError: 'StreamingBody' does not support the buffer interface

UPDATE: the following worked

file = response["Body"].read()

and

pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)
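
The update works because `io.BytesIO` expects a bytes-like object, while `response["Body"]` is a stream object that only hands over bytes when `.read()` is called. A minimal sketch of that difference follows. `FakeBody` is a hypothetical stand-in for the boto3 response body so the example runs without AWS access, and the sample data is made up.

```python
import io

import pandas as pd


class FakeBody:
    """Hypothetical stand-in for the S3 response body: it only exposes read()."""

    def __init__(self, payload: bytes):
        self._stream = io.BytesIO(payload)

    def read(self) -> bytes:
        return self._stream.read()


payload = b"name\tvalue\nalpha\t1\nbeta\t2\n"

# Passing the stream object itself fails because it is not bytes-like.
try:
    io.BytesIO(FakeBody(payload))
    wrapped_directly = True
except TypeError:
    wrapped_directly = False

# Calling read() first yields bytes, which BytesIO accepts and pandas can parse.
raw = FakeBody(payload).read()
df = pd.read_csv(io.BytesIO(raw), sep="\t")
```

Note that `.read()` pulls the whole object into memory, which is fine for a table of modest size but may not suit very large files.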

Recommended answer

pandas (before version 0.20.1) uses boto for read_csv, so you should be able to pass the S3 URL directly:

import boto
data = pd.read_csv('s3://bucket....csv')

If you need boto3 because you are on Python 3.4+, you can read the object into memory yourself:

import io
import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
# Read the whole object into memory and wrap the bytes in a file-like buffer
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
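
The boto3 approach can be wrapped in a small reusable function. This is only a sketch: `read_s3_table` and its `client` parameter are hypothetical names introduced here, not part of boto3 or pandas. Accepting the client as an argument makes it possible to try the function with a stub instead of real AWS credentials.

```python
import io

import pandas as pd


def read_s3_table(bucket, key, client=None, **read_csv_kwargs):
    """Download an S3 object into memory and parse it with pandas.

    `client` is any object with a boto3-style get_object(Bucket=..., Key=...)
    method; when omitted, a real boto3 S3 client is created.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised offline

        client = boto3.client("s3")
    obj = client.get_object(Bucket=bucket, Key=key)
    raw = obj["Body"].read()  # bytes held in memory, nothing written to disk
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)
```

For the file in the question, the call would look like `read_s3_table("my_bucket", "filename.txt", sep="\t", header=14, low_memory=False)`.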

Since version 0.20.1, pandas uses s3fs for S3 access; see the answer below.
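
With a pandas version that relies on s3fs, and with the `s3fs` package installed, the URL form can be used without touching boto3 directly. A sketch, assuming credentials are available through the usual AWS mechanisms (for example environment variables); `s3_url` and `read_s3_url` are hypothetical helper names, not pandas functions.

```python
import pandas as pd


def s3_url(bucket: str, key: str) -> str:
    """Build an s3:// URL from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_s3_url(bucket: str, key: str, **read_csv_kwargs) -> pd.DataFrame:
    """Let pandas fetch the object itself (requires the s3fs package)."""
    return pd.read_csv(s3_url(bucket, key), **read_csv_kwargs)
```

For the file in the question this would be `read_s3_url("my_bucket", "filename.txt", sep="\t", header=14, low_memory=False)`.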
