How to read a CSV file from an S3 bucket using Pandas in Python


Question

I am trying to read a CSV file located in an AWS S3 bucket into memory as a pandas dataframe using the following code:

import pandas as pd
import boto

data = pd.read_csv('s3:/example_bucket.s3-website-ap-southeast-2.amazonaws.com/data_1.csv')

In order to give complete access I have set the bucket policy on the S3 bucket as follows:

{
    "Version": "2012-10-17",
    "Id": "statement1",
    "Statement": [
        {
            "Sid": "statement1",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example_bucket"
        }
    ]
}

Unfortunately I still get the following error in python:

boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed

Wondering if someone could help explain how to either correctly set the permissions in AWS S3 or configure pandas correctly to import the file. Thanks!

Recommended answer

Using pandas 0.20.3

import os
import boto3
import pandas as pd
import sys

if sys.version_info[0] < 3: 
    from StringIO import StringIO # Python 2.x
else:
    from io import StringIO # Python 3.x

# get your credentials from environment variables
aws_id = os.environ['AWS_ID']
aws_secret = os.environ['AWS_SECRET']

client = boto3.client('s3', aws_access_key_id=aws_id,
        aws_secret_access_key=aws_secret)

bucket_name = 'my_bucket'

object_key = 'my_file.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')

df = pd.read_csv(StringIO(csv_string))
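
If you would rather start from a single s3://bucket/key style URI than from a separate bucket_name and object_key, the download step above can be wrapped roughly as follows. This is only a sketch: split_s3_uri and fetch_csv_text are hypothetical helper names, not part of boto3 or pandas, and client is assumed to be any object with a boto3-style get_object(Bucket=..., Key=...) method.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    # A valid S3 URI needs the 's3' scheme and a bucket after the double slash.
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('expected s3://bucket/key, got %r' % uri)
    key = parsed.path.lstrip('/')
    if not key:
        raise ValueError('no object key in %r' % uri)
    # Limitation of this sketch: keys containing '?' or '#' are not handled,
    # because urlparse treats them as query and fragment separators.
    return parsed.netloc, key


def fetch_csv_text(client, uri, encoding='utf-8'):
    """Download the object named by uri and return its decoded text.

    client is assumed to expose a boto3-style get_object(Bucket=..., Key=...).
    """
    bucket_name, object_key = split_s3_uri(uri)
    csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
    return csv_obj['Body'].read().decode(encoding)
```

With the client from the snippet above, this would be used as df = pd.read_csv(StringIO(fetch_csv_text(client, 's3://my_bucket/my_file.csv'))). Note that a string written like the one in the question, with a single slash after s3:, has no bucket part and is rejected by split_s3_uri with a ValueError.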
