Read a CSV file from AWS S3 using boto and pandas

Question

I have already read through the answers available here and here, and they do not help.

I am trying to read a CSV object from an S3 bucket, and I have been able to read the data successfully using the following code.

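from boto.s3.connection import S3Connection
from boto.s3.key import Key

srcFileName = "gossips.csv"

def on_session_started():
  print("Starting new session.")
  # Credentials are picked up from the boto configuration or environment.
  conn = S3Connection()
  my_bucket = conn.get_bucket("randomdatagossip", validate=False)
  print("Bucket Identified")
  print(my_bucket)
  # Open the object by key and print its raw contents.
  key = Key(my_bucket, srcFileName)
  key.open()
  print(key.read())
  conn.close()

on_session_started()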

However, if I try to read the same object with pandas as a data frame, I get an error. The most common one is S3ResponseError: 403 Forbidden.

def on_session_started2():
  print("Starting Second new session.")
  conn = S3Connection()
  my_bucket = conn.get_bucket("randomdatagossip", validate=False)
  #     url = "https://s3.amazonaws.com/randomdatagossip/gossips.csv"
  #     urllib2.urlopen(url)

  for line in smart_open.smart_open('s3://my_bucket/gossips.csv'):
     print line
  #     data = pd.read_csv(url)
  #     print(data)

on_session_started2()

What am I doing wrong? I am on Python 2.7 and cannot use Python 3.

Recommended answer

Here is what I did to successfully read a data frame from a CSV file on S3.

import pandas as pd
import boto3

bucket = "yourbucket"
file_name = "your_file.csv"

# Create an S3 client. 's3' is the service name; credentials and region come from the default configuration.
s3 = boto3.client('s3')

# Fetch the object stored under the key file_name in the bucket.
obj = s3.get_object(Bucket=bucket, Key=file_name)

# obj['Body'] is the file-like response stream, which read_csv parses directly.
initial_df = pd.read_csv(obj['Body'])
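
If pandas does not accept the streaming body directly in your environment, a variant that may help is to read the whole object into memory first and hand pandas an ordinary in-memory buffer. The sketch below is only an illustration of that idea: the helper name read_csv_from_s3 and its parameters are hypothetical (not part of the original answer), and it assumes the file is small enough to fit in memory.

```python
import io

import pandas as pd


def read_csv_from_s3(bucket, key, s3_client=None, **read_csv_kwargs):
    """Download an S3 object fully into memory and parse it as CSV.

    Hypothetical helper for illustration; assumes the object fits in memory.
    """
    if s3_client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3
        s3_client = boto3.client("s3")

    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # read() drains the response stream and returns the payload as bytes.
    payload = obj["Body"].read()
    # BytesIO wraps the bytes in a seekable file-like object for pandas.
    return pd.read_csv(io.BytesIO(payload), **read_csv_kwargs)
```

It would be called as read_csv_from_s3("yourbucket", "your_file.csv"); any extra keyword arguments (for example sep or encoding) are passed through to pd.read_csv.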

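One detail in the question's second snippet is also worth checking. The URL 's3://my_bucket/gossips.csv' is a string literal, so it points at a bucket literally named my_bucket rather than at the bucket held in the my_bucket variable. A request against a bucket you do not own could plausibly produce a 403. A minimal sketch of building the URL from the real names follows; the helper s3_url is hypothetical, and whether the subsequent read succeeds still depends on your credentials and permissions.

```python
def s3_url(bucket_name, key_name):
    """Build an s3:// URL from a bucket name and an object key (hypothetical helper)."""
    return "s3://{0}/{1}".format(bucket_name, key_name)


# Bucket and key names taken from the question.
url = s3_url("randomdatagossip", "gossips.csv")
print(url)  # s3://randomdatagossip/gossips.csv
```

The resulting string can then be passed to smart_open in place of the hard-coded URL.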
