How to import a text file on AWS S3 into pandas without writing to disk


Problem description


I have a text file saved on S3 which is a tab-delimited table. I want to load it into pandas, but I cannot save it to disk first because I am running on a Heroku server. Here is what I have so far.

import io
import boto3
import os
import pandas as pd

os.environ["AWS_ACCESS_KEY_ID"] = "xxxxxxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxxxxxx"

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket="my_bucket", Key="filename.txt")
file = response["Body"]


pd.read_csv(file, header=14, delimiter="\t", low_memory=False)

The error is:

OSError: Expected file path name or file-like object, got <class 'bytes'> type

How do I convert the response body into a format pandas will accept?

pd.read_csv(io.StringIO(file), header=14, delimiter="\t", low_memory=False)

returns

TypeError: initial_value must be str or None, not StreamingBody

pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)

returns

TypeError: 'StreamingBody' does not support the buffer interface

UPDATE - Using the following worked

file = response["Body"].read()

and

pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)
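Putting those working pieces together, a minimal end-to-end sketch (assuming the same bucket, key, and file layout as above, with AWS credentials already set in the environment) looks like:

import io
import boto3
import pandas as pd

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket="my_bucket", Key="filename.txt")

# read() drains the StreamingBody into bytes; BytesIO wraps those bytes
# in the file-like object that pandas expects.
body = response["Body"].read()
df = pd.read_csv(io.BytesIO(body), header=14, delimiter="\t", low_memory=False)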

Solution

pandas uses boto for read_csv, so you should be able to:

import boto
data = pd.read_csv('s3://bucket....csv')
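Note that newer pandas versions use s3fs rather than boto as the S3 backend, so the direct-URL form needs s3fs installed. A sketch using the bucket and key from the question (placeholders for your actual path), with the question's parsing options:

import pandas as pd

# Requires boto (older pandas) or s3fs (newer pandas) to be installed.
df = pd.read_csv('s3://my_bucket/filename.txt', header=14, delimiter="\t", low_memory=False)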

If you need boto3 because you are on Python 3.4+, you can:

import boto3
import io
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
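If you prefer to hand pandas text rather than bytes, a variant of the same idea is to decode the body first. This is a sketch assuming the file is UTF-8 encoded, using the tab delimiter and header row from the question:

import io
import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')

# Decode the raw bytes to a string and wrap it in StringIO so
# read_csv sees a text file-like object.
body = obj['Body'].read().decode('utf-8')
df = pd.read_csv(io.StringIO(body), header=14, delimiter="\t", low_memory=False)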
