Reading the contents of a gzip file from AWS S3 in Python


Question

I am trying to read some logs from a Hadoop process that I run in AWS. The logs are stored in an S3 folder and have the following path.

bucketname = name, key = y/z/stderr.gz. Here y is the cluster ID and z is a folder name; both act as folders (key prefixes) in S3, so the full path looks like x/y/z/stderr.gz.

Now I want to decompress this .gz file and read its contents. I don't want to download the file to my system; I want to keep the contents in a Python variable.

This is what I have tried so far.

bucket_name = "name"
key = "y/z/stderr.gz"
obj = s3.Object(bucket_name,key)
n = obj.get()['Body'].read()

This gives me output that is not readable. I also tried

n = obj.get()['Body'].read().decode('utf-8')

This results in the error: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte.
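The 0x8b byte in that error is a hint: a gzip stream starts with the two magic bytes 0x1f 0x8b, so the body is still compressed and cannot be decoded as UTF-8 directly. A minimal sketch of this, using locally compressed sample bytes as a stand-in for what `obj.get()['Body'].read()` returns (the sample text and variable names are illustrative assumptions, not from the original post):

```python
import gzip

# Stand-in for the compressed bytes read from S3 (hypothetical sample data)
raw = gzip.compress("log line 1\nlog line 2\n".encode("utf-8"))

# The first two bytes of any gzip stream are the magic number 0x1f 0x8b
magic = raw[:2]

# Decoding the compressed bytes as UTF-8 fails, because 0x8b is not a valid start byte
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decompress first, then decode the resulting bytes as text
text = gzip.decompress(raw).decode("utf-8")
```

The order matters: decompress the bytes first, then decode them with the log file's encoding (UTF-8 is assumed here).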

I also tried

gzip = StringIO(obj)
gzipfile = gzip.GzipFile(fileobj=gzip)
content = gzipfile.read()

This returns the error IOError: Not a gzipped file

I am not sure how to decode this .gz file.

Edit: Found a solution. I needed to pass n (the bytes that were read) instead of obj, and use BytesIO instead of StringIO:

gzip = BytesIO(n)
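As a rough sketch of that fix, the compressed bytes can be wrapped in `BytesIO` and opened in text mode so the log can be read line by line without touching disk. The helper name `gunzip_lines` and the UTF-8 default are assumptions for illustration:

```python
import gzip
from io import BytesIO


def gunzip_lines(data, encoding="utf-8"):
    """Return the decompressed text lines of gzip-compressed bytes (hypothetical helper)."""
    # gzip.open accepts a file-like object; "rt" decodes the stream as text
    with gzip.open(BytesIO(data), mode="rt", encoding=encoding) as fh:
        return [line.rstrip("\n") for line in fh]
```

Here `data` would be the `n` obtained from `obj.get()['Body'].read()`. Naming the buffer something other than `gzip` also avoids shadowing the `gzip` module.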

Recommended answer

@Amit, I was trying to do the same thing to test decoding a file, and got your code to run with a few modifications. I just had to remove the function def and the return, and rename the gzip variable, since that name is already used by the gzip module.

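import gzip
from io import BytesIO

import boto3

try:
    s3 = boto3.resource('s3')
    key = 'YOUR_FILE_NAME.gz'
    obj = s3.Object('YOUR_BUCKET_NAME', key)
    # Read the raw (still compressed) bytes of the object into memory
    n = obj.get()['Body'].read()
    # Wrap the bytes in a file-like object so GzipFile can read from them
    gzipfile = gzip.GzipFile(fileobj=BytesIO(n))
    # content holds the decompressed bytes; call .decode() to get a str
    content = gzipfile.read()
    print(content)
except Exception as e:
    print(e)
    raise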
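If the goal is a Python string rather than bytes, the same steps could be wrapped in a small function. This is only a sketch: the name `read_gzip_text`, passing the S3 resource in as a parameter, and the UTF-8 default are assumptions, not part of the original answer.

```python
import gzip


def read_gzip_text(s3_resource, bucket_name, key, encoding="utf-8"):
    """Fetch a gzip-compressed S3 object and return its contents as a str (hypothetical helper)."""
    # Read the whole compressed object into memory; nothing is written to disk
    body = s3_resource.Object(bucket_name, key).get()["Body"].read()
    # Decompress the bytes, then decode them with the assumed encoding
    return gzip.decompress(body).decode(encoding)
```

With credentials configured, it would be called along the lines of `read_gzip_text(boto3.resource('s3'), 'YOUR_BUCKET_NAME', 'YOUR_FILE_NAME.gz')`. Because the whole object is held in memory, this suits log files of moderate size.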
