Read .pptx file from S3


Question

I am trying to open a .pptx from Amazon S3 and read it using the python-pptx library. This is the code:

from pptx import Presentation
import boto3
s3 = boto3.resource('s3')

obj = s3.Object('bucket', 'key')
body = obj.get()['Body']
prs = Presentation(body)

It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that does not involve actually downloading the file?

Recommended answer

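To load a file from S3, download it (or stream it) and wrap the bytes in io.BytesIO, which gives pptx.Presentation a seekable file-like object it can handle. The StreamingBody returned by boto3 has no seek() method, and the raw bytes returned by read() are not a file-like object, so neither can be passed to Presentation directly. With io.BytesIO the content stays in memory and nothing is written to disk.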

import io
import boto3

from pptx import Presentation

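# Fetch the object and read its whole body into memory as bytes
s3 = boto3.client('s3')
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
object_content = s3_response_object['Body'].read()

# Wrap the bytes in a seekable in-memory buffer that python-pptx can open
prs = Presentation(io.BytesIO(object_content))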

Reference:

Just as with variables, data can be kept as bytes in an in-memory buffer when we use the io module's BytesIO operations.
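The effect of the in-memory buffer can be tried without S3 access. The sketch below is only an illustration under stated assumptions: NonSeekableStream and to_seekable are hypothetical names that are not part of boto3 or python-pptx, and a small ZIP archive stands in for the presentation, since a .pptx file is itself a ZIP package. It shows a read()-only stream being drained into io.BytesIO so that a ZIP reader can seek in it.

```python
import io
import zipfile


class NonSeekableStream:
    """Hypothetical stand-in for a streaming body: it offers read() but no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, amt=None):
        # Return up to `amt` bytes, or everything that is left when amt is None.
        return self._buffer.read(amt)


def to_seekable(stream) -> io.BytesIO:
    """Drain a read()-only stream into a seekable in-memory buffer."""
    return io.BytesIO(stream.read())


# Build a tiny ZIP archive in memory to play the role of the .pptx file.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as archive:
    archive.writestr("hello.txt", "hi")
payload = raw.getvalue()

stream = NonSeekableStream(payload)
has_seek = hasattr(stream, "seek")  # the stream itself cannot seek

# After buffering, the ZIP reader can move around in the data freely.
buffer = to_seekable(stream)
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
```

The trade-off is that the whole object is held in memory at once, which is usually acceptable for presentation-sized files.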
