Read .pptx file from S3
Question
I'm trying to open a .pptx file from Amazon S3 and read it with the python-pptx library. This is the code:
from pptx import Presentation
import boto3
s3 = boto3.resource('s3')
obj = s3.Object('bucket', 'key')
body = obj.get()['Body']
prs = Presentation(body)
It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that doesn't involve actually downloading the file?
Recommended answer
To load a file from S3, download its contents (or stream them) and wrap the bytes in io.BytesIO, which gives a seekable in-memory file-like object that pptx.Presentation can handle.
import io
import boto3
from pptx import Presentation
s3 = boto3.client('s3')
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
object_content = s3_response_object['Body'].read()
prs = Presentation(io.BytesIO(object_content))
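As a variation on the snippet above, the boto3 S3 client also provides download_fileobj, which writes an object into any writable binary file-like object, so the bytes can go straight into an io.BytesIO without a file on disk. The following is only a minimal sketch: fetch_to_buffer is a hypothetical helper name (not part of boto3 or python-pptx), and the bucket and key are placeholders.

```python
import io


def fetch_to_buffer(s3_client, bucket, key):
    """Download an S3 object into memory and return a seekable buffer.

    `fetch_to_buffer` is a hypothetical helper; `s3_client` is assumed to be
    a boto3 S3 client (any object with a compatible `download_fileobj` works).
    """
    buffer = io.BytesIO()
    # download_fileobj writes the object's bytes into the file-like object.
    s3_client.download_fileobj(bucket, key, buffer)
    # Rewind so the next reader starts at the beginning of the data.
    buffer.seek(0)
    return buffer


# Usage (requires boto3, python-pptx and valid AWS credentials):
# import boto3
# from pptx import Presentation
# prs = Presentation(fetch_to_buffer(boto3.client('s3'), 'bucket', 'file.pptx'))
```

Either way the whole file is transferred and held in memory; what is avoided is writing a temporary file to disk.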
Reference:
Just as with variables, data can be kept as bytes in an in-memory buffer when we use the io module's BytesIO operations.
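The point about io.BytesIO can be illustrated with the standard library alone. A .pptx file is a ZIP archive, and ZIP readers need to seek within the data, which is why a read-only stream fails while an in-memory buffer works. In this sketch, ReadOnlyStream is a made-up stand-in for a network stream such as StreamingBody, and to_seekable and make_sample_zip are hypothetical helper names.

```python
import io
import zipfile


class ReadOnlyStream:
    """Stand-in for a network stream: offers read() but no seek()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def to_seekable(stream):
    """Copy everything from a read-only stream into an in-memory buffer."""
    return io.BytesIO(stream.read())


def make_sample_zip():
    """Build a tiny ZIP archive in memory (a .pptx file is a ZIP archive)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        archive.writestr("hello.txt", "hi")
    return out.getvalue()


stream = ReadOnlyStream(make_sample_zip())
print(hasattr(stream, "seek"))   # False
buffer = to_seekable(stream)
print(buffer.seekable())         # True
with zipfile.ZipFile(buffer) as archive:
    print(archive.namelist())    # ['hello.txt']
```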