Read pdf object from S3

Views: 81 Published: 2021/6/28 19:07:53 Tags: pdf amazon-s3 python-3.7

Problem description

I am trying to create a Lambda function that will access a PDF form uploaded to S3, strip out the data entered into the form, and send it elsewhere.

I am able to do this when I can download the file locally. The script below works and lets me read the data from the PDF into a pandas DataFrame:

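import boto3
import pandas as pd
import PyPDF2 as pypdf

# bucket_name and asset_key are set elsewhere (not shown here)
s3 = boto3.resource('s3')
s3.meta.client.download_file(bucket_name, asset_key, './target.pdf')

# Read the form's text fields from the downloaded file
with open('./target.pdf', 'rb') as pdfobject:
    pdf = pypdf.PdfFileReader(pdfobject)
    data = pdf.getFormTextFields()

# get_cols is a helper defined elsewhere (not shown here)
pdf_df = pd.DataFrame(data, columns=get_cols(data), index=[0])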

But with Lambda I cannot save the file locally, because I get a "read only filesystem" error.

I have tried using the s3.get_object() method like below:

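import boto3

# get_object is a client method, so use a client rather than the resource above
s3 = boto3.client('s3')
s3_response_object = s3.get_object(
    Bucket='pdf-forms-bucket',
    Key='target.pdf',
)

# Entire object body as bytes
pdf_bytes = s3_response_object['Body'].read()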

But I have no idea how to convert the resulting bytes into an object that can be parsed with PyPDF2. The output that I need, and that PyPDF2 produces, looks like this:

{'form1[0].#subform[0].nameandmail[0]': 'Burt Lancaster',
 'form1[0].#subform[0].mailaddress[0]': '675 Creighton Ave, Washington DC',
 'form1[0].#subform[0].Principal[0]': 'David St. Hubbins',
 'Principal[1]': None,
 'form1[0].#subform[0].Principal[2]': 'Bart Simpson',
 'Principal[3]': None}

So in summary, I need to be able to read a PDF with fillable forms into memory and parse it without downloading the file, because my Lambda function environment won't allow local temp files.
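One possible approach, shown here as a minimal sketch and not a confirmed answer: wrap the bytes returned by `get_object` in an `io.BytesIO` stream, which gives PyPDF2 the seekable file-like object that `PdfFileReader` accepts in place of an open file. The helper names `to_seekable_stream` and `read_form_fields` are hypothetical, and the sketch assumes the same legacy PyPDF2 API used in the question (`PdfFileReader`, `getFormTextFields`); newer PyPDF2 releases renamed these.

```python
import io


def to_seekable_stream(pdf_bytes):
    """Wrap raw bytes in an in-memory, seekable, file-like object."""
    return io.BytesIO(pdf_bytes)


def read_form_fields(bucket, key):
    """Fetch a PDF from S3 and return its form text fields without writing to disk.

    Hypothetical helper: assumes boto3 credentials are configured and that the
    installed PyPDF2 still exposes the legacy PdfFileReader API from the question.
    """
    # Imported lazily so to_seekable_stream works without these packages installed.
    import boto3
    import PyPDF2 as pypdf

    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key)
    pdf_bytes = response["Body"].read()  # whole object held in memory

    reader = pypdf.PdfFileReader(to_seekable_stream(pdf_bytes))
    return reader.getFormTextFields()
```

With the bucket and key from the question, the call would be `read_form_fields('pdf-forms-bucket', 'target.pdf')`. Because the whole object is read into memory, this suits form-sized PDFs better than very large files.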
