How to read AWS S3 stored word document (.doc and .docx) file content using AWS Lambda Python?
Problem description
I am trying to read the content of Word documents (.doc and .docx) stored in AWS S3 from an AWS Lambda function written in Python. Below is the code I am using. I can get the file name, but I cannot read the content.
def lambda_handler(event, context):
    file_contents = s3.Object('Bucketname', 'sample.docx').get()['Body'].read().decode("unicode-escape")
    return {
        'File Name': obj.key,
        'Content': file_contents
    }
Response: { "errorMessage": "'unicodeescape' codec can't decode bytes in position 25818-25819: truncated \xXX escape", "errorType": "UnicodeDecodeError", "stackTrace": [ [ "/var/task/lambda_function.py", 76, "lambda_handler", "file_contents = s3.Object('Bucketname', 'sample.docx').get()['Body'].read().decode(\"unicode-escape\")" ] ] }
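.docx and .doc files are binary formats, not plain text, so calling .decode() on the raw bytes won't work. A .docx file is a ZIP archive of XML parts, so it has to be unpacked and parsed rather than decoded; perhaps docx2txt may help here.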
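To illustrate why parsing is needed instead of decoding, here is a minimal sketch that pulls paragraph text out of .docx bytes. It deliberately swaps in the Python standard library (zipfile and xml.etree) in place of docx2txt, so nothing extra has to be packaged with the Lambda. The helper name docx_bytes_to_text is hypothetical, and the bucket and key are the placeholders from the question. It reads only the main document part, ignores headers, footers, tabs and line breaks, and does not handle the older .doc format, which would need a different tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_text(data: bytes) -> str:
    """Hypothetical helper: extract paragraph text from raw .docx bytes."""
    # A .docx file is a ZIP archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text within the paragraph.
        paragraphs.append(
            "".join(node.text or "" for node in para.iter(W_NS + "t"))
        )
    return "\n".join(paragraphs)


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tested without it.
    import boto3

    # Placeholder bucket and key taken from the question.
    obj = boto3.resource("s3").Object("Bucketname", "sample.docx")
    body = obj.get()["Body"].read()  # raw bytes; no .decode() on these
    return {"File Name": obj.key, "Content": docx_bytes_to_text(body)}
```

The key change from the code in the question is that the bytes returned by read() are passed to a parser instead of being decoded as text. If docx2txt is bundled with the function, its extraction call could stand in for docx_bytes_to_text.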