Get non-ASCII filename from S3 notification event in Lambda

Views: 189  Published: 2020/7/13 2:44:22  Tags: python-2.7 amazon-s3 utf-8 aws-lambda python-unicode

Question

The key field in an AWS S3 notification event, which denotes the filename, is URL escaped.

This is evident when the filename contains spaces or non-ASCII characters.

For example, I have uploaded a file with the following name to S3:

my file řěąλλυ.txt

The notification I receive is:

{
  "Records": [
    {
      "s3": {
        "object": {
          "key": "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"
        }
      }
    }
  ]
}
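To see what the handler actually receives, here is a small sketch that parses an abridged version of the notification with the standard json module. The trimmed event and the variable names are illustrative assumptions; real S3 events carry many more fields. The key arrives percent-encoded and purely ASCII, so nothing has been decoded yet. On Python 2, json.loads returns unicode objects for JSON strings, which is consistent with the key being a unicode rather than a byte str.

```python
import json

# Abridged S3 notification (illustrative): only the fields needed to reach the key.
EVENT_JSON = """
{
  "Records": [
    {"s3": {"object": {"key": "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"}}}
  ]
}
"""

event = json.loads(EVENT_JSON)
raw_key = event["Records"][0]["s3"]["object"]["key"]

# The key is still URL-escaped: '+' stands for a space and each %XX is one UTF-8 byte.
print(raw_key)
# Every character is ASCII, so the non-ASCII filename is not visible yet.
print(all(ord(ch) < 128 for ch in raw_key))  # True
```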

I've tried to decode using:

key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf-8')

But this produces:

my file ÅÄÄÎ»Î»Ï.txt

Of course, when I then try to get the file from S3 using Boto, I get a 404 error.

Recommended answer

tl;dr

You need to convert the URL-encoded unicode string to a byte str before unquoting it, and only then decode the result as UTF-8.

For example, for an S3 object named my file řěąλλυ.txt:

>>> import urllib
>>> utf8_urlencoded_key = event['Records'][0]['s3']['object']['key'].encode('utf-8')
# Encode the unicode key to a UTF-8 encoded [byte] str. The key shouldn't contain
# any non-ASCII characters at this point, but UTF-8 is the safer choice.
>>> utf8_urlencoded_key
'my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt'

>>> key_utf8 = urllib.unquote_plus(utf8_urlencoded_key)
# The URL-escaped sequences are now raw UTF-8 bytes.
# Had you passed a unicode object to unquote_plus, you would have got a
# unicode object whose code points are the individual UTF-8 byte values!
>>> key_utf8
'my file \xc5\x99\xc4\x9b\xc4\x85\xce\xbb\xce\xbb\xcf\x85.txt'

# Decode key_utf8 to a unicode string.
>>> key = key_utf8.decode('utf-8')
>>> key
u'my file \u0159\u011b\u0105\u03bb\u03bb\u03c5.txt'
# Note the u prefix: the UTF-8 bytes have been decoded to Unicode code points.

>>> type(key)
<type 'unicode'>

>>> print(key)
my file řěąλλυ.txt
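The same three steps (encode to bytes, unquote, decode as UTF-8) can be wrapped in a small helper. This is an illustrative sketch rather than code from the answer: decode_s3_key is a hypothetical name, and the helper takes the Python 3 route whenever urllib.parse is importable, because there unquote_plus accepts a str and decodes the escaped bytes as UTF-8 in a single call.

```python
try:
    # Python 3: unquote_plus works on str and decodes %XX sequences as UTF-8.
    from urllib.parse import unquote_plus

    def decode_s3_key(raw_key):
        """Hypothetical helper: turn an S3 event key into the real object name."""
        return unquote_plus(raw_key, encoding="utf-8")
except ImportError:
    # Python 2.7: encode to a byte str first, unquote, then decode the UTF-8 bytes.
    from urllib import unquote_plus

    def decode_s3_key(raw_key):
        """Hypothetical helper: turn an S3 event key into the real object name."""
        return unquote_plus(raw_key.encode("utf-8")).decode("utf-8")


raw = u"my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"
key = decode_s3_key(raw)
print(key == u"my file \u0159\u011b\u0105\u03bb\u03bb\u03c5.txt")  # True
```

Keeping the Python 2 branch byte-oriented mirrors the answer exactly, so the helper behaves the same on both interpreters.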

Background

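AWS has committed the cardinal sin of changing Python's default encoding in the Lambda runtime; see https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/ for why that breaks code.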

The error you should've got from your decode() is:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-19: ordinal not in range(128)

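The value of key is a unicode object. Python 2.x lets you call .decode() on a unicode object, even though that makes no sense: Python first implicitly encodes it to a [byte] str using the default encoding, then decodes that str with the encoding you passed. In Python 2.x the default encoding should be ASCII, which of course cannot represent the characters used, hence the UnicodeEncodeError. If the default encoding has instead been changed to UTF-8, the implicit encode succeeds and the UTF-8 decode simply undoes it, so you get back the same mis-decoded string instead of an error.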
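This argument can be checked without a Python 2 interpreter. The Python 3 sketch below emulates Python 2's unicode.decode() with a hypothetical helper, py2_unicode_decode, that performs the implicit encode step explicitly; its default_encoding parameter stands in for sys.getdefaultencoding(). With ASCII it raises the UnicodeEncodeError quoted above, and with UTF-8 it silently hands back the mis-decoded text, matching the output in the question.

```python
from urllib.parse import unquote_plus


def py2_unicode_decode(text, encoding, default_encoding):
    """Emulate Python 2's unicode.decode(encoding): the unicode object is first
    implicitly encoded with the interpreter's default encoding, and the
    resulting bytes are then decoded with `encoding`."""
    return text.encode(default_encoding).decode(encoding)


raw_key = "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"

# Python 2's urllib.unquote_plus on a unicode input turns each %XX into the
# code point U+00XX; decoding the escapes as Latin-1 reproduces that here.
unquoted = unquote_plus(raw_key, encoding="latin-1")

# With the standard ASCII default, the implicit encode step fails.
ascii_error = None
try:
    py2_unicode_decode(unquoted, "utf-8", default_encoding="ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
print(ascii_error)
# 'ascii' codec can't encode characters in position 8-19: ordinal not in range(128)

# With the default switched to UTF-8, encode and decode cancel out, so the
# mis-decoded text comes back unchanged instead of raising.
mojibake = py2_unicode_decode(unquoted, "utf-8", default_encoding="utf-8")
print(mojibake == unquoted)  # True
```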

Had Python raised the proper UnicodeEncodeError, you would probably have found suitable answers. On Python 3, you wouldn't have been able to call .decode() on the key at all, because str has no decode() method.
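On Python 3 the whole fix collapses into one call, because the event key is a str and urllib.parse.unquote_plus decodes %XX sequences as UTF-8 by default. Below is a minimal handler sketch that decodes every record in the notification. The name lambda_handler, the sample event and returning the keys are illustrative assumptions; a real handler would pass each decoded key on to S3 (for example to boto3's get_object) instead.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Return the decoded key of every record in an S3 notification event."""
    keys = []
    for record in event["Records"]:
        raw_key = record["s3"]["object"]["key"]  # a str, still URL-escaped
        # unquote_plus maps '+' to a space and decodes %XX sequences as UTF-8.
        keys.append(unquote_plus(raw_key))
    return keys


sample_event = {
    "Records": [
        {"s3": {"object": {"key": "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"}}}
    ]
}
keys = lambda_handler(sample_event, None)
print(keys == ["my file \u0159\u011b\u0105\u03bb\u03bb\u03c5.txt"])  # True

# str has no .decode() on Python 3, so the original mistake cannot be written.
print(hasattr(str, "decode"))  # False
```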
