How to encode a unicode string (ones from JSON) to 'utf-8' in Python?
Question
I am creating a REST API using Flask (Python). One of the URLs (/uploads) accepts a POST HTTP request with a JSON body such as '{"src":"void", "settings":"my settings"}'. I can extract each value individually and encode it to a byte string, which can then be hashed using hashlib. However, my goal is to take the whole object and encode it, something like myfile.encode('utf-8'). Printing myfile displays the following: {u'src':u'void', u'settings':u'my settings'}. Is there any way to take the above Unicode data and encode it to UTF-8 as a sequence of bytes, so that I can call hashlib.sha1(myfile.encode('utf-8'))? Let me know if more clarification is needed. Thanks in advance.
import hashlib

# Inside the Flask view that handles POST /uploads
fileSRC = request.json['src']
fileSettings = request.json['settings']
myfile = request.json
print myfile
# Hash the filename using sha1 from the hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8'))  # this works, but I want myfile to be encoded, not fileSRC
guid = guid_object.hexdigest()  # this works
print guid
Recommended answer
As you said in comments, you solved your issue using:
jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))
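The same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name payload_sha1 is hypothetical, a plain dict stands in for request.json, and sort_keys=True is an added assumption so that two payloads with the same keys in a different order produce the same hash.

```python
import hashlib
import json


def payload_sha1(payload):
    """Return the SHA-1 hex digest of a JSON-serializable payload (hypothetical helper)."""
    # Serialize the whole object to one JSON string. sort_keys=True is an
    # assumption added here so that key order does not change the hash.
    json_content = json.dumps(payload, sort_keys=True)
    # hashlib needs bytes, so encode the text before hashing.
    return hashlib.sha1(json_content.encode('utf-8')).hexdigest()


# A plain dict stands in for request.json.
myfile = {u'src': u'void', u'settings': u'my settings'}
guid = payload_sha1(myfile)
print(guid)
```

Without sort_keys=True the digest depends on the order in which the keys are serialized, which may or may not matter for your use case.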
But it's important to understand why this works. Flask sends you unicode() for non-ASCII, and str() for ASCII. Dumping the result using JSON will give you consistent results since it abstracts away the internal Python representation, just as if you only had unicode().
In Python 2 (the Python version you're using), you don't need .encode('utf-8') because the default value of ensure_ascii in json.dumps() is True. When you send non-ASCII data to json.dumps(), it uses JSON escape sequences to emit pure ASCII, so there is no need to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", you could specify ensure_ascii even though it is already True:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
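To see the escaping in action, here is a small illustration. It is only a sketch: the payload is a made-up dict standing in for request.json, and the variable names escaped and raw are hypothetical.

```python
import json

# A plain dict with one non-ASCII value stands in for request.json.
payload = {u'src': u'caf\u00e9'}

escaped = json.dumps(payload, ensure_ascii=True)
raw = json.dumps(payload, ensure_ascii=False)

# With ensure_ascii=True the non-ASCII character becomes a \uXXXX escape...
print(escaped)
# ...so every character in the result is in the ASCII range.
print(all(ord(ch) < 128 for ch in escaped))
# With ensure_ascii=False the character is kept as-is, so this check fails.
print(all(ord(ch) < 128 for ch in raw))
```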
Python 3
In Python 3, however, this would no longer work. Indeed, json.dumps() returns str (Unicode text) in Python 3, even if everything in the string is ASCII. But hashlib.sha1 only works on bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
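The difference can be checked directly on Python 3. The following is a sketch under the assumption that a plain dict stands in for request.json; the variable names are hypothetical.

```python
import hashlib
import json

# A plain dict stands in for request.json.
json_content = json.dumps({'src': 'void', 'settings': 'my settings'},
                          ensure_ascii=True)

# On Python 3, json.dumps() returns str, and hashing it directly fails.
try:
    hashlib.sha1(json_content)
    hashed_text_directly = True
except TypeError:
    hashed_text_directly = False

# Encoding first always works. Because the text is pure ASCII, the 'ascii'
# and 'utf-8' codecs produce the same bytes and therefore the same digest.
ascii_digest = hashlib.sha1(json_content.encode('ascii')).hexdigest()
utf8_digest = hashlib.sha1(json_content.encode('utf-8')).hexdigest()
print(hashed_text_directly, ascii_digest == utf8_digest)
```

Since both codecs agree on ASCII-only text, the .encode('utf-8') version from the question and the .encode('ascii') version above yield the same hash as long as ensure_ascii stays True.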
This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.