How to encode a unicode string (ones from JSON) to 'utf-8' in Python?

Question

I am creating a REST API using Flask (Python). One of the URLs (/uploads) accepts a POST HTTP request with a JSON body such as '{"src":"void", "settings":"my settings"}'. I can extract each value individually and encode it to a byte string, which can then be hashed using hashlib. However, my goal is to take the whole object and encode it, something like myfile.encode('utf-8'). Printing myfile displays {u'src':u'void', u'settings':u'my settings'}. Is there any way I can take the above unicode data and encode it to UTF-8 as a sequence of bytes for hashlib.sha1(myfile.encode('utf-8'))? Let me know if you need more clarification. Thanks in advance.

fileSRC = request.json['src']
fileSettings = request.json['settings']

myfile = request.json
print myfile

#hash the filename using sha1 from hashlib library
guid_object = hashlib.sha1(fileSRC.encode('utf-8'))  # this works, but I want myfile to be encoded, not fileSRC
guid = guid_object.hexdigest()  # this works
print guid

Recommended answer

As you said in comments, you solved your issue using:

jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))

But it's important to understand why this works. Flask gives you unicode() for non-ASCII strings and str() for ASCII strings. Serializing the result with json.dumps() gives you consistent output since it abstracts away the internal Python representation, just as if you only had unicode().
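
A minimal sketch of hashing the whole payload, written for Python 3 and using a plain dict as a stand-in for request.json. The helper name payload_sha1 and the use of sort_keys=True are assumptions added for illustration, not part of the original answer; sorting the keys is an extra precaution so that two payloads with the same content serialize to the same text regardless of key order.

```python
import hashlib
import json


def payload_sha1(payload):
    """Return the SHA-1 hex digest of a JSON-serializable payload."""
    # sort_keys=True makes the serialized text independent of key order.
    text = json.dumps(payload, sort_keys=True)
    # hashlib works on bytes, so encode the text explicitly.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Stand-ins for request.json from the question (hypothetical data).
a = {"src": "void", "settings": "my settings"}
b = {"settings": "my settings", "src": "void"}

print(payload_sha1(a) == payload_sha1(b))  # True: same content, same digest
```

Without sort_keys=True, the digest could depend on the order in which the keys happen to be serialized.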

In Python 2 (the Python version you're using), you don't need .encode('utf-8') because ensure_ascii defaults to True in json.dumps(). When you pass non-ASCII data to json.dumps(), it emits JSON escape sequences, so the output is pure ASCII and there is nothing left to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", you could specify ensure_ascii even though it is already True:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
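
To see what ensure_ascii does, here is a small sketch (run under Python 3, with a made-up one-key dict as input) comparing the escaped and unescaped output of json.dumps():

```python
import json

data = {"src": "caf\u00e9"}  # "café", which contains one non-ASCII character

escaped = json.dumps(data, ensure_ascii=True)
raw = json.dumps(data, ensure_ascii=False)

print(escaped)  # {"src": "caf\u00e9"}
print(raw)      # {"src": "café"}

# The escaped form contains only ASCII characters, so the ASCII and
# UTF-8 encodings produce identical bytes.
print(escaped.encode("ascii") == escaped.encode("utf-8"))  # True
```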

Python 3

In Python 3, however, this no longer works. json.dumps() returns str (Unicode text) in Python 3, even if every character in the string is ASCII, but hashlib.sha1 only accepts bytes. You need to make the conversion explicit, even if ASCII is all you need:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
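
The difference can be checked directly. The following sketch, assuming Python 3 and a made-up payload in place of request.json, shows that hashlib.sha1 rejects str and accepts the encoded bytes:

```python
import hashlib
import json

# Hypothetical payload standing in for request.json.
text = json.dumps({"src": "void"}, ensure_ascii=True)
print(type(text).__name__)  # str

try:
    hashlib.sha1(text)  # passing str raises TypeError in Python 3
except TypeError as exc:
    print("TypeError:", exc)

# Encoding to bytes first makes the call succeed.
digest = hashlib.sha1(text.encode("ascii")).hexdigest()
print(len(digest))  # 40 hex characters
```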

This is why Python 3 is a better language: it forces you to be explicit about whether the text you handle is str (Unicode) or bytes. This avoids many, many problems down the road.
