Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence
Question
Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print(json_string)
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML).
Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?
Recommended answer
Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:
>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"
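To see what the switch changes on a nested object, here is a minimal Python 3 sketch that compares the default output with ensure_ascii=False and checks that both forms decode back to the same data. The record dictionary and its keys are made up for illustration.

```python
import json

# Hypothetical sample data; any JSON-serializable object behaves the same way.
record = {"name": "ברי צקלה", "id": 1}

# Default (ensure_ascii=True): every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(record)

# ensure_ascii=False: non-ASCII characters are kept as-is in the returned str.
readable = json.dumps(record, ensure_ascii=False)

# Both forms are valid JSON and decode to the same object.
assert json.loads(escaped) == json.loads(readable) == record

# Turning the str into UTF-8 bytes is a separate, explicit step.
utf8_bytes = readable.encode("utf8")
```

The escaped form is not wrong, only harder to read; any JSON parser accepts either one.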
If you are writing to a file, just use json.dump() and leave it to the file object to encode:
with open('filename', 'w', encoding='utf8') as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
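As a rough check that the file really holds UTF-8 text rather than escapes, the following Python 3 sketch writes to a temporary directory, reads the raw bytes back, and reloads the data. The file name example.json and the sample dictionary are assumptions for illustration only.

```python
import json
import os
import tempfile

data = {"greeting": "ברי צקלה"}  # hypothetical sample data

# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")  # hypothetical file name

    # json.dump only produces text; the file object encodes it to UTF-8.
    with open(path, "w", encoding="utf8") as json_file:
        json.dump(data, json_file, ensure_ascii=False, indent=2)

    # Read the raw bytes back to see what landed on disk.
    with open(path, "rb") as raw_file:
        raw = raw_file.read()

    # Reading as text needs the same explicit encoding.
    with open(path, encoding="utf8") as json_file:
        loaded = json.load(json_file)

assert b"\\u" not in raw                 # no \uXXXX escape sequences
assert "ברי צקלה".encode("utf8") in raw  # the text is stored as UTF-8 bytes
assert loaded == data                    # and it round-trips unchanged
```

Passing encoding='utf8' explicitly on both open() calls avoids depending on the platform's default encoding.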
Caveats for Python 2
For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() to write to that file:
with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))
In Python 2, when using byte strings (type str) encoded to UTF-8, make sure to also set the encoding keyword:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
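For comparison, json.dumps() in Python 3 does not accept an encoding keyword and refuses bytes values, so byte strings have to be decoded before serializing. A minimal sketch, assuming the bytes are known to be UTF-8; the dictionary contents mirror the session above and the variable names are made up for illustration.

```python
import json

raw_value = "ברי צקלה".encode("utf8")  # bytes, comparable to a Python 2 str
d = {1: raw_value, 2: "ברי צקלה"}

# Passing bytes straight through raises TypeError in Python 3.
try:
    json.dumps(d, ensure_ascii=False)
    bytes_rejected = False
except TypeError:
    bytes_rejected = True

# Decode bytes values explicitly (assumed UTF-8 here), then serialize.
decoded = {
    key: value.decode("utf8") if isinstance(value, bytes) else value
    for key, value in d.items()
}
s = json.dumps(decoded, ensure_ascii=False)
```

As in the Python 2 session, the integer keys come out as the strings "1" and "2" once the JSON is loaded again.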