Saving UTF-8 texts with json.dumps as UTF-8, not as \u escape sequence
Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print(json_string)
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I'd rather not use XML).
Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?
Use the ensure_ascii=False switch of json.dumps(), then encode the value to UTF-8 manually:
>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"
If you are writing to a file, just use json.dump() and leave it to the file object to encode:
with open('filename', 'w', encoding='utf8') as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
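To check that the file really contains readable UTF-8 rather than escapes, a round trip along the following lines may help. This is only a minimal sketch for Python 3: the `record` dictionary and the temporary file name `record.json` are made up for illustration, and only standard-library calls are used.

```python
import json
import os
import tempfile

# Hypothetical sample record; any JSON-serializable object works the same way.
record = {"name": "ברי צקלה", "id": 1}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are kept as-is in the str result.
readable = json.dumps(record, ensure_ascii=False)

# Write to a file; the file object takes care of the UTF-8 encoding.
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w", encoding="utf8") as json_file:
    json.dump(record, json_file, ensure_ascii=False)

# Read the raw text back, then parse it again.
with open(path, encoding="utf8") as json_file:
    raw_text = json_file.read()
loaded = json.loads(raw_text)

print(escaped)
print(readable)
print(loaded == record)
```

Both forms parse back to the same object, so switching to ensure_ascii=False should only change how the text looks in an editor, not what it means.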
Caveats for Python 2
For Python 2, there are some more caveats to take into account. If you are writing to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() to write to that file:
with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))
In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
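For comparison, a rough Python 3 counterpart of the mixed dictionary above might look like the sketch below. As far as I know, json.dumps() in Python 3 does not take the encoding keyword and refuses bytes values with a TypeError, so the sketch decodes them first. The helper name `decode_bytes` is made up for this example and is not part of the json module.

```python
import json

# Hypothetical mixed input: one value is UTF-8 bytes, the other is already str.
d = {1: "ברי צקלה".encode("utf8"), 2: "ברי צקלה"}


def decode_bytes(value, encoding="utf8"):
    """Return str for bytes input; leave every other value untouched."""
    return value.decode(encoding) if isinstance(value, bytes) else value


# Decode byte values up front, because json.dumps() rejects bytes in Python 3.
cleaned = {key: decode_bytes(value) for key, value in d.items()}
s = json.dumps(cleaned, ensure_ascii=False)

print(s)
```

As in the Python 2 session, the integer keys come out as the strings "1" and "2", and both values load back as the same text.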