Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
Problem description
Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps. (And I'd rather not use XML.)
Is there a way to serialize objects into a utf-8 JSON string (instead of \uXXXX)?
This doesn't help:
>>> output = json_string.decode('string-escape')
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
This works, but if any sub-object is a python-unicode rather than utf-8, it'll dump garbage:
>>> #### ok:
>>> s= json.dumps( "ברי צקלה", ensure_ascii=False)
>>> print json.loads(s)
ברי צקלה
>>> #### NOT ok:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> print d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94',
2: u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'}
>>> s = json.dumps( d, ensure_ascii=False, encoding='utf8')
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
××¨× ×¦×§××
I searched the json.dumps documentation but couldn't find anything useful.
I'll try to sum up the comments and answers by Martijn Pieters:
(edit: 2nd thought after @Sebastian's comment and about a year later)
There might be no built-in solution in json.dumps.
I'll have to convert all strings in the object to Unicode before it's being JSON-ed.
I'll use Mark's function that converts strings recursively in a nested object.
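A converter in that spirit can be sketched as follows. This is my own illustration, not Mark's actual function, and it is written for Python 3, where byte strings have type bytes; the helper name decode_nested is hypothetical:

```python
import json

def decode_nested(obj, encoding="utf8"):
    # Recursively decode any byte strings inside a nested structure to text,
    # so the whole structure is Unicode before it is passed to json.dumps.
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_nested(k, encoding): decode_nested(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_nested(v, encoding) for v in obj]
    return obj

d = {1: "ברי צקלה".encode("utf8"), 2: "ברי צקלה"}
print(json.dumps(decode_nested(d), ensure_ascii=False))
# → {"1": "ברי צקלה", "2": "ברי צקלה"}
```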
The example I gave depends on my computer and IDE environment, and won't run on all machines.
Thanks everyone :)
Recommended answer
Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:
>>> json_string = json.dumps(u"ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print json_string
"ברי צקלה"
If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:
with io.open('filename', 'w', encoding='utf8') as json_file:
json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
In Python 3, the built-in open() is an alias for io.open(). Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:
with io.open('filename', 'w', encoding='utf8') as json_file:
data = json.dumps(u"ברי צקלה", ensure_ascii=False)
# unicode(data) auto-decodes data to unicode if str
json_file.write(unicode(data))
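In Python 3 the same round trip needs no workaround, since the built-in open() takes an encoding argument directly. A minimal sketch (the temporary file path is arbitrary, chosen only for the example):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.json")  # throwaway path for this sketch

# Write: open() handles the UTF-8 encoding as the file is written.
with open(path, "w", encoding="utf8") as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)

# Read back: the file contains readable UTF-8 text, not \u escapes.
with open(path, encoding="utf8") as json_file:
    content = json_file.read()
print(content)  # → "ברי צקלה"
```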
If you are passing in byte strings (type str in Python 2, bytes in Python 3) encoded to UTF-8, make sure to also set the encoding keyword:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
Note that your second sample is not valid Unicode; you gave it UTF-8 bytes as a unicode literal, and that will never work:
>>> s = u'\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94'
>>> print s
××¨× ×¦×§××
>>> print s.encode('latin1').decode('utf8')
ברי צקלה
Only when I encoded that string to Latin-1 (whose Unicode codepoints map one-to-one to bytes) and then decoded it as UTF-8 do you see the expected output. That has nothing to do with JSON; it is entirely because you used the wrong input. The result is called a Mojibake.
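The repair step can be reproduced in Python 3 as well; a small sketch that first manufactures the Mojibake and then undoes it:

```python
good = "ברי צקלה"

# Simulate the bug: UTF-8 bytes mistakenly decoded as Latin-1.
# Every byte maps to some Latin-1 codepoint, so this "succeeds" silently.
mojibake = good.encode("utf8").decode("latin1")

# Undo it: re-encode to Latin-1 to recover the original bytes,
# then decode those bytes correctly as UTF-8.
repaired = mojibake.encode("latin1").decode("utf8")
print(repaired)  # → ברי צקלה
```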
If you got that Unicode value from a string literal, it was decoded using the wrong codec. It could be that your terminal is mis-configured, or that your text editor saved your source code with a different codec than the one you told Python to read the file with. Or you sourced it from a library that applied the wrong codec. This all has nothing to do with the JSON library.