Python encoding and json dumps
I apologize if this question has been asked before. I am still not clear about encoding in Python 3.2.
I am reading a CSV file (encoded in UTF-8 without a BOM) that contains French accented characters.
Here is the code that opens and reads the CSV file:
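import csv
# in_file, id and locale are defined elsewhere in the view
csvfile = open(in_file, 'r', encoding='utf-8')
fieldnames = ("id", "locale", "message")
# the backslash must itself be escaped inside a Python string literal
reader = csv.DictReader(csvfile, fieldnames, escapechar="\\")
for row in reader:
    if row['id'] == id and row['locale'] == locale:
        out = row['message']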
I am returning the message (out) as JSON:
jsonout = json.dumps(out, ensure_ascii=True)
return HttpResponse(jsonout,content_type="application/json; encoding=utf-8")
However, when I preview the result, the French accented e (é) is replaced by \u00e9.
Can you please advise on what I am doing wrong, and what I should do so that the JSON output shows the proper e with its accent?
Thanks
You're doing nothing wrong (and neither is Python).
Python's json module simply takes the safe route and escapes non-ASCII characters. This is a valid way of representing such characters in JSON, and any conforming parser will resurrect the proper Unicode characters when parsing the string:
>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
>>> json.loads('{"Cr\\u00eapes": 5}')
{'Crêpes': 5}
Don't forget that JSON is just a representation of your data, and both "ê" and "\u00ea" are valid JSON representations of the string ê. Conforming JSON parsers should handle both correctly.
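As a quick check of that claim, the sketch below parses the same value written both ways and compares the results. The key name msg and the variable names are made up for illustration.

```python
import json

# The same JSON document written two ways: with a \u escape and with the
# literal character. The doubled backslash keeps Python from interpreting
# the escape itself, so the JSON parser is the one that decodes it.
escaped = '{"msg": "Cr\\u00eapes"}'
literal = '{"msg": "Crêpes"}'

parsed_escaped = json.loads(escaped)
parsed_literal = json.loads(literal)

# Both forms decode to the same Python string.
print(parsed_escaped == parsed_literal)
```

Since the two texts decode to the same data, a client that parses the response with a real JSON parser should see the accented character either way.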
It is possible to disable this behaviour though; see the json.dump documentation:
>>> json.dumps({'Crêpes': 5}, ensure_ascii=False)
'{"Crêpes": 5}'
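Applied to the question's case, a minimal sketch might look like the following. The helper name message_to_json and the sample message are hypothetical, and the Django response is left out so the snippet runs on its own. With ensure_ascii=False the result is a str containing non-ASCII characters, so it needs to be sent as UTF-8.

```python
import json

def message_to_json(message):
    """Serialize a message to JSON text, leaving non-ASCII characters unescaped."""
    return json.dumps(message, ensure_ascii=False)

# Hypothetical message, standing in for row['message'] from the CSV file.
body = message_to_json("Café crème")
print(body)

# Encode explicitly to UTF-8 to get the bytes that go over the wire.
payload = body.encode("utf-8")
```

In the view from the question, the same effect should come from passing ensure_ascii=False instead of ensure_ascii=True to json.dumps. Note that the usual Content-Type parameter for declaring the encoding is charset rather than encoding.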