Python encoding and json dumps

Problem description

I apologize if this question has been asked before. I am still not clear about encoding in Python 3.2.

I am reading a CSV file (encoded in UTF-8 without a BOM) that contains French accented characters.

Here is the code that opens and reads the CSV file:

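import csv
csvfile = open(in_file, 'r', encoding='utf-8')
fieldnames = ("id", "locale", "message")
# The backslash must itself be escaped inside a Python string literal.
reader = csv.DictReader(csvfile, fieldnames, escapechar="\\")
for row in reader:
    # Keep the message of the row matching the requested id and locale.
    if row['id'] == id and row['locale'] == locale:
        out = row['message']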

I return the message (out) as JSON:

jsonout = json.dumps(out, ensure_ascii=True)    
return HttpResponse(jsonout,content_type="application/json; encoding=utf-8")

However, when I preview the result, the accented e (é) is replaced by \u00e9.

Can you please advise on what I am doing wrong, and what I should do so that the JSON output shows the proper accented e?

Thanks

Solution

You're doing nothing wrong (and neither is Python).

Python's json module simply takes the safe route and escapes non-ASCII characters. This is a valid way of representing such characters in JSON, and any conforming parser will reconstruct the proper Unicode characters when parsing the string:

>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
>>> json.loads('{"Cr\\u00eapes": 5}')
{'Crêpes': 5}

Don't forget that JSON is just a representation of your data, and both "ê" and "\u00ea" are valid JSON representations of the string ê. Conforming JSON parsers should handle both correctly.
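
To make the equivalence concrete, here is a small sketch that hands both spellings to json.loads and compares the results. The variable names are illustrative only; raw string literals are used so that the JSON parser, rather than Python's own string syntax, decodes the escape.

```python
import json

# Two JSON documents that encode the same one-character string.
# The raw string keeps the backslash, so json.loads sees the \u00ea escape.
escaped_doc = r'"\u00ea"'   # escape sequence between JSON quotes
literal_doc = '"ê"'         # the character itself between JSON quotes

from_escaped = json.loads(escaped_doc)
from_literal = json.loads(literal_doc)

print(from_escaped == from_literal)  # True
print(len(from_escaped))             # 1
```

The two documents differ in length as text, but a parser yields the same single-character string from either one.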

It is possible to disable this behaviour, though; see the ensure_ascii parameter in the json.dump documentation:

>>> json.dumps({'Crêpes': 5}, ensure_ascii=False)
'{"Crêpes": 5}'
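
If the response should carry the accented characters themselves, the serialized text then has to be encoded explicitly before it is sent. The following is a minimal sketch under stated assumptions: build_json_body is a hypothetical helper (it is not part of json or Django), and the Content-Type value uses charset, the conventional parameter name, where the question's code used encoding.

```python
import json

def build_json_body(message):
    """Hypothetical helper: serialize message as UTF-8 encoded JSON bytes."""
    # ensure_ascii=False keeps é, ê, ... as literal characters.
    text = json.dumps(message, ensure_ascii=False)
    return text.encode("utf-8")

body = build_json_body("café")
# Assumed header value; charset is the conventional parameter name.
content_type = "application/json; charset=utf-8"

print(body)  # b'"caf\xc3\xa9"'
```

In a Django view like the one in the question, body and content_type could then be passed along as HttpResponse(body, content_type=content_type).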
