Make utf8 readable in a file
Question
I have a dictionary of dictionaries with utf8-encoded keys. I am dumping this dictionary to a file using the json module. In the file, the keys are printed in utf8 format. The keys are actually letters of the Bengali language.
I want the actual letters to be written to the file. How can I do this?
If I print those keys (one of them is u'\u0982') to the console, the actual letter (ং) is shown, but in my file \u0982 is written. What does print do to show the actual letter?
Recommended answer
You are writing JSON; the JSON standard allows for \uxxxx escape sequences to encode non-ASCII characters. The Python json module uses this by default.
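As a small illustration of that default, here is a sketch written for Python 3 (where every str is Unicode). The sample dictionary is made up for this example and only stands in for the asker's data.

```python
import json

# Hypothetical sample data: a nested dict keyed by the Bengali letter from the question.
data = {"\u0982": {"count": 1}}

# By default json.dumps() escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(data)
print(escaped)

# The escape is only a different spelling of the same character:
# parsing the JSON text gives back the original key.
restored = json.loads(escaped)
print(restored == data)
```

So the escaped file is still valid JSON and loses no information; it is just harder for a human to read.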
Switch this off by passing ensure_ascii=False when dumping:
json.dump(obj, yourfileobject, ensure_ascii=False)
This does mean that the output is no longer automatically encoded to UTF-8 bytes; you'll need to use a codecs.open() managed file for that:
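import json
import codecs

# codecs.open() encodes everything written to the file as UTF-8
with codecs.open('/path/to/file', 'w', encoding='utf8') as output:
    # ensure_ascii=False keeps non-ASCII characters instead of \uxxxx escapes
    json.dump(obj, output, ensure_ascii=False)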
Now your unicode characters will be written to the file as UTF-8 encoded bytes instead. When you open the file with another program that decodes UTF-8, your code points should be displayed as the same characters again.
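To check this end to end, the following sketch writes a file, inspects its raw bytes, and loads it back. It assumes Python 3, where the built-in open() accepts an encoding argument and can play the role of codecs.open(). The sample data and the temporary file name are invented for the example.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the asker's dictionary.
data = {"\u0982": {"count": 1}}

# A throwaway path so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "bengali.json")

# Write: keep the real characters and let the file object encode them as UTF-8.
with open(path, "w", encoding="utf-8") as output:
    json.dump(data, output, ensure_ascii=False)

# Read the raw bytes to see whether the letter is stored as UTF-8 or as an escape.
with open(path, "rb") as raw:
    raw_bytes = raw.read()
print("\u0982".encode("utf-8") in raw_bytes)
print(b"\\u0982" in raw_bytes)

# Read the file back as JSON and compare with the original data.
with open(path, "r", encoding="utf-8") as source:
    loaded = json.load(source)
print(loaded == data)
```

Reading with the same encoding that was used for writing is what makes the round trip reliable; a viewer that guesses a different encoding may still show garbled text even though the file is correct.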