How can I convert a dict to a unicode JSON string?

Question

This doesn't seem to be possible using the standard library json module. json.dumps automatically escapes all non-ASCII characters and then encodes the string to ASCII. I can tell it not to escape non-ASCII characters, but then it crashes when it tries to convert the output to ASCII.

The problem is - I don't want ASCII! I just want my JSON string back as a unicode (or UTF-8) string. Are there any convenient ways to do that?

Here's an example to demonstrate what I want:

d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d, output_encoding='utf8')
# => '{"stilling": "Lærling", "navn": "Åge"}'

But of course, there is no such option as output_encoding, so here's the actual output:

import json
d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d)
# => '{"stilling": "L\\u00e6rling", "navn": "\\u00c5ge"}'
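For context, the escaped output is still valid JSON for exactly the same data: \u00e6 is simply JSON's escape sequence for æ. A quick round trip through json.loads (a sketch for Python 3.7+, not part of the original question) shows that nothing is lost, only the representation differs:

```python
import json

d = {'navn': 'Åge', 'stilling': 'Lærling'}

# Default ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape
escaped = json.dumps(d)

# The escaped text contains only ASCII characters...
assert escaped.isascii()
# ...and parsing it gives back the original dict unchanged
assert json.loads(escaped) == d
print(escaped)
```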

So to summarize: I want to convert a Python dict to a UTF-8 JSON string without any escapes. How can I do that?

I'll accept solutions like:

  • Hacks (pre- and post-processing the input to dumps to achieve the desired effect)
  • Subclassing JSONEncoder (I have no idea how it works, and the documentation isn't very helpful)
  • Third-party libraries available on PyPI
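On the JSONEncoder option: json.dumps builds a JSONEncoder internally, and JSONEncoder itself accepts the same ensure_ascii flag, so a subclass isn't strictly required. A minimal sketch for Python 3 (UnicodeJSONEncoder is a hypothetical name) showing both the plain encoder and a subclass passed through the cls parameter of json.dumps:

```python
import json


class UnicodeJSONEncoder(json.JSONEncoder):
    """Hypothetical subclass that never escapes non-ASCII characters."""

    def __init__(self, **kwargs):
        # json.dumps passes ensure_ascii=True explicitly to the class it is
        # given, so the flag has to be overridden rather than defaulted
        kwargs["ensure_ascii"] = False
        super().__init__(**kwargs)


d = {'navn': 'Åge', 'stilling': 'Lærling'}

# Option 1: use the standard encoder directly with ensure_ascii=False
direct = json.JSONEncoder(ensure_ascii=False).encode(d)

# Option 2: hand the subclass to json.dumps via cls
via_cls = json.dumps(d, cls=UnicodeJSONEncoder)

# Both produce: {"navn": "Åge", "stilling": "Lærling"}
```

The override is unconditional on purpose: a kwargs.setdefault("ensure_ascii", False) version would look equivalent but would never take effect when called through json.dumps.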

Recommended Answer

Requirements

  • Make sure your Python files are saved in UTF-8. Otherwise your non-ASCII characters will turn into question marks (?). Notepad++ has excellent encoding options for this.
  • Make sure you have the appropriate fonts installed. If you want to display Japanese characters, you need to install Japanese fonts.
  • Make sure your IDE supports displaying Unicode characters. Otherwise you might get a UnicodeEncodeError, for example:

        UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-23: character maps to <undefined>

    PyScripter works for me. It's included with "Portable Python" at http://portablepython.com/wiki/PortablePython3.2.1.1
  • Make sure you are using Python 3+, since it has much better Unicode support.
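When the console or IDE is the only thing raising the UnicodeEncodeError, one way around it is to write the JSON to a file opened explicitly as UTF-8, so the result no longer depends on the console's codec. A sketch, assuming Python 3 (output.json is just an example file name):

```python
import json

data = {"Japan": "日本"}

# Open the file with an explicit encoding instead of the platform default
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Read it back the same way to confirm the characters survived
with open("output.json", encoding="utf-8") as f:
    text = f.read()

# ascii() escapes non-ASCII characters, so this print works on any console
print(ascii(text))
```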

json.dumps() escapes non-ASCII characters by default; its ensure_ascii parameter defaults to True.

Read the update at the bottom. Or...

Replace each escaped character with the Unicode character it represents.

I created a simple lambda function called getStringWithDecodedUnicode that does just that.

    import re

    # Replace each \uXXXX escape with the character it encodes
    getStringWithDecodedUnicode = lambda s: re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), s)
      

Here is getStringWithDecodedUnicode as a regular function:

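    import re

    def getStringWithDecodedUnicode(value):
        # Matches a literal backslash, 'u', then four lowercase hex digits
        findUnicodeRE = re.compile(r'\\u([\da-f]{4})')

        def getParsedUnicode(match):
            # Convert the hex code point to the corresponding character
            return chr(int(match.group(1), 16))

        return findUnicodeRE.sub(getParsedUnicode, str(value))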
      

Example

testJSONWithUnicode.py (using PyScripter as the IDE)

    import re
    import json

    getStringWithDecodedUnicode = lambda s: re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), s)

    data = {"Japan": "日本"}
    jsonString = json.dumps(data)
    print("json.dumps({0}) = {1}".format(data, jsonString))
    jsonString = getStringWithDecodedUnicode(jsonString)
    print("Decoded Unicode: %s" % jsonString)
      

Output

    json.dumps({'Japan': '日本'}) = {"Japan": "\u65e5\u672c"}
    Decoded Unicode: {"Japan": "日本"}
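The regex hack has two edge cases worth knowing about. json.dumps escapes characters outside the Basic Multilingual Plane (emoji, for example) as a surrogate pair such as \ud83d\ude00, which the regex turns into two lone surrogates; and a string that already contains a literal backslash followed by u and four hex digits gets rewritten even though it was never an escape. If all you have is the escaped JSON text, a safer post-processing step is to parse it and serialize it again with ensure_ascii=False. A sketch (unescape_json_text is a hypothetical helper name):

```python
import json


def unescape_json_text(json_text):
    """Hypothetical helper: re-serialize JSON text with non-ASCII characters left unescaped."""
    # json.loads understands every escape, including surrogate pairs
    obj = json.loads(json_text)
    return json.dumps(obj, ensure_ascii=False)


# The emoji is escaped as a surrogate pair; "path" holds a literal backslash
escaped = json.dumps({"Japan": "日本", "smile": "\U0001F600", "path": "C:\\u1234"})
result = unescape_json_text(escaped)

# ascii() keeps the printed output console-safe
print(ascii(result))
```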
      

Update

Or... just pass ensure_ascii=False as an option to json.dumps.

Note: You need to meet the requirements I outlined at the beginning, or else this isn't going to work.

    import json

    data = {'navn': 'Åge', 'stilling': 'Lærling'}
    result = json.dumps(data, ensure_ascii=False)
    print(result)  # prints {"navn": "Åge", "stilling": "Lærling"} (insertion order on Python 3.7+)
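Since the question asks for UTF-8 specifically: in Python 3, json.dumps returns a str, not bytes. If you need actual UTF-8 bytes (for a socket or a file opened in binary mode, say), encode the result. A minimal sketch:

```python
import json

data = {'navn': 'Åge', 'stilling': 'Lærling'}

# str with Å and æ kept as real characters
result = json.dumps(data, ensure_ascii=False)

# UTF-8 bytes: Å and æ each take two bytes
utf8_bytes = result.encode('utf-8')
print(utf8_bytes)  # b'{"navn": "\xc3\x85ge", "stilling": "L\xc3\xa6rling"}'

# json.loads also accepts UTF-8 bytes directly (Python 3.6+)
assert json.loads(utf8_bytes) == data
```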
      
