How to get string objects instead of Unicode from JSON?

Question

I'm using Python 2 to parse JSON from ASCII-encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is that I have to use the data with some libraries that only accept string objects. I can neither change nor update those libraries.

Is it possible to get string objects instead of Unicode ones?

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. One easy and clean solution today is to use a recent version of Python, i.e. Python 3 or later.
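For reference, here is a minimal sketch of the same round trip as in the question, assuming it is run on Python 3, where str is already Unicode text and there is no separate unicode type. The variable all_str is introduced here only for illustration.

```python
import json

original_list = ['a', 'b']
new_list = json.loads(json.dumps(original_list))

# On Python 3, every decoded string is a plain str instance.
all_str = all(isinstance(item, str) for item in new_list)
print(new_list, all_str)
```

On Python 2 the same check would report False, which is the situation the rest of this page addresses.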

Recommended answer

A solution with object_hook

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?

Purely for performance. Mark's answer first decodes the JSON text fully into unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you'll hit Python's maximum recursion depth
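To make the contrast concrete, here is a rough sketch of the "decode first, then recurse" approach described above. It is not Mark Amery's actual code, which is not reproduced on this page; the names byteify_after_decoding and text_type are hypothetical. The try/except is an assumption added so the sketch runs on both Python 2 and Python 3 (on Python 3 the results are bytes objects).

```python
import json

try:
    text_type = unicode  # Python 2: the type json returns for strings
except NameError:
    text_type = str      # Python 3: encoding yields bytes objects

def byteify_after_decoding(data):
    # Walk an already decoded structure and build a converted copy.
    if isinstance(data, text_type):
        return data.encode('utf-8')
    if isinstance(data, list):
        return [byteify_after_decoding(item) for item in data]
    if isinstance(data, dict):
        # One extra level of Python recursion per nesting level.
        return {
            byteify_after_decoding(key): byteify_after_decoding(value)
            for key, value in data.items()
        }
    return data

decoded = json.loads('{"foo": ["bar", {"baz": 1}]}')
converted = byteify_after_decoding(decoded)
```

Both decoded and converted exist in memory at the same time, and the helper recurses once per nesting level, which is where the two effects listed above come from.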

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they're decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.
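The order in which object_hook receives dicts can be checked with a small experiment. This is only an illustrative sketch; record_keys and seen_keys are hypothetical names, not part of the answer's code.

```python
import json

seen_keys = []

def record_keys(decoded_dict):
    # Called once for every JSON object, as soon as it is fully decoded.
    seen_keys.append(sorted(decoded_dict))
    return decoded_dict

json.loads('{"outer": {"middle": {"inner": 1}}}', object_hook=record_keys)
print(seen_keys)  # [['inner'], ['middle'], ['outer']]
```

The innermost dict is handed to the hook first, so by the time an enclosing dict arrives, every dict nested inside it has already been through the hook.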

Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which is set to True in every call except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts, since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn't have a dict at the top level.
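The need for that final call can be seen from the fact that object_hook is never invoked when the JSON text contains no objects at all. A minimal sketch, with hook and calls as hypothetical names:

```python
import json

calls = []

def hook(decoded_dict):
    # Record every dict the decoder passes to object_hook.
    calls.append(decoded_dict)
    return decoded_dict

top_level_list = json.loads('["a", "b"]', object_hook=hook)
top_level_string = json.loads('"text"', object_hook=hook)

# Neither document contains a JSON object, so the hook never ran.
print(len(calls))  # 0
```

Without the outer _byteify call, a top-level list or string like these would come back untouched by the hook.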
