How to get string objects instead of Unicode from JSON?

Question

I'm using Python 2 to parse JSON from ASCII-encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is, I have to use the data with some libraries that only accept string objects. I can't change the libraries nor update them.

Is it possible to get string objects instead of Unicode ones?

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

Update

This question was asked a long time ago, when I was stuck with Python 2. One easy and clean solution today is to use a recent version of Python, i.e. Python 3 or later.
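As a quick illustration of the update above, here is the session from the question rewritten as a small script. It is only a sketch and assumes it runs under Python 3, where the `json` module returns `str` objects; the variable name `type_names` is introduced here for illustration.

```python
import json

# Round-trip the list from the question through JSON.
new_list = json.loads(json.dumps(['a', 'b']))

# Collect the type name of every decoded item.
type_names = [type(item).__name__ for item in new_list]
print(type_names)  # on Python 3 this prints ['str', 'str']
```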

Recommended answer

A solution with object_hook

[edit]: Updated for Python 2.7 and 3.x compatibility.

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    if isinstance(data, str):
        return data

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items() # changed to .items() for python 2.7/3
        }

    # python 3 compatible duck-typing
    # if this is a unicode string, return its string representation
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')

    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?

Purely for performance. Mark's answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you'll hit Python's maximum recursion depth
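For comparison, the decode-then-convert approach described above might look roughly like the following. This is a sketch of the general technique, not necessarily Mark's exact code; the names `byteify_after_decoding` and `text_type` are made up for this example. It walks the fully decoded value recursively and builds a new structure, which is where the extra copy and the recursion-depth limit come from.

```python
import json

# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")

def byteify_after_decoding(data):
    """Return a copy of `data` with every text string encoded as UTF-8."""
    if isinstance(data, text_type):
        return data.encode("utf-8")
    if isinstance(data, list):
        # A new list is built, so the decoded list is duplicated.
        return [byteify_after_decoding(item) for item in data]
    if isinstance(data, dict):
        # One recursive call per nesting level: deep input can exhaust the stack.
        return {
            byteify_after_decoding(key): byteify_after_decoding(value)
            for key, value in data.items()
        }
    return data

decoded = json.loads('{"foo": ["bar", {"baz": 1}]}')
converted = byteify_after_decoding(decoded)
```

On Python 2 the encoded values are `str` objects; on Python 3 they are `bytes`.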

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they're decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.
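To see the order in which the decoder hands dictionaries to the hook, a small recording hook can help. This is only an illustrative sketch; `recording_hook` and `seen` are names invented here. Each dict reaches the hook once it has been completely decoded, so inner dicts arrive before the dicts that contain them.

```python
import json

seen = []

def recording_hook(decoded_dict):
    # Remember the keys of each dict at the moment the decoder finishes it.
    seen.append(sorted(decoded_dict.keys()))
    return decoded_dict

json.loads(
    '{"outer": {"middle": {"inner": 1}}, "list": [{"in_list": 2}]}',
    object_hook=recording_hook,
)

print(seen)  # [['inner'], ['middle'], ['in_list'], ['list', 'outer']]
```

Because every nested dict has already been through the hook by the time its parent arrives, the hook only needs to process one level of dict at a time.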

Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which is passed as True at all times except when object_hook hands it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts, since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn't have a dict at the top level.
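The reason that final call is needed can be checked directly: when the JSON text has no object anywhere in it, the hook is never invoked. The following sketch uses a hypothetical `counting_hook` to show this.

```python
import json

hook_calls = []

def counting_hook(decoded_dict):
    # Record every dict the decoder passes to the hook.
    hook_calls.append(decoded_dict)
    return decoded_dict

# Neither document contains a JSON object, so the hook never runs.
top_level_list = json.loads('["a", ["b"]]', object_hook=counting_hook)
top_level_string = json.loads('"just a string"', object_hook=counting_hook)

print(len(hook_calls))  # 0
```

Any strings in such values would therefore stay unconverted unless the result is processed once more after decoding.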
