如何 JSON 序列化集合? [英] How to JSON serialize sets?

查看:40
本文介绍了如何 JSON 序列化集合?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 Python set,其中包含具有 __hash____eq__ 方法的对象,以确保集合中不包含重复项.

我需要对这个结果 set 进行 json 编码,但是即使将一个空的 set 传递给 json.dumps 方法也会引发一个 类型错误.

 文件/usr/lib/python2.7/json/encoder.py",第 201 行,编码块 = self.iterencode(o, _one_shot=True)文件/usr/lib/python2.7/json/encoder.py",第 264 行,在 iterencode 中返回 _iterencode(o, 0)文件/usr/lib/python2.7/json/encoder.py",第178行,默认引发类型错误(repr(o)+不是JSON可序列化的")类型错误:set([]) 不是 JSON 可序列化的

我知道我可以为具有自定义 default 方法的 json.JSONEncoder 类创建扩展,但我什至不确定从哪里开始转换set.我应该使用默认方法中的 set 值创建一个字典,然后返回编码吗?理想情况下,我想让默认方法能够处理原始编码器阻塞的所有数据类型(我使用 Mongo 作为数据源,因此日期似乎也会引发此错误)

任何正确方向的提示将不胜感激.

感谢您的回答!也许我应该更准确.

我利用(并投票)这里的答案来解决正在翻译的 set 的限制,但也有内部密钥存在问题.

set 中的对象是转换为 __dict__ 的复杂对象,但它们本身也可以包含其属性的值,这些值可能不符合json 编码器.

有很多不同的类型进入这个set,散列基本上为实体计算一个唯一的id,但本着NoSQL的真正精神,并没有确切地说明子对象包含什么.

一个对象可能包含 starts 的日期值,而另一个可能有一些其他架构,其中不包含包含非原始"对象的键.

这就是为什么我能想到的唯一解决方案是扩展 JSONEncoder 以替换 default 方法以打开不同的情况 - 但我不确定如何去解决这个问题,文档是模棱两可的.在嵌套对象中,从 default 返回的值是按键进行的,还是只是查看整个对象的通用包含/丢弃?该方法如何适应嵌套值?我已经浏览了以前的问题,但似乎无法找到针对特定情况进行编码的最佳方法(不幸的是,这似乎是我在这里需要做的).

解决方案

JSON 符号只有少数原生数据类型(对象、数组、字符串、数字、布尔值和 null),因此任何在 JSON 中序列化的内容都需要表示为这些类型之一.

json模块文档所示,这种转换可以自动完成由 JSONEncoderJSONDecoder 组成,但是你会放弃一些你可能需要的其他结构(如果你将集合转换为列表,那么你将失去恢复常规的能力列表;如果您使用 dict.fromkeys(s) 将集合转换为字典,那么您将失去恢复字典的能力.

一个更复杂的解决方案是构建一个可以与其他原生 JSON 类型共存的自定义类型.这使您可以存储嵌套结构,包括列表、集合、字典、小数、日期时间对象等:

from json import dumps, loads, JSONEncoder, JSONDecoder进口泡菜类 PythonObjectEncoder(JSONEncoder):定义默认(自我,对象):尝试:返回 {'_python_object': pickle.dumps(obj).decode('latin-1')}除了pickle.PickleError:返回超级().默认(对象)def as_python_object(dct):如果 dct 中的_python_object":return pickle.loads(dct['_python_object'].encode('latin-1'))返回 DCT

这是一个示例会话,显示它可以处理列表、字典和集合:

<预><代码>>>>data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]>>>j = 转储(数据,cls=PythonObjectEncoder)>>>负载(j,object_hook=as_python_object)[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {'key': 'value'}, Decimal('3.14')]

或者,使用更通用的序列化技术可能会很有用,例如 YAMLTwisted Jelly,或 Python 的 pickle 模块.这些都支持更广泛的数据类型.

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I'm not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I'd like to make the default method able to handle all the datatypes that the original encoder chokes on (I'm using Mongo as a data source so dates seem to raise this error too)

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There's a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there's no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing "non-primitive" objects.

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases - but I'm not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I've looked through previous questions and can't seem to find the best approach to case-specific encoding (which unfortunately seems like what I'm going to need to do here).

解决方案

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).

A more sophisticated solution is to build-out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        try:
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {'key': 'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python's pickle module. These each support a much greater range of datatypes.

这篇关于如何 JSON 序列化集合?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆