Nested python dataclasses with list annotations
Problem description
Python ^3.7. I'm trying to create nested dataclasses to work with a complex JSON response. I managed to do that by creating a dataclass for every level of the JSON and using __post_init__ to set fields as objects of other dataclasses. However, that creates a lot of boilerplate code, and there is also no annotation for the nested objects.
This answer helped me get closer to the solution using a wrapper:
https://stackoverflow.com/a/51565863/8325015
However, it does not solve the case where an attribute is a list of objects, e.g. some_attribute: List[SomeClass].
Here is an example that resembles my data:
from dataclasses import dataclass, is_dataclass
from typing import List
from copy import deepcopy

# decorator from the linked thread:
def nested_deco(*args, **kwargs):
    def wrapper(check_class):
        # passing class to investigate
        check_class = dataclass(check_class, **kwargs)
        o_init = check_class.__init__

        def __init__(self, *args, **kwargs):
            for name, value in kwargs.items():
                # getting field type
                ft = check_class.__annotations__.get(name, None)
                if is_dataclass(ft) and isinstance(value, dict):
                    obj = ft(**value)
                    kwargs[name] = obj
            o_init(self, *args, **kwargs)

        check_class.__init__ = __init__
        return check_class

    return wrapper(args[0]) if args else wrapper

# some dummy dataclasses to resemble my data structure
@dataclass
class IterationData:
    question1: str
    question2: str

@nested_deco
@dataclass
class IterationResult:
    name: str
    data: IterationData

@nested_deco
@dataclass
class IterationResults:
    iterations: List[IterationResult]

@dataclass
class InstanceData:
    date: str
    owner: str

@nested_deco
@dataclass
class Instance:
    data: InstanceData
    name: str

@nested_deco
@dataclass
class Result:
    status: str
    iteration_results: IterationResults

@nested_deco
@dataclass
class MergedInstance:
    instance: Instance
    result: Result

# example data
single_instance = {
    "instance": {
        "name": "example1",
        "data": {
            "date": "2021-01-01",
            "owner": "Maciek"
        }
    },
    "result": {
        "status": "complete",
        "iteration_results": [
            {
                "name": "first",
                "data": {
                    "question1": "yes",
                    "question2": "no"
                }
            }
        ]
    }
}
instances = [deepcopy(single_instance) for i in range(3)]  # created a list just to resemble my data
objres = [MergedInstance(**inst) for inst in instances]
As you will notice, nested_deco works perfectly for the attributes of MergedInstance and for the data attribute of Instance, but it does not load the IterationResults class for iteration_results of Result.
Is there a way to achieve this?
I am also attaching an example of my __post_init__ solution, which creates the class objects but has no annotations for the attributes:
@dataclass
class IterationData:
    question1: str
    question2: str

@dataclass
class IterationResult:
    name: str
    data: dict

    def __post_init__(self):
        self.data = IterationData(**self.data)

@dataclass
class InstanceData:
    date: str
    owner: str

@dataclass
class Instance:
    data: dict
    name: str

    def __post_init__(self):
        self.data = InstanceData(**self.data)

@dataclass
class Result:
    status: str
    iteration_results: list

    def __post_init__(self):
        self.iteration_results = [IterationResult(**res) for res in self.iteration_results]

@dataclass
class MergedInstance:
    instance: dict
    result: dict

    def __post_init__(self):
        self.instance = Instance(**self.instance)
        self.result = Result(**self.result)
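A small change to this __post_init__ approach keeps the real annotations: declare the fields with their target types and only convert when a raw dict (or a list of dicts) was actually passed in. This is a minimal sketch covering two of the classes above, not a full solution; static type checkers will still flag passing a dict where IterationData is declared.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationData:
    question1: str
    question2: str


@dataclass
class IterationResult:
    name: str
    data: IterationData  # real annotation instead of dict

    def __post_init__(self):
        # Accept either a ready-made IterationData or a raw dict from JSON.
        if isinstance(self.data, dict):
            self.data = IterationData(**self.data)


@dataclass
class Result:
    status: str
    iteration_results: List[IterationResult]

    def __post_init__(self):
        # Convert only the entries that are still raw dicts.
        self.iteration_results = [
            IterationResult(**item) if isinstance(item, dict) else item
            for item in self.iteration_results
        ]


res = Result(
    status="complete",
    iteration_results=[{"name": "first", "data": {"question1": "yes", "question2": "no"}}],
)
print(res.iteration_results[0].data)  # IterationData(question1='yes', question2='no')
```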
Recommended answer
This doesn't really answer your question about the nested decorators, but my initial suggestion would be to avoid a lot of hard work for yourself by making use of libraries that have tackled this same problem before.
There are a lot of well-known ones, like pydantic, which also provides data validation and is something I might recommend. If you are interested in keeping your existing dataclass structure and don't want to inherit from anything, you can use libraries such as dataclass-wizard and dataclasses-json. The latter offers a decorator approach which might interest you. But ideally, the goal is to find an (efficient) JSON serialization library which already offers exactly what you need.
Here is an example using the dataclass-wizard library with minimal changes needed (no need to inherit from a mixin class). Note that I had to modify your input JSON object slightly, as it didn't exactly match the dataclass schema otherwise: iteration_results was a list, while the schema expects an IterationResults object with an iterations key. Other than that, it should work as expected. I've also removed copy.deepcopy, as it's a bit slower and we don't need it (the helper functions won't directly modify the dict objects anyway, which is simple enough to test).
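The "simple enough to test" part can be checked with a few lines of standard library code. Without deepcopy, the list comprehension holds three references to the same dict, so a loader that mutated its input would affect every entry. The hypothetical helper below snapshots the input with copy.deepcopy, runs a function on it, and compares. With dataclass-wizard installed you could pass something like lambda d: fromdict(MergedInstance, d); that call is not exercised in this sketch.

```python
import copy
from typing import Any, Callable, Dict


def leaves_input_untouched(func: Callable[[Dict[str, Any]], Any], data: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if calling func(data) leaves data unchanged (deep comparison)."""
    snapshot = copy.deepcopy(data)
    func(data)
    return data == snapshot


single_instance = {"instance": {"name": "example1", "data": {"date": "2021-01-01", "owner": "Maciek"}}}

# Without deepcopy, every list entry is the very same dict object.
instances = [single_instance for _ in range(3)]
print(all(inst is single_instance for inst in instances))  # True

# A read-only function passes the check; a mutating one fails it.
print(leaves_input_untouched(lambda d: d["instance"]["name"], single_instance))  # True
print(leaves_input_untouched(lambda d: d["instance"].pop("name"), copy.deepcopy(single_instance)))  # False
```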
from dataclasses import dataclass
from typing import List

from dataclass_wizard import fromlist

@dataclass
class IterationData:
    question1: str
    question2: str

@dataclass
class IterationResult:
    name: str
    data: IterationData

@dataclass
class IterationResults:
    iterations: List[IterationResult]

@dataclass
class InstanceData:
    date: str
    owner: str

@dataclass
class Instance:
    data: InstanceData
    name: str

@dataclass
class Result:
    status: str
    iteration_results: IterationResults

@dataclass
class MergedInstance:
    instance: Instance
    result: Result

single_instance = {
    "instance": {
        "name": "example1",
        "data": {
            "date": "2021-01-01",
            "owner": "Maciek"
        }
    },
    "result": {
        "status": "complete",
        "iteration_results": {
            # Notice I've changed this here: previously this was a list,
            # which didn't match the IterationResults schema
            "iterations": [
                {
                    "name": "first",
                    "data": {
                        "question1": "yes",
                        "question2": "no"
                    }
                }
            ]
        }
    }
}

instances = [single_instance for i in range(3)]  # created a list just to resemble my data
objres = fromlist(MergedInstance, instances)
for obj in objres:
    print(obj)
Using the dataclasses-json library:
from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json

# Same as above
...

@dataclass_json
@dataclass
class MergedInstance:
    instance: Instance
    result: Result

single_instance = {...}
instances = [single_instance for i in range(3)]  # created a list just to resemble my data
objres = [MergedInstance.from_dict(inst) for inst in instances]
for obj in objres:
    print(obj)
Bonus: Let's say you are calling an API that returns you a complex JSON response, such as the one above. If you want to convert this JSON response to a dataclass schema, normally you'll have to write it out by hand, which can be a bit tiresome if the structure of the JSON is especially complex.
Wouldn't it be cool if there were a way to simplify the generation of a nested dataclass structure? The dataclass-wizard library comes with a CLI tool that accepts arbitrary JSON input, so it should certainly be doable to auto-generate a dataclass schema given such an input.
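To give a rough idea of what such a generator does, here is a deliberately naive sketch using only the standard library. It is not how dataclass-wizard works internally; generate_schema and its helpers are hypothetical names. Each JSON object becomes a @dataclass named after its key, a list becomes List[...] of its first element's type, and any other value gets its Python type name.

```python
import json
from typing import Any, Dict, List


def class_name(key: str) -> str:
    """Turn a JSON key such as "iteration_results" into "IterationResults"."""
    return "".join(part.capitalize() for part in key.split("_"))


def annotation_for(key: str, value: Any, classes: List[str]) -> str:
    """Return a type annotation for value, emitting nested classes as needed."""
    if isinstance(value, dict):
        name = class_name(key)
        emit_class(name, value, classes)
        return f"'{name}'"  # forward reference, so class order does not matter
    if isinstance(value, list):
        # Naive: infer the element type from the first item only.
        inner = annotation_for(key, value[0], classes) if value else "Any"
        return f"List[{inner}]"
    return type(value).__name__  # str, int, float, bool, ...


def emit_class(name: str, obj: Dict[str, Any], classes: List[str]) -> None:
    """Append the source of a @dataclass called `name` built from obj's keys."""
    body = [f"    {key}: {annotation_for(key, value, classes)}" for key, value in obj.items()]
    classes.append("\n".join(["@dataclass", f"class {name}:", *(body or ["    pass"])]))


def generate_schema(json_text: str, root: str = "Data") -> str:
    classes: List[str] = []
    emit_class(root, json.loads(json_text), classes)
    return "\n\n".join(classes)


sample = """{"instance": {"name": "example1", "data": {"date": "2021-01-01", "owner": "Maciek"}},
 "result": {"status": "complete", "iteration_results": {"iterations": [
   {"name": "first", "data": {"question1": "yes", "question2": "no"}}]}}}"""
schema = generate_schema(sample)
print(schema)
```

Since class names come straight from JSON keys, the repeated "data" key yields three separate classes called Data, the same naming clash the CLI output below runs into. Unlike that output, this sketch neither singularizes list names (it produces Iterations rather than Iteration) nor recognizes dates.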
Assume you have these contents in a testing.json file:
{
  "instance": {
    "name": "example1",
    "data": {
      "date": "2021-01-01",
      "owner": "Maciek"
    }
  },
  "result": {
    "status": "complete",
    "iteration_results": {
      "iterations": [
        {
          "name": "first",
          "data": {
            "question1": "yes",
            "question2": "no"
          }
        }
      ]
    }
  }
}
Then we run the following command:
wiz gs testing testing
And the contents of our new testing.py file:
from dataclasses import dataclass
from datetime import date
from typing import List, Union

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    instance: 'Instance'
    result: 'Result'


@dataclass
class Instance:
    """
    Instance dataclass

    """
    name: str
    data: 'Data'


@dataclass
class Data:
    """
    Data dataclass

    """
    date: date
    owner: str


@dataclass
class Result:
    """
    Result dataclass

    """
    status: str
    iteration_results: 'IterationResults'


@dataclass
class IterationResults:
    """
    IterationResults dataclass

    """
    iterations: List['Iteration']


@dataclass
class Iteration:
    """
    Iteration dataclass

    """
    name: str
    data: 'Data'


@dataclass
class Data:
    """
    Data dataclass

    """
    question1: Union[bool, str]
    question2: Union[bool, str]
That appears to more or less match the same nested dataclass structure from the original question, and best of all we didn't need to write any of the code ourselves!
However, there's a minor problem: because of some duplicate JSON keys, we end up with three dataclasses named Data. So I went ahead and renamed them to Data1, Data2, and Data3 for uniqueness. Then we can do a quick test to confirm that we're able to load the same JSON data into our new dataclass schema:
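If you'd rather not pick the suffixes by hand, a tiny hypothetical helper can assign them in order of appearance, which reproduces the Data1/Data2/Data3 naming used here. It only renames the class declarations; string annotations such as 'Data' that refer to them still need updating, and that is the part that depends on knowing which class each reference meant.

```python
from collections import Counter
from typing import List


def uniquify(names: List[str]) -> List[str]:
    """Hypothetical helper: suffix 1, 2, 3... onto every name that occurs more than once."""
    totals = Counter(names)
    seen: Counter = Counter()
    renamed = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            renamed.append(f"{name}{seen[name]}")
        else:
            renamed.append(name)  # unique names are left untouched
    return renamed


# Class names in the order they appear in the generated testing.py
generated = ["Data", "Instance", "Data", "Result", "IterationResults", "Iteration", "Data"]
print(uniquify(generated))
# ['Data1', 'Instance', 'Data2', 'Result', 'IterationResults', 'Iteration', 'Data3']
```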
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Union

from dataclass_wizard import JSONWizard


@dataclass
class Data1(JSONWizard):
    """
    Data dataclass

    """
    instance: 'Instance'
    result: 'Result'


@dataclass
class Instance:
    """
    Instance dataclass

    """
    name: str
    data: 'Data2'


@dataclass
class Data2:
    """
    Data dataclass

    """
    date: date
    owner: str


@dataclass
class Result:
    """
    Result dataclass

    """
    status: str
    iteration_results: 'IterationResults'


@dataclass
class IterationResults:
    """
    IterationResults dataclass

    """
    iterations: List['Iteration']


@dataclass
class Iteration:
    """
    Iteration dataclass

    """
    name: str
    data: 'Data3'


@dataclass
class Data3:
    """
    Data dataclass

    """
    question1: Union[bool, str]
    question2: Union[bool, str]


# ---- Start of our test
with open('testing.json') as in_file:
    d = json.load(in_file)

c = Data1.from_dict(d)
print(repr(c))
# Data1(instance=Instance(name='example1', data=Data2(date=datetime.date(2021, 1, 1), owner='Maciek')), result=Result(status='complete', iteration_results=IterationResults(iterations=[Iteration(name='first', data=Data3(question1='yes', question2='no'))])))
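Note that the date field comes back as a datetime.date rather than a string, because the generated schema annotates it as date. For comparison, the same conversion done by hand with the standard library looks like this. It is a sketch and says nothing about how dataclass-wizard parses dates internally; date.fromisoformat requires Python 3.7+.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Data2:
    date: date  # the annotation resolves to datetime.date, as in the generated file
    owner: str


raw = {"date": "2021-01-01", "owner": "Maciek"}
# date.fromisoformat parses "YYYY-MM-DD" strings into datetime.date objects.
obj = Data2(date=date.fromisoformat(raw["date"]), owner=raw["owner"])
print(repr(obj))  # Data2(date=datetime.date(2021, 1, 1), owner='Maciek')
```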