Recursive data type like a tree as Avro schema
Question
Reading https://avro.apache.org/docs/current/spec.html, it says a schema must be one of:
- A JSON string, naming a defined type.
- A JSON object, of the form:
  {"type": "typeName" ...attributes...}
  where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
- A JSON array, representing a union of embedded types.
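On a literal reading of the three forms quoted above, a parsed schema can be classified by its JSON type alone. The sketch below uses a hypothetical helper, `schema_form` (not part of any Avro library); the primitive names are the ones the spec lists. Note that an object whose "type" is a JSON array, as in the first attempt further down, does not match the `{"type": "typeName"}` form, where typeName is a string.

```python
import json

# Primitive type names as listed in the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def schema_form(schema) -> str:
    """Hypothetical helper: report which of the spec's three forms a parsed schema uses."""
    if isinstance(schema, str):
        # Form 1: a JSON string naming a type (a primitive or a previously defined name).
        return "primitive name" if schema in PRIMITIVES else "defined type name"
    if isinstance(schema, list):
        # Form 3: a JSON array is a union; an array has no key that could hold a "name".
        return "union"
    if isinstance(schema, dict):
        # Form 2: {"type": "typeName", ...}; typeName is expected to be a string.
        if isinstance(schema.get("type"), str):
            return "object"
        return "object without a type name"
    return "not a schema"


print(schema_form(json.loads('"long"')))                                      # primitive name
print(schema_form(json.loads('["null", "long"]')))                            # union
print(schema_form(json.loads('{"name": "Tree", "type": ["null", "long"]}')))  # object without a type name
```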
I want a schema that describes a tree, using the recursive definition that a tree is either:
- A node with a value (say, an integer) and a list of trees (its children)
- A leaf with a value
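That recursive definition maps directly onto a sum type. As a reference point, here is a minimal Python sketch of it using dataclasses; the class names mirror the Node and Leaf records in the schemas below, and `total` is a hypothetical example function that recurses over the structure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    value: int
    children: List["Tree"] = field(default_factory=list)  # each child is itself a Tree


# A tree is either a Node or a Leaf.
Tree = Union[Node, Leaf]


def total(tree: Tree) -> int:
    """Sum every value in the tree, recursing into a Node's children."""
    if isinstance(tree, Leaf):
        return tree.value
    return tree.value + sum(total(child) for child in tree.children)


example = Node(1, [Leaf(2), Node(3, [Leaf(4)])])
print(total(example))  # 10
```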
My initial attempt looked like:
{
  "name": "Tree",
  "type": [
    {
      "name": "Node",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        },
        {
          "name": "children",
          "type": { "type": "array", "items": "Tree" }
        }
      ]
    },
    {
      "name": "Leaf",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        }
      ]
    }
  ]
}
But the Avro compiler rejects this, complaining that there is nothing of type {"name":"Tree","type":[{"name":"Node"... . It seems Avro doesn't like the union type at the top level. I'm guessing this falls under the aforementioned rule "a schema must be one of ... a JSON object ... where typeName is either a primitive or derived type name." I am not sure what a "derived type name" is, though. At first I thought it was the same as a "complex type", but that includes union types.
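One way to see the failure concretely: in the Avro specification only records, enums and fixed types are named types, so the "name": "Tree" attached to the union in the first attempt never defines anything, and the "items": "Tree" reference inside Node has nothing to resolve to. The sketch below is a simplified, hypothetical checker (`undefined_refs` is not the Avro parser; it ignores namespaces and aliases) that walks a parsed schema in document order, registers each record/enum/fixed name before visiting its fields, and collects references to names that are not defined yet.

```python
# Primitive type names from the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Only these complex types are named, and so can be referred to by name.
NAMED_KINDS = {"record", "enum", "fixed"}


def undefined_refs(schema):
    """Simplified sketch: list type-name references not defined by an earlier
    or enclosing record/enum/fixed. Namespaces and aliases are ignored."""
    defined = set()
    missing = []

    def walk(s):
        if isinstance(s, str):
            if s not in PRIMITIVES and s not in defined:
                missing.append(s)
        elif isinstance(s, list):
            # A union: walk each branch; the union itself defines no name.
            for branch in s:
                walk(branch)
        elif isinstance(s, dict):
            t = s.get("type")
            if isinstance(t, str) and t in NAMED_KINDS:
                # Register the name before the fields, so a record may refer to itself.
                defined.add(s["name"])
                for f in s.get("fields", []):
                    walk(f["type"])
            elif t == "array":
                walk(s["items"])
            elif t == "map":
                walk(s["values"])
            else:
                # e.g. {"type": "long"} or {"name": "Tree", "type": [...]}:
                # a "name" key here is ignored, because this is not a named type.
                walk(t)

    walk(schema)
    return missing
```

Given the first attempt as a parsed dict, this returns ["Tree"]; given the wrapped version that follows, it returns [], because the outer record registers Tree before its fields refer to it.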
Anyways, changing it to the more convoluted definition:
{
  "name": "Tree",
  "type": "record",
  "fields": [{
    "name": "ctors",
    "type": [
      {
        "name": "Node",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          },
          {
            "name": "children",
            "type": { "type": "array", "items": "Tree" }
          }
        ]
      },
      {
        "name": "Leaf",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          }
        ]
      }
    ]
  }]
}
works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
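To gauge what the wrapper actually costs, the binary encoding rules in the Avro spec help: a record is written as just its fields in declaration order, so the single-field Tree record adds no bytes of its own; a union is written as a zig-zag varint branch index followed by the value; an array is written as blocks of (count, items) ended by a block with count zero. The sketch below hand-encodes the wrapped schema under those rules using only the standard library. It is not the Avro library's API; `encode_long`, `encode_tree` and the dataclasses are hypothetical. It also shows where the wrapper does cost something: in code, every child has to be boxed in a Tree(...) value.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    value: int
    children: List["Tree"] = field(default_factory=list)


@dataclass
class Tree:
    ctors: Union[Node, Leaf]  # the wrapper record's single field


def encode_long(n: int) -> bytes:
    """Avro long: zig-zag mapping, then a base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit long
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_tree(tree: Tree) -> bytes:
    # The Tree record contributes nothing itself: only its one field, the union.
    branch = tree.ctors
    if isinstance(branch, Leaf):
        return encode_long(1) + encode_long(branch.value)  # union index 1 = Leaf
    out = encode_long(0) + encode_long(branch.value)        # union index 0 = Node
    if branch.children:
        out += encode_long(len(branch.children))  # one block holding every child
        for child in branch.children:
            out += encode_tree(child)
    return out + encode_long(0)  # a zero count ends the array


print(encode_tree(Tree(Leaf(1))).hex())                                  # 0202
print(encode_tree(Tree(Node(1, [Tree(Leaf(2)), Tree(Leaf(3))]))).hex())  # 0002040204020600
```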
Is this the only way to get what I want in Avro or is there a better way?
Thanks!
Recommended answer
If you represent a Tree as a node, and a Leaf as a node with an empty list of children, you can avoid the named union problem completely, and do this quite simply with one recursive type:
{
  "type": "record",
  "name": "TreeNode",
  "fields": [
    {
      "name": "value",
      "type": "long"
    },
    {
      "name": "children",
      "type": { "type": "array", "items": "TreeNode" }
    }
  ]
}
Now your three types, Tree, Node, and Leaf, are unified into one type, TreeNode, and no union of Node and Leaf is necessary.
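Under the same Avro binary encoding rules (a record is its fields in order; an array is blocks of count plus items, ended by a zero count; a long is a zig-zag varint), the single recursive TreeNode schema needs no union index at all: a leaf is just its value followed by a zero array count. The sketch below is a hand-written, standard-library-only encoder with hypothetical names (`TreeNode` as a dataclass, `encode_long`, `encode_tree_node`), not the Avro library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    value: int
    children: List["TreeNode"] = field(default_factory=list)  # empty list means leaf

    def is_leaf(self) -> bool:
        return not self.children


def encode_long(n: int) -> bytes:
    """Avro long: zig-zag mapping, then a base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit long
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_tree_node(node: TreeNode) -> bytes:
    out = encode_long(node.value)  # field "value"
    if node.children:              # field "children": one block with every child
        out += encode_long(len(node.children))
        for child in node.children:
            out += encode_tree_node(child)
    return out + encode_long(0)    # a zero count ends the array


print(encode_tree_node(TreeNode(1)).hex())                              # 0200
print(encode_tree_node(TreeNode(1, [TreeNode(2), TreeNode(3)])).hex())  # 02040400060000
```

One trade-off of this design: a node with no children and a leaf are now the same thing, so code that needs the distinction checks for an empty children list, as `is_leaf` does here.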