Recursive data type like a tree as Avro schema

Question

Reading https://avro.apache.org/docs/current/spec.html, it says a schema must be one of:

  • A JSON string, naming a defined type.
  • A JSON object, of the form: {"type": "typeName" ...attributes...} where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
  • A JSON array, representing a union of embedded types.
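To make the three forms concrete, here is a minimal Python sketch that reports which form a parsed JSON value uses. The function name schema_form is hypothetical, and the check is deliberately simplified: it only looks at the outermost shape, treats any string in an object's "type" attribute as a type name, and does not resolve names or validate nested schemas, so it is not how any Avro library actually parses schemas.

```python
import json


def schema_form(schema):
    """Report which of the three schema forms a parsed JSON value uses.

    Hypothetical, simplified helper: only the outermost shape is checked;
    names are not resolved and nested schemas are not validated.
    """
    if isinstance(schema, str):
        return "type name"        # e.g. "long", or a previously defined "Tree"
    if isinstance(schema, list):
        return "union"            # e.g. ["null", "long"]
    if isinstance(schema, dict):
        type_name = schema.get("type")
        if isinstance(type_name, str):
            return "object"       # e.g. {"type": "record", ...}
        # A JSON array (or anything else) under "type" is not a type name.
        raise ValueError(f'"type" must be a type name string, got {type_name!r}')
    raise ValueError(f"not an Avro schema: {schema!r}")


print(schema_form(json.loads('"long"')))                              # type name
print(schema_form(json.loads('["null", "long"]')))                    # union
print(schema_form(json.loads('{"type": "array", "items": "long"}')))  # object
```

Under this simplified reading, an object shaped like {"name": "Tree", "type": [...]} matches none of the three forms: it is a JSON object, but its "type" is a JSON array rather than a type name, and a bare union array has nowhere to carry a "name".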

I want a schema that describes a tree, using the recursive definition that a tree is either:

  • a node with a value (e.g. an integer) and a list of trees (its children)
  • a leaf with a value
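The recursive definition above is an ordinary sum type. As a minimal sketch in Python (used here only as an illustration; the class names Node and Leaf mirror the question, and total is a hypothetical helper), it could look like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    value: int
    children: List[Tree] = field(default_factory=list)


# A tree is either a Node (value plus child trees) or a Leaf (value only).
Tree = Union[Node, Leaf]


def total(tree: Tree) -> int:
    """Sum every value in the tree by recursing through Node children."""
    if isinstance(tree, Leaf):
        return tree.value
    return tree.value + sum(total(child) for child in tree.children)


example = Node(1, [Leaf(2), Node(3, [Leaf(4)])])
print(total(example))  # 10
```

This is the shape the question tries to express directly as a named, top-level union of Node and Leaf.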

My initial attempt looked like:

{
  "name": "Tree",
  "type": [
    {
      "name": "Node",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        },
        {
          "name": "children",
          "type": { "type": "array", "items": "Tree" }
        }
      ]
    },
    {
      "name": "Leaf",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        }
      ]
    }
  ]
}

But the Avro compiler rejects this, complaining that there is nothing of type {"name":"Tree","type":[{"name":"Node".... It seems Avro doesn't like a union type at the top level. I'm guessing this falls under the aforementioned rule "a schema must be one of ... a JSON object ... where typeName is either a primitive or derived type name." I'm not sure what a "derived type name" is, though. At first I thought it was the same as a "complex type", but complex types include union types...

Anyways, changing it to the more convoluted definition:

{
  "name": "Tree",
  "type": "record",
  "fields": [{
    "name": "ctors",
    "type": [
      {
        "name": "Node",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          },
          {
            "name": "children",
            "type": { "type": "array", "items": "Tree" }
          }
        ]
      },
      {
        "name": "Leaf",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          }
        ]
      }
    ]
  }]
}

works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
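To see what the wrapper record costs in practice, here is a minimal Python sketch that models values of that schema as plain nested dicts. The dict shape (Node and Leaf told apart by whether a "children" key is present) is an assumption for illustration, not the representation any particular Avro library uses, and wrap and total are hypothetical helpers.

```python
def wrap(value, children=None):
    """Build a Tree record whose single 'ctors' field holds a Node or a Leaf."""
    if children is None:
        return {"ctors": {"value": value}}                     # Leaf branch
    return {"ctors": {"value": value, "children": children}}   # Node branch


def total(tree):
    """Sum all values; every level must first unwrap the 'ctors' field."""
    inner = tree["ctors"]
    return inner["value"] + sum(total(child) for child in inner.get("children", []))


example = wrap(1, [wrap(2), wrap(3, [wrap(4)])])
print(total(example))  # 10
```

Every child in a Node's "children" list is a full Tree record, so each level of the tree carries one extra layer of nesting that exists only to hold the union.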

Is this the only way to get what I want in Avro or is there a better way?

Thanks!

Recommended answer

If you represent a Tree as a node, and a Leaf as a node with an empty list of children, you can avoid the named union problem completely, and do this quite simply with one recursive type:

{
  "type": "record",
  "name": "TreeNode",
  "fields": [
    {
      "name": "value",
      "type": "long"
    },
    {
      "name": "children",
      "type": { "type": "array", "items": "TreeNode" }
    }
  ]
}
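The self-reference in "items": "TreeNode" is allowed because a record can refer to its own name inside its field definitions; the specification's own LongList example relies on the same thing. As a rough illustration of which values this schema accepts, here is a minimal Python sketch of a hand-written check. The helper conforms is hypothetical and stricter than a real Avro writer (it rejects extra keys); it is not a substitute for a library validator.

```python
import json

# The TreeNode schema from the answer, parsed with the standard json module.
TREE_NODE_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "TreeNode",
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "children", "type": {"type": "array", "items": "TreeNode"}}
  ]
}
""")
FIELD_NAMES = {f["name"] for f in TREE_NODE_SCHEMA["fields"]}
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1  # Avro "long" is a signed 64-bit integer


def conforms(datum):
    """Hand-written check that a dict matches TreeNode (hypothetical helper).

    'children' declares no default, so this sketch requires it to be present;
    each item must itself be a TreeNode, and a leaf simply has an empty list.
    """
    if not isinstance(datum, dict) or set(datum) != FIELD_NAMES:
        return False
    value = datum["value"]
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if not LONG_MIN <= value <= LONG_MAX:
        return False
    children = datum["children"]
    return isinstance(children, list) and all(conforms(c) for c in children)


tree = {"value": 1, "children": [
    {"value": 2, "children": []},
    {"value": 3, "children": [{"value": 4, "children": []}]},
]}
print(conforms(tree))          # True
print(conforms({"value": 1}))  # False: 'children' is missing
```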

Now your three types, Tree, Node, and Leaf, are unified into one type, TreeNode, and no union of Node and Leaf is necessary.
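One way to see the trade-off is a minimal Python sketch that converts between the Node/Leaf model and TreeNode-shaped dicts. The classes and the helpers to_tree_node and from_tree_node are hypothetical, chosen only to illustrate the mapping: a Leaf is written with an empty "children" list, and an empty list is read back as a Leaf.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    value: int
    children: List[Union[Node, Leaf]] = field(default_factory=list)


def to_tree_node(tree: Union[Node, Leaf]) -> dict:
    """Encode Node/Leaf as a single TreeNode dict; a Leaf gets an empty list."""
    if isinstance(tree, Leaf):
        return {"value": tree.value, "children": []}
    return {"value": tree.value, "children": [to_tree_node(c) for c in tree.children]}


def from_tree_node(datum: dict) -> Union[Node, Leaf]:
    """Decode a TreeNode dict; an empty 'children' list is read back as a Leaf."""
    if not datum["children"]:
        return Leaf(datum["value"])
    return Node(datum["value"], [from_tree_node(c) for c in datum["children"]])


example = Node(1, [Leaf(2), Node(3, [Leaf(4)])])
assert from_tree_node(to_tree_node(example)) == example
print(from_tree_node(to_tree_node(Node(5))))  # Leaf(value=5)
```

The last line shows the one thing the unified type gives up: a Node with no children and a Leaf encode identically. If that distinction matters to your application, the wrapper record with a union keeps it; otherwise the single recursive TreeNode is simpler.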
