How to manage many avsc files in Flink gracefully when consuming multiple topics

Question

Here is my case: I use Flink to consume many Kafka topics with SimpleStringSchema. I use OutputTag because we later need to bucket the data, written as Parquet + Snappy, into per-topic directories. We then iterate over all the topics, processing each one with its own AVSC schema file.

Whenever new columns are added, I have to modify the corresponding .avsc schema file. This becomes a real problem when tens or hundreds of files need to be changed.

Is there a more graceful way to avoid editing the .avsc files, or at least a better way to manage them?

Answer

In general, I'd avoid ingesting data with different schemas in the same source. That is especially true for multiple schemas within the same topic.

A common and scalable way to avoid this is to use an envelope format: a single record schema with one optional (nullable) field per wrapped type, for example:

```json
{
  "namespace": "example",
  "name": "Envelope",
  "type": "record",
  "fields": [
    {
      "name": "type1",
      "type": ["null", {
        "type": "record",
        "name": "Type1",
        "fields": [ ... ]
      }],
      "default": null
    },
    {
      "name": "type2",
      "type": ["null", {
        "type": "record",
        "name": "Type2",
        "fields": [ ... ]
      }],
      "default": null
    }
  ]
}
```
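The envelope does not have to be maintained by hand: it can be generated from the individual record schemas, so the per-type schemas stay in one place each. The following is a minimal sketch, assuming each wrapped type is available as a parsed Avro record schema (a Python dict, e.g. loaded from an existing .avsc file with `json.load`). The function `build_envelope` and the example records `Type1`/`Type2` are hypothetical, for illustration only. It wraps every record in a `["null", <record>]` union with a `null` default, as in the schema above.

```python
import json


def build_envelope(wrapped, namespace="example", name="Envelope"):
    """Build an Avro envelope record with one nullable field per wrapped type.

    `wrapped` maps an envelope field name to a parsed Avro record schema (dict).
    Each field is typed ["null", <record>] with default null, so any wrapped
    type may be absent from a given message.
    """
    fields = []
    for field_name, record_schema in wrapped.items():
        # Avro records nested in a union must be named record types.
        if record_schema.get("type") != "record" or "name" not in record_schema:
            raise ValueError(f"{field_name}: expected a named Avro record schema")
        fields.append({
            "name": field_name,
            # "null" comes first: a union field's default must match the
            # first branch, so a null default is only valid in this order.
            "type": ["null", record_schema],
            "default": None,
        })
    return {"namespace": namespace, "name": name, "type": "record", "fields": fields}


# Hypothetical wrapped types; in practice these could be loaded from .avsc files.
type1 = {"type": "record", "name": "Type1",
         "fields": [{"name": "id", "type": "long"}]}
type2 = {"type": "record", "name": "Type2",
         "fields": [{"name": "label", "type": "string"}]}

envelope = build_envelope({"type1": type1, "type2": type2})
print(json.dumps(envelope, indent=2))
```

Adding a new wrapped type then means adding one entry to the mapping rather than editing the envelope JSON directly.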

This envelope is evolvable (wrapped types can be added or removed arbitrarily, and each wrapped type can itself evolve), and it adds only a little overhead (1 byte per subtype, for the union branch index). The downside is that the schema cannot enforce that exactly one of the subtypes is set.
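Both points can be made concrete. In Avro's binary encoding, a union value is written as its zero-based branch index (an int, zigzag- and varint-encoded) followed by the value, so an unset `["null", ...]` field costs one byte and a set one costs one byte plus the record. Since the schema cannot enforce "exactly one", a consumer can check it at runtime. This is a minimal sketch, assuming the envelope has already been decoded into a Python dict by an Avro library; `zigzag_varint` and `unwrap_envelope` are hypothetical helpers, not part of any Avro API.

```python
def zigzag_varint(n):
    """Encode an integer the way Avro encodes int/long values:
    zigzag mapping, then a little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def unwrap_envelope(envelope):
    """Return (field_name, payload) for the single non-null wrapped type.

    `envelope` is an already-decoded envelope record as a dict. Raises
    ValueError when zero or several wrapped types are set, a condition
    the Avro schema itself cannot rule out.
    """
    set_fields = [(name, value) for name, value in envelope.items()
                  if value is not None]
    if len(set_fields) != 1:
        names = [name for name, _ in set_fields]
        raise ValueError(f"expected exactly one wrapped type, got {names}")
    return set_fields[0]


# Union branch index per envelope field: 0 = null, 1 = the wrapped record.
# Each encodes to a single byte, hence "1 byte per subtype".
print(len(zigzag_varint(0)), len(zigzag_varint(1)))  # 1 1

decoded = {"type1": {"id": 42}, "type2": None}
kind, payload = unwrap_envelope(decoded)
print(kind, payload)  # type1 {'id': 42}
```

In a Flink job, the returned field name could, for example, be used to select the per-type `OutputTag`, which would keep the per-topic bucketing described in the question.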

This schema is fully compatible with the schema registry, so there is no need to parse anything manually.
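Because the envelope is an ordinary Avro record, it is registered like any other schema, and registry-aware (de)serializers can then handle it. A minimal sketch, assuming a Confluent-compatible Schema Registry, the default TopicNameStrategy subject name (`<topic>-value`), and a hypothetical registry URL and topic name `events`. It only builds the REST request for `POST /subjects/{subject}/versions` (the schema is sent as a JSON-encoded string) and does not send it.

```python
import json
import urllib.request


def registration_request(registry_url, topic, schema):
    """Build (but do not send) a Schema Registry request registering `schema`
    under the subject "<topic>-value"."""
    subject = f"{topic}-value"
    # The registry expects the Avro schema itself as a string field.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


# Hypothetical envelope with a single wrapped type.
envelope = {
    "namespace": "example", "name": "Envelope", "type": "record",
    "fields": [{
        "name": "type1",
        "type": ["null", {"type": "record", "name": "Type1",
                          "fields": [{"name": "id", "type": "long"}]}],
        "default": None,
    }],
}
req = registration_request("http://localhost:8081/", "events", envelope)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the request.
```

One envelope schema per topic replaces a separately maintained schema per message type.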
