How to use Kafka schema management and Avro for breaking changes


Question

Kafka schema management with Avro gives us the flexibility of backward compatibility, but how do we handle breaking changes in the schema?

Assume producer A publishes messages M to consumer C.

Assume message M has a breaking change in its schema (e.g. the name field is now split into first_name and last_name), so we have a new schema, M-New.

Now we are deploying producer A-New and consumer C-New.

The problem is that until our deployment process finishes, producer A-New can publish message M-New while the old consumer C is still running; C will receive M-New, and we can lose messages because of that.

So the only way to do this is to synchronize the deployment of new producers and consumers, which adds a lot of overhead.

Any suggestions on how to handle that?

Answer

"e.g. the name field is now split into first_name and last_name"

The Avro definition of a "backwards compatible" schema does not allow you to add these new fields without 1) keeping the old name field and 2) adding defaults to the new fields - https://docs.confluent.io/current/schema-registry/avro.html
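As a minimal sketch of those two rules (the record and field names are illustrative, not from any real project), an evolved schema keeps the old field and gives every added field a default:

```python
# Hypothetical "before" schema: a single name field.
OLD_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
    ],
}

# Backward-compatible evolution: the old "name" field is kept (rule 1),
# and every newly added field declares a default (rule 2).
NEW_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "first_name", "type": "string", "default": ""},
        {"name": "last_name", "type": "string", "default": ""},
    ],
}

def added_fields_have_defaults(old, new):
    """Check rule 2: every field added in `new` must carry a default."""
    old_names = {f["name"] for f in old["fields"]}
    return all("default" in f
               for f in new["fields"] if f["name"] not in old_names)

print(added_fields_have_defaults(OLD_SCHEMA, NEW_SCHEMA))  # True
```

A registry configured for backward compatibility would apply the same kind of check when you register NEW_SCHEMA under a subject.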

If your consumers upgrade their schema first, they still see the old name field, which old producers continue to send, and they interpret the defaults for the new fields until the producers upgrade and start sending them.

If the producers upgrade first, then consumers will never see the new fields, so the producers should still send the name field, or opt to send some garbage value that will intentionally break consumers (e.g. make the field nullable to begin with but never actually send a null, then start sending nulls while consumers assume the value cannot be null).
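One way to keep both sides working during the rollout window is for the upgraded producer to populate both the legacy field and the new split fields. A hedged sketch (the helper name and field layout are assumptions, not part of the original answer):

```python
def build_user_record(first_name, last_name):
    """Transitional producer record: old consumers read "name",
    upgraded consumers read the split fields."""
    return {
        "name": f"{first_name} {last_name}",  # keeps old consumer C working
        "first_name": first_name,
        "last_name": last_name,
    }

record = build_user_record("Ada", "Lovelace")
print(record["name"])  # Ada Lovelace
```

Once every consumer has upgraded, a later schema version can deprecate the legacy field.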

In either case, I feel like your record-processing logic has to detect which fields are available and whether they are null or set to their default values.
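That detection logic could look like the following sketch, assuming empty-string defaults for the new fields (the function and field names are illustrative):

```python
def extract_name(record):
    """Prefer the new split fields; fall back to the legacy "name"
    field when the new ones are absent or still hold their defaults."""
    first = record.get("first_name") or ""
    last = record.get("last_name") or ""
    if first or last:
        return first, last
    # Old-style record: split the legacy field as a best effort.
    parts = record.get("name", "").split(" ", 1)
    return parts[0], parts[1] if len(parts) > 1 else ""

print(extract_name({"name": "Ada Lovelace"}))                        # ('Ada', 'Lovelace')
print(extract_name({"first_name": "Ada", "last_name": "Lovelace"}))  # ('Ada', 'Lovelace')
```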

But compare that to JSON or any plain string format (like CSV): you have no guarantees about which fields should be there, whether they are nullable, or what types they are (is a date a string or a long?), so you cannot guarantee which objects your clients will internally map messages into for processing. I find that a larger advantage of Avro than the compatibility rules.

Personally, I find that enforcing FULL_TRANSITIVE compatibility on the registry works best when you have little to no communication between your Kafka users.
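For reference, the compatibility level is set per subject through the Schema Registry REST API; the registry URL and subject name below are placeholders for your own setup:

```shell
# Enforce FULL_TRANSITIVE compatibility for one subject
# (registry at localhost:8081 and subject name are assumptions).
curl -X PUT \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL_TRANSITIVE"}' \
  http://localhost:8081/config/my-topic-value
```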

