Join multiple Kafka topics by key


Question

How can I write a consumer that joins multiple Kafka topics in a scalable way?

I have a topic that publishes events with a key, and a second topic that publishes other events, related to a subset of the first, with the same keys. I would like to write a consumer that subscribes to both topics and performs some additional action for the subset of keys that appears in both topics.

I can do this easily with a single consumer: read everything from both topics, maintain state locally, and perform the actions once both events have been read for a given key. But I need the solution to scale.
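The single-consumer approach can be sketched as a pure join function over the stream of records a consumer would see. The topic names `"orders"` and `"payments"` are hypothetical stand-ins for the two real topics; the actual consumer loop (e.g. via `confluent-kafka` or `kafka-python`) is omitted so the join logic stands alone:

```python
from collections import defaultdict

def join_events(events):
    """Join two event streams by key: emit an action for every key that
    has appeared in both topics.

    `events` is an iterable of (topic, key, value) tuples, in the order
    a consumer subscribed to both topics would receive them.
    """
    pending = defaultdict(dict)  # key -> {topic: value} seen so far
    joined = []
    for topic, key, value in events:
        pending[key][topic] = value
        if "orders" in pending[key] and "payments" in pending[key]:
            # Both events seen for this key: perform the extra action.
            joined.append((key, pending[key]["orders"], pending[key]["payments"]))
            del pending[key]
    return joined

# A key present in both topics is joined; a key seen in only one is not.
result = join_events([("orders", "k1", 1), ("payments", "k1", 2), ("orders", "k2", 3)])
# result == [("k1", 1, 2)]
```

The scaling problem is visible here: `pending` only works if a single process sees *all* events for a given key, which is exactly what co-partitioning must guarantee once multiple consumers split the work.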

Ideally, I need to tie the topics together so that they are partitioned the same way and the partitions are assigned to consumers in sync. How can I do this?
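The first half of "tying the topics together" comes for free if both topics have the same number of partitions and producers use Kafka's default key-based partitioner, since the partition depends only on the key hash and the partition count. The sketch below uses `crc32` as a stand-in for Kafka's actual murmur2 hash (an assumption; the exact hash differs, but the co-partitioning property is the same):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default murmur2-based partitioner. The exact
    # hash function differs, but the property we rely on holds either
    # way: the partition depends only on the key and the partition count.
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 4  # both topics must be created with the same count

# The same key maps to the same partition number in both topics, so
# partition i of topic A and partition i of topic B hold related events.
assert partition_for(b"user-42", NUM_PARTITIONS) == partition_for(b"user-42", NUM_PARTITIONS)
```

The second half, assigning partition *i* of both topics to the same consumer, is what requires a custom assignor, as the answer below explains.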

I know Kafka Streams joins topics together such that keys are allocated to the same nodes. How does it do that? P.S. I can't use Kafka Streams because I'm using Python.

Answer

Too bad you are on Python -- Kafka Streams would be a perfect fit :)

If you want to do this manually, you will need to implement your own PartitionAssignor. This implementation must ensure that partitions are co-located in the assignment: assuming you have 4 partitions per topic (call the topics A and B), partitions A_0 and B_0 must be assigned to the same consumer (likewise A_1 and B_1, and so on).
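The core of such an assignor is a small amount of bookkeeping. A minimal sketch of the co-location logic, independent of any Kafka client library (a real assignor would plug this into the client's assignor interface and receive the member list and topic metadata from the group protocol):

```python
def co_located_assignment(consumers, topics, num_partitions):
    """Assign partition i of *every* topic to the same consumer.

    `consumers` is a stable (e.g. sorted) list of member ids. Partitions
    are spread round-robin over consumers, but A_i and B_i always
    travel together.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        for topic in topics:
            assignment[owner].append((topic, p))
    return assignment

# Two consumers, topics A and B with 4 partitions each:
plan = co_located_assignment(["c0", "c1"], ["A", "B"], 4)
# plan["c0"] == [("A", 0), ("B", 0), ("A", 2), ("B", 2)]
# plan["c1"] == [("A", 1), ("B", 1), ("A", 3), ("B", 3)]
```

The invariant to preserve is that the owner is chosen per partition *number*, never per (topic, partition) pair; that is what keeps A_i and B_i on the same consumer.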

Hopefully, the Python consumer allows you to specify a custom partition assignor via the config parameter partition.assignment.strategy.

This is the PartitionAssignor Kafka Streams uses: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java

Streams uses the concept of tasks: a task gets assigned the partitions of different topics that share the same partition number. Streams also tries to do a "sticky assignment", i.e., not to move tasks (and thus partitions) on rebalance if possible. To this end, each consumer encodes its "old assignment" in the rebalance metadata.
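The sticky idea can be sketched in a few lines: keep each partition with its previous owner when that owner is still alive, and hand only the orphaned partitions to the least-loaded members. This is a simplified model of the principle, not Streams' actual algorithm:

```python
def sticky_assign(members, partitions, old_assignment):
    """Reassign `partitions` to `members`, moving as few as possible.

    `old_assignment` maps partition -> previous owner (from the
    rebalance metadata). Partitions whose previous owner left the
    group are redistributed to the least-loaded members.
    """
    assignment = {m: [] for m in members}
    orphans = []
    for p in partitions:
        owner = old_assignment.get(p)
        if owner in assignment:
            assignment[owner].append(p)   # sticky: keep it in place
        else:
            orphans.append(p)             # previous owner is gone
    for p in orphans:
        least = min(assignment, key=lambda m: len(assignment[m]))
        assignment[least].append(p)
    return assignment
```

For example, if consumer "c3" leaves a group of three, only its partitions move; "c1" and "c2" keep everything they had, which preserves any warmed-up local join state.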

Basically, the method #subscription() is called on each live consumer. It sends the consumer's subscription information (i.e., which topics the consumer wants to subscribe to), plus optional metadata, to the brokers.

In a second step, the leader of the consumer group computes the actual assignment within #assign(). The responsible broker collects all the information provided by #subscription() in the first phase of the rebalance and hands it to #assign(). The leader thus gets a global overview of the whole group, and can therefore ensure that partitions are assigned in a co-located manner.

In the last step, the broker receives the computed assignment from the leader and broadcasts it to all consumers of the group. This results in a call to #onAssignment() on each consumer.
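The three phases above can be modeled as one toy function (a simulation of the protocol's data flow, not a client implementation; the leader-election and broadcast steps are collapsed into plain Python):

```python
def simulate_rebalance(member_ids, topics, num_partitions):
    """Toy model of the rebalance protocol described above."""
    # Phase 1 (#subscription): every live member reports the topics it
    # wants; here all members subscribe to both topics.
    subscriptions = {m: set(topics) for m in member_ids}

    # Phase 2 (#assign): the broker hands *all* subscriptions to the
    # group leader, which sees the whole group at once and can compute
    # a co-located assignment.
    members = sorted(subscriptions)
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        owner = members[p % len(members)]
        assignment[owner].extend((t, p) for t in sorted(topics))

    # Phase 3 (#onAssignment): the broker broadcasts each member's
    # share; a real consumer would invoke its callback here.
    return assignment

out = simulate_rebalance(["b", "a"], ["A", "B"], 2)
# out == {"a": [("A", 0), ("B", 0)], "b": [("A", 1), ("B", 1)]}
```

The key point the simulation shows: only the leader needs the global view, so co-location is decided in one place even though consumers join independently.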

This might also help:

