Join multiple Kafka topics by key
Question
How can I write a consumer that joins multiple Kafka topics in a scalable way?
I have a topic that publishes events with a key, and a second topic that publishes other events, related to a subset of the first, with the same key. I would like to write a consumer that subscribes to both topics and performs some additional actions for the subset that appears in both topics.
I can do this easily with a single consumer: read everything from both topics, maintain state locally, and perform the actions once both events have been read for a given key. But I need the solution to scale.
Ideally I need to tie the topics together so that they are partitioned the same way and the partitions are assigned to consumers in sync. How can I do this?
I know Kafka Streams joins topics together such that keys are allocated to the same nodes. How does it do that? P.S. I can't use Kafka Streams because I'm using Python.
Answer
Too bad you are on Python -- Kafka Streams would be a perfect fit :)
If you want to do this manually, you will need to implement your own PartitionAssignor. The implementation must ensure that partitions are co-located in the assignment: assuming you have 4 partitions per topic (let's call the topics A and B), partitions A_0 and B_0 must be assigned to the same consumer (and likewise A_1 and B_1, and so on).
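The co-location requirement can be expressed as a small assignment routine. This is a hand-rolled sketch of the idea, not kafka-python's assignor API, and it assumes both topics have the same partition count:

```python
def co_located_assignment(consumers, topics, num_partitions):
    """Spread partition *numbers* round-robin over the consumers, then
    give each consumer that number in every topic, so that A_p and B_p
    always land on the same group member."""
    plan = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        plan[owner].extend((t, p) for t in topics)
    return plan

# c0 owns partition numbers 0 and 2 in both topics, c1 owns 1 and 3.
plan = co_located_assignment(["c0", "c1"], ["A", "B"], 4)
```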
I hope the Python consumer allows you to specify a custom partition assignor via the config parameter partition.assignment.strategy.
This is the PartitionAssignor Kafka Streams uses: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java
Streams uses the concept of tasks -- a task gets the partitions of different topics with the same partition number assigned to it. Streams also tries to do a "sticky assignment" -- i.e., it avoids moving tasks (and thus partitions) on rebalance if possible. To that end, each consumer encodes its "old assignment" in the rebalance metadata.
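The sticky idea can be approximated as follows (a simplified sketch, not the actual Streams assignor logic): keep each partition number with its previous owner when that consumer is still alive, and only redistribute orphaned numbers.

```python
def sticky_reassign(old_owner, live_consumers, num_partitions):
    """old_owner maps partition number -> consumer name from the previous
    generation, as would be decoded from the rebalance metadata."""
    plan = {c: [] for c in live_consumers}
    orphaned = []
    for p in range(num_partitions):
        owner = old_owner.get(p)
        if owner in plan:
            plan[owner].append(p)   # sticky: no movement for survivors
        else:
            orphaned.append(p)      # previous owner left the group
    for p in orphaned:
        least_loaded = min(plan, key=lambda c: len(plan[c]))
        plan[least_loaded].append(p)
    return plan

# c1 died and c2 joined: c0 keeps its partitions, c2 takes over c1's.
plan = sticky_reassign({0: "c0", 1: "c0", 2: "c1", 3: "c1"}, ["c0", "c2"], 4)
```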
Basically, the method #subscription() is called on each consumer that is alive. It sends the consumer's subscription information (i.e., which topics the consumer wants to subscribe to), plus optional metadata, to the brokers.
In a second step, the leader of the consumer group computes the actual assignment within #assign(). The responsible broker collects all the information provided via #subscription() in the first phase of the rebalance and hands it to #assign(). Thus, the leader gets a global overview of the whole group and can ensure that partitions are assigned in a co-located manner.
In the last step, the broker receives the computed assignment from the leader and broadcasts it to all consumers of the group. This results in a call to #onAssignment() on each consumer.
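Put together, the three phases can be mimicked with a toy simulation. The class and function names below are illustrative stand-ins, not the actual Java assignor API:

```python
class ToyConsumer:
    def __init__(self, name):
        self.name = name
        self.assigned = None

    def subscription(self):
        # Phase 1: the topics this member wants, plus optional userdata
        # (Streams encodes its old assignment here for stickiness).
        return {"topics": ["A", "B"], "userdata": None}

    def on_assignment(self, partitions):
        # Phase 3: the leader's decision arrives, relayed by the broker.
        self.assigned = partitions

def leader_assign(subscriptions, num_partitions):
    # Phase 2: only the group leader runs this. It sees every member's
    # subscription at once, so it can co-locate equal partition numbers.
    members = sorted(subscriptions)
    plan = {m: [] for m in members}
    for p in range(num_partitions):
        owner = members[p % len(members)]
        plan[owner].extend((t, p) for t in subscriptions[owner]["topics"])
    return plan

group = [ToyConsumer("c0"), ToyConsumer("c1")]
subs = {c.name: c.subscription() for c in group}   # phase 1
plan = leader_assign(subs, 2)                      # phase 2
for c in group:                                    # phase 3
    c.on_assignment(plan[c.name])
```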
This might also help:
- https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture
- http://docs.confluent.io/current/streams/architecture.html