Apache Kafka order of messages with multiple partitions
Question
According to the Apache Kafka documentation, message order is guaranteed only within a partition (so for a whole topic only when it has a single partition). In that case, what parallelism benefit do we get? Isn't this equivalent to a traditional MQ?
Recommended answer
In Kafka, the degree of parallelism equals the number of partitions in a topic.
For example, assume that your messages are partitioned by user_id, and consider 4 messages with user_ids 1, 2, 3 and 4. Assume that you have a "users" topic with 4 partitions.
Since partitioning is based on user_id, assume that the message with user_id 1 goes to partition 1, the message with user_id 2 goes to partition 2, and so on.
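The key-to-partition mapping described above can be sketched as a hash of the key taken modulo the partition count. This is a toy model, not Kafka client code: `partition_for` is a hypothetical helper, and CRC32 is only a stand-in hash chosen because it ships with Python. Partition indexes in this sketch start at 0.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions).

    Assumption: CRC32 is a stand-in for illustration only; it is not
    the hash function used by Kafka's own partitioner.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # The same user_id always lands in the same partition, which is
    # what keeps one user's messages in order.
    for user_id in ["1", "2", "3", "4", "1", "2"]:
        print(f"user_id={user_id} -> partition {partition_for(user_id, 4)}")
```

With a hash-based scheme, two different user_ids may well share a partition, so the one-user-per-partition layout in the answer is a simplifying assumption. What matters for ordering is only that a given key always maps to the same partition.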
Also assume that you have 4 consumers for the topic, all in the same consumer group. Since there are 4 consumers, Kafka assigns each consumer one partition. In this case, as soon as the 4 messages are pushed, they are consumed immediately, each by a different consumer in parallel.
If you had 2 consumers for the topic instead of 4, each consumer would handle 2 partitions, and the consuming throughput would be roughly half.
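The relationship between consumer count and partitions per consumer can be illustrated with a small round-robin assignment. This is a simplified sketch, not the assignment strategy Kafka actually runs; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers in round-robin order.

    Simplified model for illustration: each partition is owned by
    exactly one consumer, and a consumer may own zero or more partitions.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = [1, 2, 3, 4]
    # 4 consumers: one partition each, so all 4 are read in parallel.
    print(assign_partitions(partitions, ["c1", "c2", "c3", "c4"]))
    # 2 consumers: two partitions each, so parallelism is halved.
    print(assign_partitions(partitions, ["c1", "c2"]))
    # 5 consumers: the extra consumer owns nothing and sits idle.
    print(assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"]))
```

The last case shows why the partition count caps parallelism in this model: adding consumers beyond the number of partitions gives the extra ones nothing to read.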
To fully answer your question: Kafka provides a total order over messages only within a partition, not across the different partitions of a topic.
That is, if consumption is very slow on partition 2 and very fast on partition 4, the message with user_id 4 will be consumed before the message with user_id 2. This is how Kafka is designed.
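The slow-partition scenario can be made concrete with a deterministic toy simulation. It assumes each partition is read by its own consumer, which handles messages one at a time at a fixed cost per message. The function name, message labels and timings are all made up for illustration.

```python
def consumption_order(partitions, seconds_per_message):
    """Return (finish_time, partition, message) tuples sorted by finish time.

    partitions: dict mapping partition id -> messages in log order.
    seconds_per_message: dict mapping partition id -> processing cost
        per message for that partition's consumer (assumed constant).
    """
    events = []
    for partition, messages in partitions.items():
        for offset, message in enumerate(messages):
            # A consumer finishes message N only after messages 0..N-1.
            finish_time = (offset + 1) * seconds_per_message[partition]
            events.append((finish_time, partition, message))
    return sorted(events)


if __name__ == "__main__":
    partitions = {2: ["u2-first", "u2-second"], 4: ["u4-first", "u4-second"]}
    cost = {2: 5.0, 4: 1.0}  # partition 2 is slow, partition 4 is fast
    for finish_time, partition, message in consumption_order(partitions, cost):
        print(f"t={finish_time:>4}  partition {partition}  {message}")
```

In this run both messages from partition 4 finish before the first message from partition 2, yet within each partition "first" still precedes "second". That is the guarantee described above: ordering per partition, with no ordering across partitions.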