Apache Kafka order of messages with multiple partitions

Question

According to the Apache Kafka documentation, message ordering is guaranteed only within a partition (or across a whole topic if that topic has a single partition). In that case, what parallelism benefit do we get? Isn't it equivalent to a traditional MQ?

Recommended answer

In Kafka, the degree of parallelism is equal to the number of partitions in a topic.

For example, assume that your messages are partitioned by user_id, and consider 4 messages with user_ids 1, 2, 3 and 4. Assume that you have a "users" topic with 4 partitions.

Since partitioning is based on user_id, assume that the message with user_id 1 goes to partition 1, the message with user_id 2 goes to partition 2, and so on.
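To make the key-to-partition idea concrete, here is a minimal sketch in plain Python. It is not Kafka's real partitioner: Kafka's default partitioner hashes the serialized key with murmur2, while this toy version uses zlib.crc32 as a stand-in. The names partition_for, NUM_PARTITIONS and the sample events are made up for illustration. The point is only that the same key always lands in the same partition, so the events of one user stay in order.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: the "users" topic has 4 partitions (numbered 0..3 here)


def partition_for(key: str, num_partitions: int) -> int:
    """Toy key-based partitioner: hash the key, then take it modulo the partition count.

    Stand-in only; Kafka's default partitioner uses a murmur2 hash of the key bytes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical stream of (user_id, event) pairs, in the order they are produced.
events = [
    ("1", "login"), ("2", "login"), ("1", "click"),
    ("3", "login"), ("1", "logout"), ("4", "login"),
]

# Each partition is an append-only list, which models the per-partition log.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for user_id, event in events:
    partitions[partition_for(user_id, NUM_PARTITIONS)].append((user_id, event))

for p, log in partitions.items():
    print(p, log)
```

Whatever partition user 1 hashes to, its three events appear there in the order login, click, logout.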

Also assume that you have 4 consumers for the topic, all in the same consumer group. Since you have 4 consumers, Kafka will assign one partition to each consumer. So in this case, as soon as the 4 messages are pushed, they are consumed immediately and in parallel.

If you had 2 consumers for the topic instead of 4, each consumer would handle 2 partitions and the consuming throughput would be roughly half.
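The assignment described above can be sketched roughly as follows. This is a simplified round-robin model, not Kafka's actual assignor logic (the real range, round-robin and sticky assignors handle multiple topics and rebalancing); the function name assign and the consumer names c1..c5 are hypothetical.

```python
def assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions over the consumers of one group, round-robin.

    Each partition is owned by exactly one consumer in the group.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


four = assign(4, ["c1", "c2", "c3", "c4"])        # one partition per consumer
two = assign(4, ["c1", "c2"])                     # two partitions per consumer
five = assign(4, ["c1", "c2", "c3", "c4", "c5"])  # the fifth consumer gets nothing

print(four)
print(two)
print(five)
```

The last case shows why the partition count caps parallelism: adding a fifth consumer to a group reading a 4-partition topic leaves that consumer idle.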

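To answer your question completely: Kafka only provides a total order over messages within a partition, not between different partitions in a topic.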

That is, if consumption is very slow in partition 2 and very fast in partition 4, then the message with user_id 4 will be consumed before the message with user_id 2. This is how Kafka is designed.
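A small deterministic simulation illustrates the point. It does not talk to Kafka at all: the consume function, the message names and the per-partition delays are all invented for this sketch, and each partition is modeled as being read by its own consumer at a fixed speed.

```python
def consume(partitions: dict[int, list[str]], delay: dict[int, float]) -> list[tuple[int, str]]:
    """Return (partition, message) pairs in the order in which they finish processing.

    Assumes one consumer per partition that handles messages strictly in log order,
    spending delay[p] time units on each message of partition p.
    """
    finished = []
    for p, messages in partitions.items():
        for i, msg in enumerate(messages):
            finish_time = (i + 1) * delay[p]
            finished.append((finish_time, p, msg))
    finished.sort()
    return [(p, msg) for _, p, msg in finished]


# Hypothetical logs: partition 2 holds user 2's messages, partition 4 holds user 4's.
partitions = {2: ["u2-a", "u2-b"], 4: ["u4-a", "u4-b"]}
delay = {2: 5.0, 4: 1.0}  # partition 2 is slow, partition 4 is fast

order = consume(partitions, delay)
print(order)
# [(4, 'u4-a'), (4, 'u4-b'), (2, 'u2-a'), (2, 'u2-b')]
```

Both of user 4's messages finish before either of user 2's, yet inside each partition the a-then-b order is preserved. If you need ordering across a set of messages, they have to share a partition, typically by sharing a key.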
