How does Kafka guarantee message ordering as processed by consumers across partitions?


Question

Source: https://kafka.apache.org/intro

"By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. "

This only means each consumer will process its own messages in order; across consumers in the same consumer group, processing may still be out of order. For example, take 3 partitions. The producer sends, via round robin, M1 to P1, M2 to P2, M3 to P3, then M4 to P1, M5 to P2, and M6 to P3.

Now we have:
P1: M1 and M4
P2: M2 and M5
P3: M3 and M6

If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, etc. How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?

Or am I misunderstanding something?

Recommended answer

How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?

In general, you can't.

If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, etc.

Even if you had a single consumer that consumed all the partitions for the topic, the partitions would be consumed in a non-deterministic order and your total ordering across all partitions would not be guaranteed.
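
To make this concrete, here is a small simulation of the scenario from the question. It is only a sketch and does not use any Kafka client: the partitions are plain Python lists, and the helper names (`round_robin`, `consume_all`, `keeps_partition_order`) are made up for illustration. The consumer picks a random non-empty partition at each step, which stands in for the fact that the read order across partitions is not specified.

```python
import random

def round_robin(messages, num_partitions):
    """Distribute messages over partitions in round-robin order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    return partitions

def consume_all(partitions, rng):
    """Simulate one consumer reading every partition.

    At each step it picks any partition that still has unread messages
    and reads that partition's next message.
    """
    cursors = [0] * len(partitions)
    seen = []
    while True:
        ready = [p for p, log in enumerate(partitions) if cursors[p] < len(log)]
        if not ready:
            return seen
        p = rng.choice(ready)
        seen.append(partitions[p][cursors[p]])
        cursors[p] += 1

def keeps_partition_order(seen, partitions):
    """True if each partition's messages appear in `seen` in log order."""
    return all([m for m in seen if m in log] == log for log in partitions)

partitions = round_robin(["M1", "M2", "M3", "M4", "M5", "M6"], 3)
seen = consume_all(partitions, random.Random(0))
print(partitions)  # per-partition logs
print(seen)        # one possible interleaving
print(keeps_partition_order(seen, partitions))
```

Whatever interleaving comes out, M1 precedes M4, M2 precedes M5, and M3 precedes M6, but nothing fixes the position of M2 relative to M4.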

Or am I misunderstanding something?

Nope, you are understanding correctly. Ordering is only guaranteed within a single partition.

As Vishal John writes:

For example, assume that your messages are partitioned based on user_id and consider 4 messages having user_ids 1, 2, 3 and 4. Assume that you have a "users" topic with 4 partitions.

Since partitioning is based on user_id, assume that the message having user_id 1 will go to partition 1, the message having user_id 2 will go to partition 2, and so on.

Also assume that you have 4 consumers for the topic. Since you have 4 consumers, Kafka will assign each consumer to one partition. So in this case as soon as 4 messages are pushed, they are immediately consumed by the consumers.
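
The key-based partitioning described in this example can be sketched as follows. This is an illustration only: `partition_for`, `produce` and `events_for` are hypothetical helpers, the event payloads are invented, and CRC32 is a stand-in hash, so the partition numbers it produces will not match the "user 1 goes to partition 1" simplification above or the numbers a real Kafka producer would choose. The point it shows is that every message with the same key lands in the same partition, so events for one user keep their relative order.

```python
import zlib

NUM_PARTITIONS = 4  # matches the "users" topic in the example

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition.

    CRC32 is a stand-in hash for illustration; it is not the hash
    Kafka's own partitioner uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(events, num_partitions: int = NUM_PARTITIONS):
    """Append (user_id, payload) events to per-partition logs, keyed by user_id."""
    logs = [[] for _ in range(num_partitions)]
    for user_id, payload in events:
        logs[partition_for(str(user_id), num_partitions)].append((user_id, payload))
    return logs

def events_for(user_id, logs):
    """Read back one user's events from the single partition that holds them."""
    p = partition_for(str(user_id), len(logs))
    return [payload for uid, payload in logs[p] if uid == user_id]

# Hypothetical event stream, interleaved across users.
events = [(1, "login"), (2, "login"), (1, "view"),
          (3, "login"), (1, "logout"), (2, "logout")]
logs = produce(events)
print(events_for(1, logs))  # user 1's events, in the order they were produced
```

If per-user ordering is all the application needs, keying by user_id is usually enough and no cross-partition ordering is required.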

You can implement consumer logic that buffers and re-orders, but how that logic works depends on your specific use-case.
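
One possible shape for such buffering logic is sketched below. It rests on assumptions that are not in the original answer: the producer attaches a gap-free, monotonically increasing sequence number to every message, and all messages pass through one place (the hypothetical `ReorderBuffer`) before being processed. It does not handle duplicates, missing sequence numbers, or timeouts, which a real implementation would need to decide on.

```python
import heapq

class ReorderBuffer:
    """Releases messages in sequence-number order, holding back early arrivals."""

    def __init__(self, first_seq: int = 1):
        self._next_seq = first_seq  # sequence number we are waiting for
        self._pending = []          # min-heap of (seq, payload)

    def push(self, seq: int, payload):
        """Accept one message; return the messages that are now deliverable."""
        heapq.heappush(self._pending, (seq, payload))
        ready = []
        # Drain the heap for as long as its smallest entry is the next expected one.
        while self._pending and self._pending[0][0] == self._next_seq:
            ready.append(heapq.heappop(self._pending)[1])
            self._next_seq += 1
        return ready

# Messages arrive interleaved across partitions, e.g. M4 before M2.
arrivals = [(1, "M1"), (4, "M4"), (2, "M2"), (3, "M3"), (6, "M6"), (5, "M5")]
buf = ReorderBuffer()
delivered = []
for seq, payload in arrivals:
    delivered.extend(buf.push(seq, payload))
print(delivered)  # ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
```

The trade-off is latency and memory: M4 sits in the buffer until M2 and M3 arrive, and a single lost message would stall delivery unless a timeout or skip policy is added.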

See also: https://stackoverflow.com/a/39593834/741970
