How does Kafka guarantee message ordering as processed by consumers across partitions?

Question

Source: https://kafka.apache.org/intro

"By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. "

This only means that each consumer processes its own messages in order; across consumers in the same consumer group, processing may still be out of order. For example, take 3 partitions. A producer using round robin sends M1 to P1, M2 to P2, M3 to P3, then M4 to P1, M5 to P2, and M6 to P3.

Now we have:

P1: M1 and M4
P2: M2 and M5
P3: M3 and M6

If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, and so on. How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?

Or am I misunderstanding something?

Recommended answer

"How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?"

In general, you cannot.

"If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, and so on."

Even if you had a single consumer that consumed all the partitions of the topic, the partitions would be consumed in a non-deterministic order, so a total ordering across all partitions would still not be guaranteed.

"Or am I misunderstanding something?"

No, you are understanding it correctly. Ordering is only guaranteed within a single partition.
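The point can be illustrated with a small simulation. This is only a sketch in plain Python, not Kafka client code: `round_robin_partition` and `simulate_consumption` are hypothetical helpers, and the random interleaving is a stand-in for whatever timing independent consumers happen to have. It shows that each partition keeps its own order while the overall processing order is not fixed.

```python
import random


def round_robin_partition(messages, num_partitions):
    """Distribute messages over partitions in round-robin order, as in the question."""
    partitions = [[] for _ in range(num_partitions)]
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    return partitions


def simulate_consumption(partitions, rng):
    """Pick a random non-empty partition at each step, reading each one front to back."""
    cursors = [0] * len(partitions)
    processed = []
    while True:
        ready = [p for p, log in enumerate(partitions) if cursors[p] < len(log)]
        if not ready:
            break
        p = rng.choice(ready)
        processed.append(partitions[p][cursors[p]])
        cursors[p] += 1
    return processed


messages = ["M1", "M2", "M3", "M4", "M5", "M6"]
partitions = round_robin_partition(messages, 3)
order = simulate_consumption(partitions, random.Random(7))

print(partitions)  # P1, P2, P3 as lists
print(order)       # one possible global processing order
```

Whatever seed is used, M1 always precedes M4, M2 precedes M5, and M3 precedes M6, but nothing in the simulation forces M2 to come before M4.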

As Vishal John writes:

For example, assume that your messages are partitioned based on user_id, and consider 4 messages with user_ids 1, 2, 3, and 4. Assume that you have a "users" topic with 4 partitions.

Since partitioning is based on user_id, assume that the message with user_id 1 will go to partition 1, the message with user_id 2 will go to partition 2, and so on.

Also assume that you have 4 consumers for the topic. Since you have 4 consumers, Kafka will assign each consumer one partition. So in this case, as soon as the 4 messages are pushed, they are immediately consumed by the consumers.
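The key-based partitioning described in the quote can be sketched as follows. This is an illustration under assumptions, not Kafka's partitioner: `partition_for` is a hypothetical helper that uses CRC32 as a stand-in hash (Kafka's own partitioner uses a different hash function), and the events are made up. The idea it shows is that all messages with the same user_id land in the same partition, so their relative order is preserved there.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; CRC32 is a stand-in, not the hash Kafka uses."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical events, listed in the order they are produced: (user_id, action).
events = [
    ("1", "login"), ("2", "login"), ("1", "update"),
    ("3", "login"), ("1", "logout"), ("2", "logout"),
]

NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for user_id, action in events:
    partitions[partition_for(user_id, NUM_PARTITIONS)].append((user_id, action))


def events_for_user(user_id):
    """Return the actions of one user in the order they sit in that user's partition."""
    log = partitions[partition_for(user_id, NUM_PARTITIONS)]
    return [action for uid, action in log if uid == user_id]


print(events_for_user("1"))
```

Ordering per user holds because a single partition is read in order; ordering between different users is still not guaranteed.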

You can implement consumer logic that buffers and re-orders messages, but how that logic works depends on your specific use case.
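One possible shape for such buffering logic is sketched below. It rests on assumptions that are not part of the answer: the producer stamps every message with a contiguous global sequence number, and all consumers hand their messages to one shared merge point. `ReorderBuffer` and `offer` are hypothetical names. A real implementation would also need a policy for gaps that never fill and a bound on buffer size.

```python
import heapq


class ReorderBuffer:
    """Hold out-of-order messages and release them in sequence-number order."""

    def __init__(self, first_seq=1):
        self._next_seq = first_seq  # sequence number we are waiting for
        self._heap = []             # min-heap of (seq, payload)

    def offer(self, seq, payload):
        """Add one message; return the payloads that can now be released in order."""
        if seq < self._next_seq:
            return []  # stale or duplicate message, already released
        heapq.heappush(self._heap, (seq, payload))
        released = []
        while self._heap and self._heap[0][0] <= self._next_seq:
            head_seq, head_payload = heapq.heappop(self._heap)
            if head_seq == self._next_seq:
                released.append(head_payload)
                self._next_seq += 1
            # otherwise it is a duplicate of an already released message: drop it
        return released


# Messages arriving from different partitions in an arbitrary interleaving.
arrivals = [(2, "M2"), (4, "M4"), (1, "M1"), (3, "M3"), (6, "M6"), (5, "M5")]

buffer = ReorderBuffer()
processed = []
for seq, payload in arrivals:
    processed.extend(buffer.offer(seq, payload))

print(processed)  # ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
```

The trade-off is latency: M2 and M4 are held back until M1 and M3 arrive, so a slow partition delays everything behind it.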

See also: https://stackoverflow.com/a/39593834/741970.
