Apache Kafka message consumption when partitions outnumber consumers

Question

Suppose I'm running a Kafka cluster with more partitions than my lone consumer group has consumers. Are there any guarantees on the ordering of messages, or on timely delivery of messages, across partitions?

Simple example:
2 Partitions, 1 Consumer
The Producers are controlling Partition assignment via a key.
Message 1 comes in and goes to Partition A
Message 2 comes in and goes to Partition B
Message 3 comes in and goes to Partition A

I know Message 1 will be consumed before Message 3, because they are in the same partition. But what about Message 2? Will it be consumed before Message 3 or after? Or could it vary? Could it possibly be consumed before Message 1?

Moreover, what if new Messages continue to come in for Partition A and the production is faster than consumption? Will Message 2 sit in Partition B indefinitely? When will it be consumed? Are there any guarantees that the messages will not sit there forever?

More generally: If a consumer is assigned to multiple partitions, how and when does that consumer swap between those partitions?

Recommended answer

Ordering guarantees

Kafka provides ordering guarantees only within a partition. In your example, Message 2 might be consumed before Message 1, between Message 1 and Message 3, or after Message 3. That depends only on the performance of the consumer. More information is available in the documentation: https://kafka.apache.org/documentation.html#introduction (the 'Consumers' and 'Guarantees' topics).
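As a rough illustration of that guarantee, here is a toy model in plain Python (it is not Kafka client code). The `consume` function and the random choice of the next partition are assumptions made for the sketch; the random choice stands in for fetch timing that the application does not control. Every interleaving it produces keeps Message 1 ahead of Message 3, while Message 2 can land anywhere.

```python
import random
from collections import deque


def consume(partitions, seed):
    """Toy model of one consumer draining several partitions.

    `partitions` maps a partition name to its messages in log order.
    The next partition to read from is picked at random (hypothetical
    stand-in for real fetch timing); within a partition, messages are
    always taken from the front, so per-partition order is preserved.
    """
    rng = random.Random(seed)
    queues = {name: deque(msgs) for name, msgs in partitions.items()}
    consumed = []
    while any(queues.values()):
        name = rng.choice([n for n, q in queues.items() if q])
        consumed.append(queues[name].popleft())
    return consumed


log = {"A": ["Message 1", "Message 3"], "B": ["Message 2"]}

# Collect the distinct consumption orders seen over several runs.
orders = {tuple(consume(log, seed)) for seed in range(50)}
for order in sorted(orders):
    print(" -> ".join(order))
```

Whatever orders are printed, none of them places Message 3 before Message 1, because both live in partition A.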

The Kafka broker retains messages without regard to consumers. It stores messages in log segments until the corresponding log segment is deleted. Consumers may attach to the broker at any moment and start consuming from the oldest retained log segment. How long messages are kept is controlled by two broker configuration properties: log.retention.hours (time-based) and log.retention.bytes (size-based), with possible per-topic overrides. More on this in the documentation: https://kafka.apache.org/documentation.html#brokerconfigs.

To answer your question: if the consumer falls behind the producer, it has some time to catch up (one week by default). If it does not catch up, the unconsumed messages that age out of retention are deleted permanently.
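To put a number on "some time to catch up", here is a back-of-the-envelope sketch. It assumes constant produce and consume rates and purely time-based retention; the function name `earliest_loss_hour` and the rates are hypothetical. Because Kafka deletes whole log segments, real deletion can only happen later than this estimate, so treat the result as the earliest point at which loss could begin.

```python
def earliest_loss_hour(produce_rate, consume_rate, retention_hours=168.0):
    """Estimate when a lagging consumer could first lose messages.

    Rates are in messages per hour and assumed constant (a simplifying
    assumption). After t hours the consumer is reading a message that
    was produced at hour t * consume_rate / produce_rate, so that
    message's age is t * (1 - consume_rate / produce_rate). Loss can
    start once this age exceeds the retention window.

    Returns None when the consumer keeps up with the producer.
    """
    if consume_rate >= produce_rate:
        return None
    return retention_hours / (1.0 - consume_rate / produce_rate)


# A consumer running at half the producer's speed, with the default
# one-week (168 hour) retention window:
print(earliest_loss_hour(produce_rate=1000, consume_rate=500))  # 336.0
```

In this example the consumer has about two weeks before the oldest unread message could fall outside the default retention window.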

The high-level consumer creates several KafkaStream objects, each providing data from one or more partitions. How you consume these streams is up to you: in separate threads, round robin, and so on. It is also possible to read the messages' timestamps and merge the streams into a single stream, restoring message order.
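The timestamp-merge idea can be sketched in plain Python as follows. This is not the KafkaStream API: the `Record` type, the timestamp values, and the in-memory lists are assumptions for illustration. It relies on each per-partition stream already being ordered by timestamp, which is what lets `heapq.merge` produce a globally ordered result.

```python
import heapq
from typing import NamedTuple


class Record(NamedTuple):
    timestamp: int  # hypothetical producer-assigned timestamp, in ms
    partition: str
    value: str


# Each stream is in partition order, hence already sorted by timestamp
# under the assumption that producers stamp messages monotonically.
stream_a = [Record(100, "A", "Message 1"), Record(300, "A", "Message 3")]
stream_b = [Record(200, "B", "Message 2")]

# heapq.merge lazily interleaves sorted inputs into one sorted sequence.
merged = list(heapq.merge(stream_a, stream_b, key=lambda r: r.timestamp))

for record in merged:
    print(record.timestamp, record.partition, record.value)
```

With live, unbounded streams the merge would have to wait for the next record from every partition before emitting anything, so this approach trades latency for ordering.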
