Apache Kafka message consumption when partitions outnumber consumers

Question

If I'm running a Kafka cluster with more partitions than my lone consumer group has consumers, are there any guarantees on the ordering of messages, or on the timely delivery of messages across partitions?

Simple example:
2 Partitions, 1 Consumer
The Producers are controlling Partition assignment via a key.
Message 1 comes in and goes to Partition A
Message 2 comes in and goes to Partition B
Message 3 comes in and goes to Partition A
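The key-based routing in this example can be sketched as follows. This is a simplified model, not Kafka client code: `zlib.crc32` is only a stand-in hash (Kafka's default partitioner hashes the serialized key with murmur2), and the keys `key-a` and `key-b` are hypothetical. What it illustrates is that equal keys always land in the same partition and that each partition keeps its messages in production order; whether two different keys end up in different partitions depends on the hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index.

    zlib.crc32 is a stand-in hash; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(messages: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Append each (key, value) message to the log of the partition its key maps to."""
    logs: Dict[int, List[str]] = defaultdict(list)
    for key, value in messages:
        # Appending preserves production order within each partition.
        logs[partition_for(key, num_partitions)].append(value)
    return dict(logs)

# Hypothetical keys: Messages 1 and 3 share a key, so they always share a partition.
messages = [("key-a", "Message 1"), ("key-b", "Message 2"), ("key-a", "Message 3")]
print(route(messages, num_partitions=2))
```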

I know Message 1 will be consumed before Message 3, because they are in the same partition. But what about Message 2? Will it be consumed before Message 3 or after? Or could it vary? Could it possibly be consumed before Message 1?

Moreover, what if new Messages continue to come in for Partition A and the production is faster than consumption? Will Message 2 sit in Partition B indefinitely? When will it be consumed? Are there any guarantees that the messages will not sit there forever?

More generally: If a consumer is assigned to multiple partitions, how and when does that consumer swap between those partitions?

Recommended answer

Ordering guarantees

Kafka provides ordering guarantees only within a partition. In your example, Message 2 might be consumed before Message 1, between Message 1 and Message 3, or after Message 3. That depends only on how the consumer fetches and processes the partitions. More information on this is available in the documentation: https://kafka.apache.org/documentation.html#introduction ('Consumers' and 'Guarantees' topics).
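To make the three possibilities concrete, the following Python sketch (a model of the guarantee, not Kafka client code) enumerates every order a single consumer could observe for the example above: it keeps the order within each partition and allows any interleaving across partitions. Message 1 always precedes Message 3, but Message 2 can appear anywhere.

```python
from typing import List, Sequence

def interleavings(a: Sequence[str], b: Sequence[str]) -> List[List[str]]:
    """Return every merge of a and b that preserves the order within each sequence.

    Each result is one consumption order a single consumer could observe when
    ordering is only guaranteed within a partition.
    """
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # The next consumed message comes either from partition A or from partition B.
    take_a = [[a[0]] + rest for rest in interleavings(a[1:], b)]
    take_b = [[b[0]] + rest for rest in interleavings(a, b[1:])]
    return take_a + take_b

partition_a = ["Message 1", "Message 3"]
partition_b = ["Message 2"]
for order in interleavings(partition_a, partition_b):
    print(" -> ".join(order))
# Message 1 -> Message 3 -> Message 2
# Message 1 -> Message 2 -> Message 3
# Message 2 -> Message 1 -> Message 3
```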

The Kafka broker does not retain or delete messages based on what consumers have read. It stores messages in log segments until the corresponding log segment is deleted. Consumers may connect to the broker at any moment and start consuming from the oldest retained log segment. Retention is controlled by two configuration properties: log.retention.hours (time-based) and log.retention.bytes (size-based), both of which can be overridden per topic. More on this in the documentation: https://kafka.apache.org/documentation.html#brokerconfigs.
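A simplified model of how time-based and size-based retention remove whole segments, oldest first, might look like the sketch below. The `Segment` type, its field names, and the exact deletion rule are assumptions for illustration, not the broker's implementation; the only Kafka facts it leans on are that deletion happens per segment, that the active (newest) segment is not deleted, and that log.retention.hours defaults to 168 while log.retention.bytes defaults to -1 (no size limit). Note that nothing in it looks at consumers.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    base_offset: int         # offset of the first message in the segment (hypothetical field)
    size_bytes: int          # bytes stored in the segment
    newest_ts_hours: float   # time, in hours, of the newest message in the segment

def apply_retention(segments: List[Segment], now_hours: float,
                    retention_hours: float, retention_bytes: Optional[int]) -> List[Segment]:
    """Drop the oldest segments that violate time- or size-based retention.

    Simplified model: segments are ordered oldest first and the last (active)
    segment is never deleted. retention_bytes=None means no size limit
    (Kafka expresses this as -1). Consumer positions play no role.
    """
    kept = list(segments)
    total = sum(s.size_bytes for s in kept)
    while len(kept) > 1:
        oldest = kept[0]
        too_old = now_hours - oldest.newest_ts_hours > retention_hours
        too_big = retention_bytes is not None and total > retention_bytes
        if not (too_old or too_big):
            break
        kept.pop(0)
        total -= oldest.size_bytes
    return kept

segments = [Segment(0, 100, 10.0), Segment(100, 100, 200.0), Segment(200, 50, 300.0)]
# At hour 300 with one-week (168 h) retention, only the first segment is old enough to delete.
print([s.base_offset for s in apply_retention(segments, 300.0, 168, None)])  # [100, 200]
```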

To answer your question: if the consumer falls behind the producer, it has until the retention limit to catch up (one week by default). If it does not catch up in time, some unconsumed messages will be deleted permanently.
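How long "some time to catch up" lasts can be estimated with a back-of-the-envelope calculation. Assume constant rates, a produce rate P higher than the consume rate C, and zero lag at t = 0. At time t the consumer is reading the message produced at time tC/P, so that message is t(1 - C/P) old, and loss begins once that age exceeds the retention period R, i.e. at t = RP/(P - C). The function below is a hypothetical helper for this estimate, not part of any Kafka tool; because the broker deletes whole segments (and only periodically), actual loss may start somewhat later.

```python
def hours_until_data_loss(produce_rate: float, consume_rate: float,
                          retention_hours: float = 168.0) -> float:
    """Estimate when a lagging consumer starts losing unread messages.

    Assumes constant rates in the same unit (e.g. messages/second), zero lag
    at t = 0, and deletion exactly when a message exceeds the retention age.
    Returns float("inf") if the consumer keeps up.
    """
    if consume_rate >= produce_rate:
        return float("inf")
    # Age of the message being read at time t is t * (1 - C / P);
    # solving t * (1 - C / P) = R gives t = R * P / (P - C).
    return retention_hours * produce_rate / (produce_rate - consume_rate)

print(hours_until_data_loss(100, 80))  # 840.0 hours, i.e. 5 weeks with the default 168 h retention
```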

The high-level consumer creates several KafkaStream objects, each providing data from one or more partitions. How you consume these streams is up to you: in separate threads, round robin, and so on. It's also possible to fetch the timestamps of messages and merge the streams into a single stream, restoring message order by timestamp.
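The two strategies mentioned above, round robin and a timestamp merge, can be sketched with plain Python iterables standing in for the per-partition streams. This does not use the KafkaStream API; the `(timestamp, payload)` record shape and the sample data are hypothetical. The merge assumes each stream is already sorted by timestamp, and `heapq.merge` pulls one record ahead from every stream, so on live streams a partition with no new data would hold back the merged output.

```python
import heapq
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, str]  # hypothetical (timestamp, payload) record

def merge_by_timestamp(streams: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge per-partition streams, each sorted by timestamp, into one ordered stream."""
    return heapq.merge(*streams, key=lambda record: record[0])

def round_robin(streams: List[Iterable[Record]]) -> Iterator[Record]:
    """Take one record from each stream in turn, skipping exhausted streams."""
    sentinel = object()
    for batch in zip_longest(*streams, fillvalue=sentinel):
        for record in batch:
            if record is not sentinel:
                yield record

partition_a = [(1.0, "A1"), (2.0, "A2"), (3.0, "A3")]
partition_b = [(1.5, "B1"), (4.0, "B2")]
print([p for _, p in merge_by_timestamp([partition_a, partition_b])])  # ['A1', 'B1', 'A2', 'A3', 'B2']
print([p for _, p in round_robin([partition_a, partition_b])])         # ['A1', 'B1', 'A2', 'B2', 'A3']
```

Round robin is fair to every partition but gives no ordering across them; the timestamp merge orders by timestamp at the cost of waiting on every partition.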
