Kafka Streams does not increment offset by 1 when producing to topic


Problem Description


I have implemented a simple Kafka Dead letter record processor.

It works perfectly when using records produced from the Console producer.

However, I find that our Kafka Streams applications do not guarantee that, when producing records to the sink topics, the offset is incremented by 1 for each record produced.

Dead Letter Processor Background:

I have a scenario where records may be received before all the data required to process them has been published. When records cannot be matched for processing by the streams app, they are moved to a Dead Letter topic instead of continuing to flow downstream. When new data is published, we dump the latest messages from the Dead Letter topic back into the stream application's source topic for reprocessing with the new data.

The Dead Letter processor:

  • At the start of a run, the application records the ending offsets of each partition
  • The ending offsets mark the point at which to stop processing records for a given Dead Letter topic, to avoid an infinite loop if reprocessed records return to the Dead Letter topic.
  • The application resumes from the last offsets produced by the previous run via consumer groups.
  • The application uses transactions and KafkaProducer#sendOffsetsToTransaction to commit the last produced offsets (a sketch of this step follows below).
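
A minimal sketch of that transactional produce step, assuming a plain Java KafkaConsumer/KafkaProducer pair. The method name, topic variable, and byte[] serialization are hypothetical, and the producer is assumed to be configured with a transactional.id and to have already called initTransactions():

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

/** Forwards one polled batch of dead-letter records back to the source topic inside a transaction. */
static void forwardBatch(KafkaConsumer<byte[], byte[]> consumer,
                         KafkaProducer<byte[], byte[]> producer,
                         String sourceTopic) {
    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
    if (records.isEmpty()) {
        return;
    }

    producer.beginTransaction();
    Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();

    for (ConsumerRecord<byte[], byte[]> record : records) {
        // Re-publish the dead-letter record to the stream application's source topic.
        producer.send(new ProducerRecord<>(sourceTopic, record.key(), record.value()));
        // The offset to commit is the next record to read, i.e. lastProcessedMessageOffset + 1.
        offsetsToCommit.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
    }

    // Commit the consumed offsets atomically with the produced records.
    producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());
    producer.commitTransaction();
}
```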

To track when all records in my range have been processed for a topic's partition, my service compares the last offset produced by the producer to the consumer's saved map of ending offsets. When we reach the ending offset, the consumer pauses that partition via KafkaConsumer#pause, and when all partitions are paused (meaning they have reached the saved ending offset) it exits.
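
A rough sketch of that end-offset bookkeeping, again with hypothetical names; the end-offset map is assumed to have been captured once at startup, e.g. via consumer.endOffsets(consumer.assignment()), before the poll loop starts:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

/**
 * Pauses every assigned partition that has reached the end offset captured at startup,
 * and reports whether all assigned partitions are now paused (i.e. the run is complete).
 */
static boolean pauseFinishedPartitions(KafkaConsumer<byte[], byte[]> consumer,
                                       Map<TopicPartition, Long> endOffsetsAtStart) {
    Set<TopicPartition> assignment = consumer.assignment();
    for (TopicPartition tp : assignment) {
        long endOffset = endOffsetsAtStart.getOrDefault(tp, 0L);
        // position() returns the offset of the next record that will be fetched.
        if (consumer.position(tp) >= endOffset) {
            consumer.pause(Collections.singleton(tp));
        }
    }
    // All partitions paused means every partition has reached its saved ending offset.
    return consumer.paused().containsAll(assignment);
}
```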

The Kafka Consumer API states:

Offsets and Consumer Position Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition. For example, a consumer which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5.

The Kafka Producer API documentation also suggests that the next offset is always +1:

Sends a list of specified offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered committed only if the transaction is committed successfully. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.

But you can clearly see in my debugger that the offsets of the records consumed from a single partition are anything but incremented by 1 at a time...

I thought maybe this was a Kafka configuration issue, such as max.message.bytes, but none really made sense. Then I thought perhaps it was caused by joining, but I didn't see any way that would change how the producer functions.

Not sure if it is relevant, but all of our Kafka applications are using Avro and Schema Registry...

Should the offsets always increment by 1 regardless of the method of producing, or is it possible that the Kafka Streams API does not offer the same guarantees as the normal Producer/Consumer clients?

Is there something I am entirely missing?

Solution

It is not an official API contract that message offsets are increased by one, even if the JavaDocs indicate this (it seems that the JavaDocs should be updated).

  • If you don't use transactions, you get either at-least-once semantics or no guarantees (some call this at-most-once semantics). For at-least-once, records might be written twice and thus, offsets for two consecutive messages are not really increased by one as the duplicate write "consumes" two offsets.

  • If you use transactions, each commit (or abort) of a transaction writes a commit (or abort) marker into the topic -- those transactional markers also "consume" one offset (this is what you observe).

Thus, in general you should not rely on consecutive offsets. The only guarantee you get is that each offset is unique within a partition.
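
In practice this means a read_committed consumer on a transactionally written topic will see offset gaps where the commit/abort markers sit. A purely illustrative way to observe this (hypothetical helper method, any subscribed topic):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

/**
 * Logs every gap between consecutive record offsets per partition.
 * With a transactional producer, single-offset gaps at transaction boundaries are expected.
 */
static void logOffsetGaps(KafkaConsumer<byte[], byte[]> consumer) {
    Map<TopicPartition, Long> lastSeen = new HashMap<>();
    for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
        Long previous = lastSeen.put(tp, record.offset());
        if (previous != null && record.offset() != previous + 1) {
            System.out.printf("Gap in %s: %d -> %d (likely a transaction marker)%n",
                              tp, previous, record.offset());
        }
    }
}
```

For the dead-letter processor this suggests the termination check should compare the consumer's position against the broker-reported end offsets (as in the earlier sketch) rather than assuming that the last produced offset + 1 equals the end offset.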
