Kafka Stream reprocessing old messages on rebalancing

Question

I have a Kafka Streams application which reads data from a few topics, joins the data and writes it to another topic.
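
For concreteness, a minimal sketch of such a topology might look like the following. The topic names, String serdes, and the 5-minute join window are illustrative assumptions, not details from the question:

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class JoinApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");          // also serves as the consumer group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> left = builder.stream("topic-a");   // hypothetical input topic
        KStream<String, String> right = builder.stream("topic-b");  // hypothetical input topic

        // Windowed join of the two streams by key, written to an output topic.
        left.join(right,
                  (l, r) -> l + "," + r,
                  JoinWindows.of(Duration.ofMinutes(5)))
            .to("joined-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}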

This is the configuration of my Kafka cluster:

5 Kafka brokers
Kafka topics with 15 partitions and a replication factor of 3

My Kafka Streams application instances are running on the same machines as my Kafka brokers.

A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into a rebalancing state, and after rebalancing many times it starts consuming very old messages.

Note: When the Kafka Streams application is running fine, its consumer lag is almost 0. But after rebalancing, its lag goes from 0 to 10 million.

This could be because of offset.retention.minutes.

This is the log and offset retention policy configuration of my Kafka broker:

log retention policy: 3 days
offset.retention.minutes: 1 day
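
For reference, those retention settings would look roughly like this in the broker's server.properties (a sketch; note that the actual broker property is offsets.retention.minutes, with a plural "offsets", and log retention can equivalently be set via log.retention.ms or log.retention.hours):

# Broker retention settings (sketch)
log.retention.hours=72           # 3 days of log retention
offsets.retention.minutes=1440   # committed offsets kept for 1 day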

In the below link I read that this could be the cause:

Offset retention reference

Any help with this would be appreciated.

Answer

Offset retention can have an impact. See this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning

Also see How to commit manually with Kafka Stream? for details on how commits work.
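
To illustrate the knobs involved (not part of the original answer): Kafka Streams commits offsets for you on a configurable interval, and the reset policy decides what happens when committed offsets have expired. The property names below are real Streams/consumer configs; the specific values are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsResetConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Offsets are committed on this interval (default 30000 ms with
        // at-least-once processing); a smaller value narrows how much work
        // is replayed after a failure or rebalance.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10_000);

        // Kafka Streams defaults the embedded consumer to "earliest". If the
        // committed offsets have expired (offsets.retention.minutes), "earliest"
        // replays the oldest retained records; "latest" skips to the end of the
        // topic instead. That is a trade-off, since skipped records are never
        // processed by the application.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        return props;
    }
}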
