Kafka Stream reprocessing old messages on rebalancing

Problem Description

I have a Kafka Streams application which reads data from a few topics, joins the data, and writes it to another topic.
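The question does not include the topology, but a stream-stream join of that shape would look roughly like this (a sketch only: the topic names, value types, and join window are hypothetical, and default String serdes are assumed to be configured):

```java
import java.time.Duration;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class JoinTopologySketch {
    public static void build(StreamsBuilder builder) {
        // Hypothetical input topics; the question does not name the real ones.
        KStream<String, String> left = builder.stream("topic-a");
        KStream<String, String> right = builder.stream("topic-b");
        left.join(right,
                  (l, r) -> l + "|" + r,                 // toy value joiner
                  JoinWindows.of(Duration.ofMinutes(5))) // hypothetical join window
            .to("output-topic");                         // hypothetical output topic
    }
}
```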

This is the configuration of my Kafka cluster:

5 Kafka brokers
Kafka topics - 15 partitions and replication factor 3. 
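Created through the Admin API, that topic layout would look something like this (a sketch; the broker address and topic name are placeholders):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetupSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // 15 partitions, replication factor 3, as in the cluster described above.
            NewTopic topic = new NewTopic("topic-a", 15, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```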

My Kafka Streams applications are running on the same machines as my Kafka brokers.

A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into a rebalancing state, and after rebalancing many times it starts consuming very old messages.

Note: when the Kafka Streams application is running fine, its consumer lag is almost 0. But after rebalancing, its lag jumps from 0 to 10 million.

This could be because of offset.retention.minutes.

This is the log and offset retention policy configuration of my Kafka brokers:

log retention policy : 3 days
offset.retention.minutes : 1 day
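In a broker's server.properties, those two settings would look roughly like this (a sketch; note that the broker property is actually spelled offsets.retention.minutes, with a plural "offsets"):

```properties
# Log retention policy: 3 days.
log.retention.hours=72
# Committed consumer offsets expire after 1 day (note the plural "offsets").
offsets.retention.minutes=1440
```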

In the link below I read that this could be the cause:

Offset retention time reference

Any help in this would be appreciated.

Answer

Offset retention can have an impact. Cf. this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning
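The mechanism described in that FAQ: once committed offsets are older than offsets.retention.minutes the broker deletes them, and at the next rebalance the consumer falls back to auto.offset.reset, which Kafka Streams defaults to "earliest", hence the jump back to very old messages. A minimal sketch of overriding that default (the application id and bootstrap server are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsOffsetResetSketch {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-join-app");     // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        // Kafka Streams defaults auto.offset.reset to "earliest". If committed
        // offsets have expired by the time a rebalance happens, "earliest" means
        // re-reading topics from the beginning; "latest" skips the backlog
        // instead (at the cost of not processing those old records).
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        return props;
    }
}
```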

Also cf. How to commit manually with Kafka Stream? for details on how commits work.
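In short, Kafka Streams commits offsets for you on a timer rather than per record; the relevant knob is commit.interval.ms (30 s by default, 100 ms under exactly-once processing). A shorter interval keeps committed offsets fresher, so a rebalance replays less already-processed data. A sketch, reusing the hypothetical config above:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class CommitIntervalSketch {
    public static void tune(Properties props) {
        // Commit offsets every 10 seconds instead of the 30-second default.
        // More frequent commits mean less replay after a rebalance, at the
        // cost of extra commit traffic. (10 s is an arbitrary example value.)
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10_000L);
    }
}
```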
