How to solve a problem with checkpointed invalid __consumer_offsets and producer epoch on partitions of __transaction_state


Problem description

I have two kinds of log entries in server.log

The first one:

WARN Resetting first dirty offset of __consumer_offsets-6 to log start offset 918 since the checkpointed offset 903 is invalid. (kafka.log.LogCleanerManager$)

The second one:

INFO [TransactionCoordinator id=3] Initialized transactionalId Source: AppService Kafka consumer -> Not empty string filter -> CDMEvent mapper -> (NonNull CDMEvent filter -> Map -> Sink: Kafka CDMEvent producer, Nullable CDMEvent filter -> Map -> Sink: Kafka Error producer)-bddeaa8b805c6e008c42fc621339b1b9-2 with producerId 78004 and producer epoch 23122 on partition __transaction_state-45 (kafka.coordinator.transaction.TransactionCoordinator)

I have found a suggestion that removing the checkpoint file might help:

https://medium.com/@anishekagarwal/kafka-log-cleaner-issues-80a05e253b8a

What we gathered was:

"stop the broker

remove the log cleaner checkpoint file

(cleaner-offset-checkpoint)

start the broker

that solved the problem for us."
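For reference, a minimal sketch of that procedure on one broker, assuming the broker runs from the standard distribution scripts under /opt/kafka and its log.dirs points at /var/lib/kafka/data (both paths are assumptions; adjust them to your deployment):

```bash
# Stop the broker (use your service manager, e.g. systemctl, if Kafka
# runs as a service instead of via the bundled scripts).
/opt/kafka/bin/kafka-server-stop.sh

# Move the log cleaner checkpoint file aside rather than deleting it,
# so it can be restored if anything goes wrong.
mkdir -p /var/backups/kafka
mv /var/lib/kafka/data/cleaner-offset-checkpoint /var/backups/kafka/

# Start the broker again; the cleaner rebuilds the checkpoint as it runs.
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
```

If log.dirs lists several directories, each of them has its own checkpoint file, and the procedure should be repeated per directory and rolled through the brokers one at a time.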

Is it safe to try that with all of the checkpoint files (cleaner-offset-checkpoint, log-start-offset-checkpoint, recovery-point-offset-checkpoint, replication-offset-checkpoint), or is it not advisable with any of them?
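All four files live as plain text at the root of each directory listed in the broker's log.dirs, so they can be inspected before anything is touched. A small sketch, assuming a single log directory at /var/lib/kafka/data (an assumption):

```bash
# Each checkpoint file is plain text: a version line, an entry count,
# then one "topic partition offset" line per partition.
for f in cleaner-offset-checkpoint log-start-offset-checkpoint \
         recovery-point-offset-checkpoint replication-offset-checkpoint; do
  echo "== $f =="
  cat "/var/lib/kafka/data/$f"
done
```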

Answer

I stopped each broker, moved cleaner-offset-checkpoint to a backup location, and started the broker without that file. The brokers started cleanly, deleted a lot of excess segments, and no longer log:

WARN Resetting first dirty offset of __consumer_offsets to log start offset since the checkpointed offset is invalid
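To confirm the warning really stops appearing after the restart, something like the following can be used (the log path is an assumption; it depends on your log4j configuration):

```bash
# Count occurrences of the LogCleanerManager warning in the current log;
# 0 means the cleaner accepted the rebuilt checkpoint.
grep -c "Resetting first dirty offset" /opt/kafka/logs/server.log
```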

Obviously this issue/defect https://issues.apache.org/jira/browse/KAFKA-6266 is not solved yet, even in 2.0.2. However, this did not compact the consumer offsets as expected: offsets.retention.minutes defaults to 10080 (7 days), and I tried setting it explicitly to 5040, but that didn't help. There are still messages more than a month old, and since log.cleaner.enable is true by default, they should have been compacted, but they are not. The only remaining option is to set cleanup.policy to delete again for the __consumer_offsets topic, but that is the action that triggered the problem in the first place, so I am reluctant to do it.

The problem I described in No Kafka Consumer Group listed by kafka-consumer-groups.sh is not resolved by this either. Something is evidently preventing kafka-consumer-groups.sh from reading the __consumer_offsets topic and displaying results (when issued with the --bootstrap-server option; otherwise it reads from ZooKeeper), which Kafka Tool does without any problem, and I believe the two problems are connected. The reason I think the topic is not compacted is that it contains messages with exactly the same key (and even the same timestamp) that are older than the broker settings should allow. Kafka Tool also ignores certain records and does not interpret them as consumer groups in that display; the fact that kafka-consumer-groups.sh ignores all of them is probably due to some corruption of those records.
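For completeness, a sketch of the two checks discussed above, assuming a broker reachable at localhost:9092 (a placeholder) and a recent Kafka whose tools accept --bootstrap-server (older releases used --zookeeper for kafka-configs.sh):

```bash
# Show topic-level overrides on the offsets topic; cleanup.policy
# should be "compact" here, not "delete".
/opt/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name __consumer_offsets --describe

# List consumer groups via the brokers -- the path that returned
# nothing in my case, while Kafka Tool showed the groups.
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
```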
