How to solve a problem with checkpointed invalid __consumer_offsets and producer epoch on partitions of __transaction_state


Problem description


I have two kinds of log entries in server.log

First kind:

WARN Resetting first dirty offset of __consumer_offsets-6 to log start offset 918 since the checkpointed offset 903 is invalid. (kafka.log.LogCleanerManager$)

Second kind:

INFO [TransactionCoordinator id=3] Initialized transactionalId Source: AppService Kafka consumer -> Not empty string filter -> CDMEvent mapper -> (NonNull CDMEvent filter -> Map -> Sink: Kafka CDMEvent producer, Nullable CDMEvent filter -> Map -> Sink: Kafka Error producer)-bddeaa8b805c6e008c42fc621339b1b9-2 with producerId 78004 and producer epoch 23122 on partition __transaction_state-45 (kafka.coordinator.transaction.TransactionCoordinator)

I have found a suggestion mentioning that removing the checkpoint file might help:

https://medium.com/@anishekagarwal/kafka-log-cleaner-issues-80a05e253b8a

"What we gathered was to:

stop the broker

remove the log cleaner checkpoint file

( cleaner-offset-checkpoint )

start the broker

that solved the problem for us."
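The quoted steps could be sketched roughly as follows; this is a sketch only, assuming a single broker whose `log.dirs` points at `/var/kafka-logs` (the paths, the service name, and the idea of moving the file aside rather than deleting it are assumptions, not part of the quoted advice):

```shell
#!/bin/sh
# Sketch only: LOG_DIR and BACKUP_DIR are hypothetical paths, not values from the post.
LOG_DIR="${LOG_DIR:-/var/kafka-logs}"
BACKUP_DIR="${BACKUP_DIR:-/var/kafka-backup}"

# 1. stop the broker first (service name varies per installation)
# systemctl stop kafka

# 2. move the log cleaner checkpoint aside instead of deleting it,
#    so it can be restored if anything goes wrong
if [ -f "$LOG_DIR/cleaner-offset-checkpoint" ]; then
  mkdir -p "$BACKUP_DIR"
  mv "$LOG_DIR/cleaner-offset-checkpoint" \
     "$BACKUP_DIR/cleaner-offset-checkpoint.bak"
fi

# 3. start the broker again; the log cleaner rebuilds the checkpoint file
# systemctl start kafka
```

Keeping a backup copy of the checkpoint file means the change can be reverted by moving the file back before restarting the broker.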

Is it safe to try that with all checkpoint files (cleaner-offset-checkpoint, log-start-offset-checkpoint, recovery-point-offset-checkpoint, replication-offset-checkpoint), or is it not advisable for any of them?

Solution

I stopped each broker, moved cleaner-offset-checkpoint to a backup location, and started the broker without that file. The brokers started cleanly, deleted a lot of excess segments, and they don't log:

WARN Resetting first dirty offset of __consumer_offsets to log start offset since the checkpointed offset is invalid

any more. Obviously, this issue/defect https://issues.apache.org/jira/browse/KAFKA-6266 is not solved yet, even in 2.0.2.

However, that didn't compact the consumer offsets as expected: offsets.retention.minutes defaults to 10080 (7 days), and I tried setting it explicitly to 5040, but that didn't help; there are still messages more than one month old. Since log.cleaner.enable is true by default, they should be compacted, but they are not. The only remaining option is to set cleanup.policy to delete again for the __consumer_offsets topic, but that is the action that triggered the problem, so I am a bit reluctant to do it.

The problem that I described in "No Kafka Consumer Group listed by kafka-consumer-groups.sh" is also not resolved by this. Evidently something prevents kafka-consumer-groups.sh from reading the __consumer_offsets topic (when issued with the --bootstrap-server option; otherwise it reads from ZooKeeper) and displaying results, which is something Kafka Tool does without problems, and I believe these two problems are connected. The reason I think the topic is not compacted is that it contains messages with exactly the same key (and even the same timestamp) that are older than the broker settings should allow. Kafka Tool also ignores certain records and doesn't interpret them as consumer groups in that display; why kafka-consumer-groups.sh ignores all of them is probably due to some corruption of those records.
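For reference, the retention and compaction settings discussed above can be inspected, and the group listing retried, with the standard Kafka CLI tools. This is a sketch under assumptions: the broker address is a placeholder, and newer Kafka versions accept --bootstrap-server for kafka-configs.sh on topic entities, while older ones require --zookeeper instead; these commands need a live broker, so they are shown as a fragment only.

```shell
# Placeholder broker address; adjust to your cluster.
BROKER=localhost:9092

# Show any per-topic overrides on __consumer_offsets (e.g. cleanup.policy)
kafka-configs.sh --bootstrap-server "$BROKER" \
  --entity-type topics --entity-name __consumer_offsets --describe

# Set cleanup.policy explicitly back to compact (the usual policy for this topic)
kafka-configs.sh --bootstrap-server "$BROKER" \
  --entity-type topics --entity-name __consumer_offsets \
  --alter --add-config cleanup.policy=compact

# Retry listing consumer groups from the brokers rather than from ZooKeeper
kafka-consumer-groups.sh --bootstrap-server "$BROKER" --list
```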

