Kafka broker shutdown while cleaning up log files


Problem description

We have a Kafka cluster with 3 brokers (version 1.1.0) that had been running well for over 6 months.

Then, after 2018/12/12, we increased the partition count from 3 to 48 for every topic, and the brokers started shutting down every 5-10 days.

We then upgraded the brokers from 1.1.0 to 2.1.0, but they still keep shutting down every 5-10 days.

Each time, one broker shuts down after the following error log; several minutes later, the other 2 brokers shut down too, with the same error but for other partition log files.

[2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
[2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
[2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
[2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
        at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
        at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
        at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
        at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
        at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
        at scala.collection.immutable.List.foreach(List.scala:388)
        at kafka.log.Log.replaceSegments(Log.scala:1896)
        at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
        at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
        at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
        at scala.collection.immutable.List.foreach(List.scala:388)
        at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
        at kafka.log.Cleaner.clean(LogCleaner.scala:492)
        at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
        at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
        Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
                at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                at java.nio.file.Files.move(Files.java:1395)
                at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
                ... 17 more
[2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
[2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
[2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)

Answer

If you have not changed the log.retention.bytes, log.retention.hours, log.retention.minutes, or log.retention.ms configs, Kafka tries to delete log segments after 7 days. So, based on the exception, Kafka wanted to clean up the file /kafka/logs/__transaction_state-11/00000000000000807727.log, but no such file existed in the Kafka log directory, and the resulting exception caused the broker to shut down.
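
To verify which retention settings are actually in effect, you can use the stock kafka-configs.sh tool and the broker config file. A minimal sketch, assuming ZooKeeper on localhost:2181 and a broker config at /kafka/config/server.properties (both are assumptions; adjust to your deployment):

# Show any per-topic overrides on the affected internal topic
bin/kafka-configs.sh --zookeeper localhost:2181 \
    --entity-type topics --entity-name __transaction_state --describe

# Check the broker-level retention defaults
grep -E 'log.retention.(bytes|hours|minutes|ms)' /kafka/config/server.properties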

If you are able to shut down the cluster and ZooKeeper, do so, and then clean up /kafka/logs/__transaction_state-11 manually.
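
A minimal sketch of that procedure, assuming a stock Kafka installation managed by the bundled scripts (all paths are assumptions; adjust to your deployment):

# On each broker: stop Kafka first, then ZooKeeper
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh

# Remove the partition directory with the inconsistent segment files
rm -rf /kafka/logs/__transaction_state-11

# Bring ZooKeeper back first, then the brokers; the deleted replica
# should be re-synced from the other replicas of __transaction_state-11
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties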

Note: I don't know whether it is harmful or not, but you can follow the posts about safely removing a Kafka topic.
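
For reference, topic deletion normally goes through the stock kafka-topics.sh tool and requires delete.topic.enable=true on the brokers. A sketch, assuming ZooKeeper on localhost:2181 and a hypothetical topic name my-topic (do not delete internal topics such as __transaction_state this way):

# Delete a regular topic (placeholder name; requires delete.topic.enable=true)
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic my-topic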
