Kafka broker shutdown while cleaning up log files
Problem description
Kafka cluster with 3 brokers (version 1.1.0) had been running well for over 6 months.
Then we increased the partitions from 3 to 48 for every topic after 2018/12/12, and since then a broker has shut down every 5-10 days.
Then we upgraded the brokers from 1.1.0 to 2.1.0, but they still keep shutting down every 5-10 days.
Each time, one broker shuts down after the following error log; several minutes later, the other 2 brokers shut down too, with the same error but for other partition log files.
[2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
[2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
[2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
[2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
at scala.collection.immutable.List.foreach(List.scala:388)
at kafka.log.Log.replaceSegments(Log.scala:1896)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
at scala.collection.immutable.List.foreach(List.scala:388)
at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
at kafka.log.Cleaner.clean(LogCleaner.scala:492)
at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
... 17 more
[2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
[2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
[2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)
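As background for the missing file in the exception: Kafka names each log segment file after the base offset of its first record, zero-padded to 20 digits, which is where 00000000000000807727.log comes from. A minimal illustration:

```shell
# Kafka segment files are named after their base offset,
# zero-padded to 20 digits.
printf '%020d.log\n' 807727
# → 00000000000000807727.log
```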
Recommended answer
If you have not changed the log.retention.bytes, log.retention.hours, log.retention.minutes, or log.retention.ms configs, Kafka tries to delete logs after 7 days. So, based on the exception, Kafka wanted to clean up the file /kafka/logs/__transaction_state-11/00000000000000807727.log, but there was no such file in the Kafka log directory; it threw an exception, which caused the broker to shut down.
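For reference, these retention settings live in server.properties (or as per-topic overrides); the values below are the stock defaults, shown for illustration rather than taken from the poster's config:

```properties
# server.properties (illustrative defaults)
log.retention.hours=168      # time-based retention; 168 h = 7 days
#log.retention.minutes=      # overrides hours if set
#log.retention.ms=           # overrides minutes if set (highest precedence)
log.retention.bytes=-1       # size-based retention disabled by default
```

Note that __transaction_state is a compacted internal topic (cleanup.policy=compact), which is why the failure surfaces in the LogCleaner thread rather than in plain time-based deletion.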
If you are able to shut down the cluster and ZooKeeper, do so and clean up /kafka/logs/__transaction_state-11 manually.
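A sketch of those manual steps, assuming the stock scripts in Kafka's bin/ directory and the log dir from the error above; paths will vary with your installation, so treat this as an outline rather than a verified runbook:

```shell
# On each broker: stop Kafka first, then ZooKeeper.
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh

# Remove the partition directory the cleaner choked on;
# the replica will re-sync from the other brokers on restart.
rm -rf /kafka/logs/__transaction_state-11

# Optionally clear the cleaner's checkpoint file so it does not
# resume from a stale offset (it is regenerated on startup).
rm -f /kafka/logs/cleaner-offset-checkpoint

# Restart ZooKeeper, then the brokers.
```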
Note: I don't know whether it is harmful or not, but you can follow posts on how to safely remove a Kafka topic.