How to recover ZooKeeper from java.io.EOFException after a server crash?
Problem description
How can I recover from the following error, which started happening after a server crash? ZooKeeper won't start, and the following messages appear repeatedly in the log.
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:os.version=3.10.0-514.16.1.el7.x86_64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/opt/zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@829] - tickTime set to 2000
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@838] - minSessionTimeout set to -1
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@847] - maxSessionTimeout set to -1
2017-05-27 01:02:08,080 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-05-27 01:02:08,385 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,400 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:283)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:410)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Thanks, IPVP
Recommended answer
It looks like you have encountered a known Apache ZooKeeper bug. Two Apache JIRA issues cover it: ZOOKEEPER-1621 and ZOOKEEPER-2332. See the comments on those issues if you are interested in the root cause analysis and the proposed fixes.
Unfortunately, at the time of writing no Apache ZooKeeper release contains a fix for this bug. There are a few workarounds you could try:
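- Create your own local build of ZooKeeper using one of the patches attached to the linked JIRA issues. Note that these patches have not yet been accepted by the ZooKeeper community, so use them at your own risk.
- Delete the offending log file. The root cause is that the transaction log file from the previous run of ZooKeeper was written with an incomplete header. Since the header sits at the start of the file and is itself incomplete, we can assume the file contains no transaction data after it. Deleting the file should therefore be safe and cause no data loss.
- If it is easier, consider reformatting this ZooKeeper cluster. This may be an appropriate solution if all of the data in the ZooKeeper installation is ephemeral and does not need long-term persistence.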
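For the second workaround, the offending file first has to be found. The following is a minimal Python sketch, not an official ZooKeeper tool. It assumes that transaction log files are named `log.*` and that a complete header is 16 bytes (a 4-byte magic number, a 4-byte version, and an 8-byte dbid), which is my reading of the `FileHeader` class in the stack trace. The function names `find_truncated_logs` and `quarantine` are hypothetical. The files are moved to a backup directory rather than deleted, so the step can be undone.

```python
import os
import shutil

# Assumption: a complete transaction log header is 16 bytes
# (4-byte magic, 4-byte version, 8-byte dbid).
HEADER_SIZE = 16


def find_truncated_logs(log_dir):
    """Return sorted paths of log.* files too small to hold a full header."""
    truncated = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.startswith("log.") and os.path.isfile(path):
            if os.path.getsize(path) < HEADER_SIZE:
                truncated.append(path)
    return truncated


def quarantine(paths, backup_dir):
    """Move the given files into backup_dir instead of deleting them."""
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for path in paths:
        target = os.path.join(backup_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved
```

With ZooKeeper stopped, one would call `find_truncated_logs` on the `version-2` directory under the configured `dataLogDir` (or `dataDir` if no separate log directory is set), review the list, and only then pass it to `quarantine` before restarting the server.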