ZooKeeper keeps getting EndOfStreamException, causing a crash


Question

My ZooKeeper instance manages several queues for different jobs, holding the relevant job data in each node until the computer is ready to process it. If I stop the overall service so that no jobs can be started, ZooKeeper runs just fine after a restart. However, some of these jobs seem to cause ZooKeeper to crash with the following message in the ZooKeeper log:

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15677f740ad002a, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:46998 which had sessionid 0x15677f740ad002a

My ZooKeeper knowledge is very limited, as I am taking over from the guy that set it up originally.

I have tried deleting a lot of nodes with rmr [path] in the ZooKeeper shell, which seemed to have some effect (it removed 50k+ leftover nodes that were of no use), but ZooKeeper has kept crashing daily, and last night I couldn't get it to run for more than a couple of minutes before the same error/crash occurred.

How can I find out what is causing this?

I am pretty sure it is some general problem with the data that is received, or with the stored data/nodes. The disk is only 92% full. I also found this post: Zookeeper keeps getting the WARN: "caught end of stream exception", but the solution doesn't make much sense to me. Also, I am pretty sure that none of the messages kept in my znodes are larger than 1MB, but I am unsure how to confirm this.
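
One possible way to confirm the znode sizes mentioned above is to walk the tree and record how many bytes each node holds. The following is only a minimal sketch, not something taken from the original setup: the function name find_large_znodes is hypothetical, and the client object is assumed to expose kazoo-style get(path) and get_children(path) methods. The 1 MiB default threshold simply mirrors the 1MB figure in the question.

```python
from collections import deque


def find_large_znodes(zk, root="/", threshold=1024 * 1024):
    """Return (path, size_in_bytes) for every znode whose data is >= threshold.

    `zk` is an assumption: any client exposing kazoo-style methods,
    get(path) -> (data, stat) and get_children(path) -> list of child names.
    """
    large = []
    queue = deque([root])
    while queue:
        path = queue.popleft()
        data, _stat = zk.get(path)
        size = len(data or b"")  # data may be None for an empty znode
        if size >= threshold:
            large.append((path, size))
        for child in zk.get_children(path):
            # Avoid a double slash when the parent is the root znode.
            queue.append(path.rstrip("/") + "/" + child)
    # Largest first, so the likeliest offenders are at the top.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

With the kazoo library, the client could be created with KazooClient(hosts="127.0.0.1:2181") and started with start() before being passed in; the host and port here are taken from the log excerpt above. On a live server, nodes may disappear while the walk is running, so a real script would probably need to tolerate missing-node errors.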

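Is there any way to change the ZooKeeper logging so that I can print additional information, such as the contents/name of the znode it is working on before it crashes?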

Recommended answer

I was able to solve the problem by deleting all zookeeper snapshots and log files from the server running ZooKeeper. I don't know why this made a difference, but it has been running fine for the last 22 hours.
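
Before deleting anything, it may help to see how much space the snapshots and transaction logs actually take. The sketch below is an illustration under stated assumptions rather than the procedure used in this answer: summarize_data_dir is a hypothetical helper, the directory passed in is assumed to be the one configured as dataDir (or dataLogDir) in zoo.cfg, and the files are assumed to follow the usual snapshot.<zxid> and log.<zxid> naming.

```python
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return {"snapshot": (count, bytes), "log": (count, bytes)} for data_dir.

    Assumes the usual on-disk naming, snapshot.<zxid> and log.<zxid>.
    The search is recursive so a version-2 subdirectory is covered as well.
    """
    totals = {"snapshot": [0, 0], "log": [0, 0]}
    for path in Path(data_dir).rglob("*"):
        if not path.is_file() or "." not in path.name:
            continue
        kind = path.name.split(".", 1)[0]  # text before the first dot
        if kind in totals:
            totals[kind][0] += 1
            totals[kind][1] += path.stat().st_size
    return {kind: tuple(pair) for kind, pair in totals.items()}
```

Calling summarize_data_dir with the data directory path returns a file count and total size per file type, which gives a rough idea of whether old snapshots and logs have been piling up on a disk that is already 92% full.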
