ZooKeeper keeps getting EndOfStreamException, causing a crash

Problem description

My ZooKeeper instance controls a few different queues for different jobs, holding the relevant job data in each node until the machine is ready to process it. If I stop the overall service, so that no jobs can be started, ZooKeeper runs just fine after a restart. However, some of these jobs seem to cause ZooKeeper to crash, with the following message in the ZooKeeper log:

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15677f740ad002a, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:46998 which had sessionid 0x15677f740ad002a

My ZooKeeper knowledge is very limited, as I am taking over from the guy that set it up originally.

I have tried deleting a lot of nodes with rmr [path] in the ZooKeeper shell, which seemed to have some effect (it deleted 50k+ nodes that were left over and of no use), but ZooKeeper has kept crashing daily, and last night I couldn't get it to run for more than a couple of minutes before the same error/crash occurred.

How can I find out what is causing this?

I am pretty sure it is some general problem with the data that is received, or with the stored data/nodes. The disk is only 92% full. I also found this post: Zookeeper keeps getting the WARN: "caught end of stream exception", but the solution doesn't make much sense to me. Also, I am pretty sure that none of the messages kept in my znodes are larger than 1MB, but I am unsure how to confirm this.
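One way to check the 1MB concern is to walk the znode tree and compare each node's data length, and the combined length of its child names, against jute.maxbuffer, whose default is 0xfffff bytes (just under 1MB). The child-name check is included because a single getChildren reply for a parent with tens of thousands of children can also get large; whether either limit is behind these particular crashes is not confirmed. This is a minimal stdlib-only Java sketch, not a ZooKeeper tool: ZnodeReader, InMemoryReader and findLargeZnodes are hypothetical names, and with the real Java client the reader could be backed by zk.getChildren(path, false) and zk.exists(path, false).getDataLength(). For a single node, the zkCli stat command also prints dataLength and numChildren.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ZnodeSizeAudit {

    // Default value of jute.maxbuffer: 0xfffff bytes, just under 1 MB.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // Hypothetical read-only view of a znode tree. With the real client this could wrap
    // zk.getChildren(path, false) and zk.exists(path, false).getDataLength()
    // (exists returns null if the node was deleted in the meantime).
    interface ZnodeReader {
        List<String> children(String path);
        int dataLength(String path);
    }

    // Hypothetical in-memory tree, used only so the sketch runs without a server.
    static final class InMemoryReader implements ZnodeReader {
        private final Map<String, List<String>> kids = new HashMap<>();
        private final Map<String, Integer> sizes = new HashMap<>();

        void put(String path, int dataLength, String... children) {
            sizes.put(path, dataLength);
            kids.put(path, Arrays.asList(children));
        }

        public List<String> children(String path) {
            return kids.getOrDefault(path, List.of());
        }

        public int dataLength(String path) {
            return sizes.getOrDefault(path, 0);
        }
    }

    // Depth-first walk from root; returns every path whose data, or whose combined
    // child-name bytes (a rough lower bound on the getChildren reply), is >= threshold.
    static List<String> findLargeZnodes(ZnodeReader reader, String root, int threshold) {
        List<String> flagged = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String path = stack.pop();
            long childNameBytes = 0;
            for (String child : reader.children(path)) {
                childNameBytes += child.getBytes(StandardCharsets.UTF_8).length;
                stack.push(path.equals("/") ? "/" + child : path + "/" + child);
            }
            if (reader.dataLength(path) >= threshold || childNameBytes >= threshold) {
                flagged.add(path);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        InMemoryReader tree = new InMemoryReader();
        tree.put("/", 0, "jobs", "config");
        tree.put("/jobs", 0, "job-0", "job-1");
        tree.put("/jobs/job-0", 512);
        tree.put("/jobs/job-1", 2_000_000); // deliberately over the default limit
        tree.put("/config", 64);
        for (String path : findLargeZnodes(tree, "/", DEFAULT_JUTE_MAXBUFFER)) {
            System.out.println("at or over limit: " + path);
        }
    }
}
```

Running main on the toy tree prints only /jobs/job-1. An explicit stack is used instead of recursion so that a very deep or very wide tree does not overflow the Java call stack.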

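Is there a way to change the ZooKeeper logging so that it prints additional information, such as the name/content of the znode that was being processed right before the crash?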

Recommended answer

I was able to solve the problem by deleting all ZooKeeper snapshots and log files from the server running ZooKeeper. I don't know why this made a difference, but it has been running fine for the last 22 hours.
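Snapshots and transaction logs are the state ZooKeeper persists to disk, so deleting all of them on a standalone server means it starts with an empty znode tree. Before doing that, it may help to look at what is there, for example whether one snapshot is unusually large. A minimal stdlib Java sketch, assuming the standard layout where files named snapshot.<hex zxid> and log.<hex zxid> sit under <dataDir>/version-2 (transaction logs go under dataLogDir instead if that is set in zoo.cfg); the class name and the /var/lib/zookeeper default path are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ZkDataDirListing {

    // Returns the zxid encoded in a "snapshot.<hex>" or "log.<hex>" file name,
    // or -1 if the name does not follow that pattern.
    static long zxidOf(String fileName) {
        String hex;
        if (fileName.startsWith("snapshot.")) {
            hex = fileName.substring("snapshot.".length());
        } else if (fileName.startsWith("log.")) {
            hex = fileName.substring("log.".length());
        } else {
            return -1;
        }
        try {
            return Long.parseLong(hex, 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Lists snapshot and transaction-log files in dir, ordered by zxid (oldest first).
    static List<Path> listByZxid(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p) && zxidOf(p.getFileName().toString()) >= 0) {
                    files.add(p);
                }
            }
        }
        files.sort(Comparator.comparingLong((Path p) -> zxidOf(p.getFileName().toString())));
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the dataDir from your zoo.cfg as the first argument.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/zookeeper");
        for (Path p : listByZxid(dataDir.resolve("version-2"))) {
            System.out.printf("%-28s %,14d bytes%n", p.getFileName(), Files.size(p));
        }
    }
}
```

Moving the listed files to a backup directory, rather than deleting them outright, keeps the option of inspecting them later if the problem comes back.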
