Zookeeper keeps getting the WARN: "caught end of stream exception"


Problem description

I am now using a CDH-5.3.1 cluster with three ZooKeeper instances on three IPs:

133.0.127.40 n1
133.0.127.42 n2
133.0.127.44 n3

Everything works fine when it starts, but these days I notice that node n2 keeps getting this WARN:

caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:722)

It happens every second, and only on n2; n1 and n3 are fine. I can still use the HBase shell to scan my tables and the Solr web UI to run queries, but I cannot start Flume agents: the process always stops at this step:

Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog

jetty-6.1.26.cloudera.4

Started SelectChannelConnector@0.0.0.0:41414.

And minutes later I get a warning from Cloudera Manager that the Flume agent is exceeding its file-descriptor threshold.

Does anyone know what is going wrong? Thanks in advance.
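To see which clients are opening and dropping connections on n2, ZooKeeper's four-letter commands can help. A hedged sketch (n2's IP and the default client port 2181 are assumptions based on the setup above, and `nc` must be available):

```shell
# Count open ZooKeeper client connections per source IP, to spot a client
# (e.g. a Flume agent) that is churning connections.

count_clients() {
  # Parse 'cons' four-letter-word output from stdin; print "count /IP:" pairs,
  # busiest client first.
  grep -o '/[0-9.]*:' | sort | uniq -c | sort -rn
}

# Live usage against n2 (assumed address and port):
#   echo cons | nc 133.0.127.42 2181 | count_clients
```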

Recommended answer

I recall seeing similar errors in ZK (admittedly not with Flume). I believe the problem at the time was to do with the large amount of data stored on the node and/or transferred to the client. Things to consider tweaking in zoo.cfg:

  • put a limit on autopurge.snapRetainCount, e.g. set it to 10
  • set autopurge.purgeInterval to, say, 2 (hours)
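As a concrete sketch, the two settings above would look like this in zoo.cfg (the values are the ones suggested in the answer; the autopurge feature requires ZooKeeper 3.4+):

```properties
# zoo.cfg — keep only the 10 most recent snapshots and transaction logs
autopurge.snapRetainCount=10
# run the purge task every 2 hours
autopurge.purgeInterval=2
```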

If the ZK client (Flume?) is streaming large znodes to/from the ZK cluster, you may want to set the Java system property jute.maxbuffer on the client JVM(s), and possibly on the server nodes, to a large enough value. I believe the default value for this property is 1M. Determining the appropriate value for your workload is an exercise in trial and error, I'm afraid!
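For reference, one place to set it on the server side is conf/java.env, which stock ZooKeeper's zkServer.sh sources on startup (the 4 MB value below is an assumption for illustration, not from the answer; under CDH you would typically set this through Cloudera Manager's Java options instead):

```properties
# conf/java.env — sourced by zkServer.sh on each ZooKeeper server
JVMFLAGS="-Djute.maxbuffer=4194304 $JVMFLAGS"
```

Client JVMs would need the same property passed on their command line, e.g. `-Djute.maxbuffer=4194304`; the value must be consistent between clients and servers.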

