Storm - Supervisors crashing on reboot


Problem description

This issue is driving me nuts. I have a single-machine Storm instance running on my local LAN, currently on the v0.9.1-incubating release (from the Apache Incubator site). The problem is that my Storm supervisor process refuses to start after EVERY SINGLE reboot. The hack fix is quite simple: remove the supervisor and workers folders from the Storm local directory and rerun the process; things then run hunky-dory until the next reboot.

I'm providing every bit of information I think might be relevant to debug this issue. Please ask for more if needed, but just help me get some resolution.

PS: It doesn't matter if I have topologies running or not.

  1. Zookeeper version: 3.4.5
  2. Storm version: 0.9.1-incubating (using Netty transport)
  3. Storm and Zookeeper both run on the same machine.
  4. supervisord version: 3.0b2
  5. OS: Ubuntu 12.04 LTS
  6. Processor: AMD Phenom(tm) II X6 1055T Processor × 6
  7. RAM: 5.6 GiB

Supervisord configuration

[program:zookeeper]
command=/path/to/zookeeper/bin/zkServer.sh "start-foreground"
process_name=zookeeper
directory=/path/to/zookeeper/bin
stdout_logfile=/var/log/zookeeper.log        ; stdout log path, NONE$
stderr_logfile=/var/log/err.zookeeper.log        ; stderr log path, $
priority=2
user=root


[program:storm-nimbus]
command=/path/to/storm/bin/storm nimbus
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/nimbus.err.log
stdout_logfile=/var/log/storm/nimbus.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=10


[program:storm-ui]
command=/path/to/storm/bin/storm ui
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/ui.err.log
stdout_logfile=/var/log/storm/ui.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=500


[program:storm-supervisor]
command=/path/to/storm/bin/storm supervisor
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/supervisor.err.log
stdout_logfile=/var/log/storm/supervisor.log.log
logfile_maxbytes=20MB
logfile_backups=2
priority=600


[program:storm-logviewer]
command=/path/to/storm/bin/storm logviewer
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/log.err.log
stdout_logfile=/var/log/storm/log.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=900

Storm configuration

#Zookeeper
storm.zookeeper.servers:
     - "192.168.1.11"

# Nimbus
nimbus.host: "192.168.1.11"
nimbus.childopts: '-Xmx1024m -Djava.net.preferIPv4Stack=true -Dprocess=storm'

# UI
ui.port: 9090
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm"

# Supervisor
supervisor.childopts: '-Djava.net.preferIPv4Stack=true -Dprocess=storm'


# Worker
worker.childopts: '-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm'

storm.local.dir: "/path/to/storm"

storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

Error message
The full log is on Pastebin; I'm cross-posting the relevant bits here.

java.lang.RuntimeException: java.io.EOFException
    at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
    at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
    at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
    at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:679) [na:1.6.0_27]
Caused by: java.io.EOFException: null
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2322) ~[na:1.6.0_27]
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2791) ~[na:1.6.0_27]
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798) ~[na:1.6.0_27]
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298) ~[na:1.6.0_27]
    at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    ... 11 common frames omitted
2014-03-11 12:27:25 b.s.util [INFO] Halting process: ("Error when processing an event")

Recommended answer

We had that exact same problem (supervisor crashing on start and same log error message) when we had a power outage on 2 of our development servers. I guess just stopping the server without previously stopping the supervisor would have the same effect.
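An unclean shutdown fits the trace in the question: the `EOFException` is raised inside the `ObjectInputStream` constructor while it reads the stream header, which is what happens when the file being deserialized is empty or shorter than that 4-byte header. To check for such files before deleting anything, here is a minimal Python sketch. The function name, the idea of scanning the whole local directory, and the choice to skip `.version` files (assumed here to be empty marker files, not serialized state) are my assumptions, not something confirmed by the Storm documentation.

```python
import os

# Java's ObjectInputStream reads a 4-byte stream header (magic number plus
# version) in its constructor, so a shorter file fails with EOFException.
JAVA_STREAM_HEADER_BYTES = 4


def find_truncated_files(state_dir, skip_suffixes=(".version",)):
    """Return sorted paths of regular files under state_dir that are too
    short to hold a Java serialization header.

    skip_suffixes is an assumption: files with these suffixes are treated
    as marker files that may legitimately be empty.
    """
    suspects = []
    for root, _dirs, files in os.walk(state_dir):
        for name in files:
            if name.endswith(tuple(skip_suffixes)):
                continue
            path = os.path.join(root, name)
            # Skip broken symlinks and anything that is not a regular file.
            if not os.path.isfile(path):
                continue
            if os.path.getsize(path) < JAVA_STREAM_HEADER_BYTES:
                suspects.append(path)
    return sorted(suspects)
```

Calling `find_truncated_files("/path/to/storm")` with the `storm.local.dir` value from the question would list the candidates. A file that passes this check can still be corrupted in other ways, so an empty result does not rule out the problem.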

The only working solution we found was to remove the "storm-local/supervisor" folder (I guess something in there got corrupted).
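If you want to script that workaround, a minimal sketch in Python follows. It is a variation, not exactly what we did: it renames the folders instead of deleting them, so the suspect state can still be inspected afterwards. The function name and the backup suffix are made up for illustration, and the folder names `supervisor` and `workers` are taken from the question.

```python
import os
import time

# Folder names come from the question; adjust if your layout differs.
STATE_SUBDIRS = ("supervisor", "workers")


def set_aside_supervisor_state(local_dir, suffix=None):
    """Rename the supervisor state folders under local_dir out of the way.

    Returns the list of backup paths that were created. Run this only
    while the Storm supervisor process is stopped.
    """
    if suffix is None:
        # Timestamped suffix so repeated runs do not collide.
        suffix = time.strftime(".bak-%Y%m%d-%H%M%S")
    moved = []
    for name in STATE_SUBDIRS:
        src = os.path.join(local_dir, name)
        if os.path.isdir(src):
            dst = src + suffix
            os.rename(src, dst)
            moved.append(dst)
    return moved
```

With the supervisord setup from the question, you would stop the process first (`supervisorctl stop storm-supervisor`), call `set_aside_supervisor_state("/path/to/storm")`, and then start it again (`supervisorctl start storm-supervisor`). Old backups are not cleaned up by this sketch.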
