Storm - Supervisors crashing on reboot

Problem description

This issue is driving me nuts. I have a single-machine Storm instance running on my local LAN. I am currently running the v0.9.1-incubating release (from the Apache Incubator site). The problem is that my Storm supervisor process refuses to start after EVERY SINGLE reboot. The hack fix is quite simple: remove the supervisor and workers folders from the Storm local directory and rerun the process; things then run hunky-dory until the next reboot.

I'm providing every bit of information I think might be relevant to debugging this issue. Please ask for more if needed, but just help me get some resolution.

PS: It doesn't matter whether I have topologies running or not.

  1. Zookeeper version: 3.4.5
  2. Storm version: 0.9.1-incubating (using Netty transport)
  3. Storm and Zookeeper are running on the same machine.
  4. Supervisor (supervisord) version: 3.0b2
  5. OS: Ubuntu 12.04 LTS
  6. Processor: AMD Phenom(tm) II X6 1055T Processor × 6
  7. Memory: 5.6 GiB

Supervisor config

[program:zookeeper]
command=/path/to/zookeeper/bin/zkServer.sh "start-foreground"
process_name=zookeeper
directory=/path/to/zookeeper/bin
stdout_logfile=/var/log/zookeeper.log        ; stdout log path, NONE$
stderr_logfile=/var/log/err.zookeeper.log        ; stderr log path, $
priority=2
user=root


[program:storm-nimbus]
command=/path/to/storm/bin/storm nimbus
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/nimbus.err.log
stdout_logfile=/var/log/storm/nimbus.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=10


[program:storm-ui]
command=/path/to/storm/bin/storm ui
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/ui.err.log
stdout_logfile=/var/log/storm/ui.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=500


[program:storm-supervisor]
command=/path/to/storm/bin/storm supervisor
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/supervisor.err.log
stdout_logfile=/var/log/storm/supervisor.log.log
logfile_maxbytes=20MB
logfile_backups=2
priority=600


[program:storm-logviewer]
command=/path/to/storm/bin/storm logviewer
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/log.err.log
stdout_logfile=/var/log/storm/log.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=900

Storm config

#Zookeeper
storm.zookeeper.servers:
     - "192.168.1.11"

# Nimbus
nimbus.host: "192.168.1.11"
nimbus.childopts: '-Xmx1024m -Djava.net.preferIPv4Stack=true -Dprocess=storm'

# UI
ui.port: 9090
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm"

# Supervisor
supervisor.childopts: '-Djava.net.preferIPv4Stack=true -Dprocess=storm'


# Worker
worker.childopts: '-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm'

storm.local.dir: "/path/to/storm"

storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

Error message

The full log error message is on Pastebin. I'm cross-posting the relevant bits here.

java.lang.RuntimeException: java.io.EOFException
    at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
    at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
    at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
    at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:679) [na:1.6.0_27]
Caused by: java.io.EOFException: null
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2322) ~[na:1.6.0_27]
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2791) ~[na:1.6.0_27]
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798) ~[na:1.6.0_27]
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298) ~[na:1.6.0_27]
    at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
    ... 11 common frames omitted
2014-03-11 12:27:25 b.s.util [INFO] Halting process: ("Error when processing an event")
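
For readers following the trace: the innermost frames show the `EOFException` being thrown from the `ObjectInputStream` constructor while it reads the stream header, which is what happens when the input contains no complete Java serialization header, for example an empty or truncated file. The sketch below reproduces just that behavior with plain JDK classes. It is a simplified stand-in, not Storm's code; `EmptyStateDemo`, `serialize`, and `failsWithEof` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical demo class; it only uses standard JDK serialization.
class EmptyStateDemo {

    // Java-serialize an object to bytes (illustrative helper, not Storm's API).
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns true if reading the bytes back fails with EOFException.
    static boolean failsWithEof(byte[] data) {
        try {
            // The constructor reads the stream header, so an empty input
            // already fails here, before readObject() is reached.
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            in.readObject();
            in.close();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] healthy = serialize(new HashMap<String, String>());
        System.out.println("healthy state -> EOF? " + failsWithEof(healthy));
        System.out.println("empty file    -> EOF? " + failsWithEof(new byte[0]));
    }
}
```

If the supervisor's persisted local state ended up in that condition across the reboot, the failure in `LocalState.snapshot` would look like the trace above; whether that is what actually happened on this machine is an assumption, not something the log proves.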

Recommended answer

We had exactly the same problem (the supervisor crashing on start with the same log error message) after a power outage on 2 of our development servers. I guess that just stopping the server without first stopping the supervisor would have the same effect.

The only working solution we found was to remove the "storm-local/supervisor" folder (I guess something in there got corrupted).
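
Before deleting the folder, it may be worth checking whether the corruption guess holds. A Java serialization stream starts with a 4-byte header, so any file shorter than that cannot be deserialized. The sketch below walks a directory and lists such files. It is only a rough diagnostic under stated assumptions: `TruncatedFileScan` and `findTruncatedFiles` are hypothetical names, the directory to scan is whatever you pass in (for example the supervisor folder under your Storm local directory), and not every file there is necessarily Java-serialized, so some hits may be empty by design.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for inspecting a local state directory.
class TruncatedFileScan {

    // Length of the Java serialization stream header (magic + version).
    static final long STREAM_HEADER_BYTES = 4;

    // Collect regular files under root that are too short to hold the header.
    static List<Path> findTruncatedFiles(Path root) {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    if (Files.size(p) < STREAM_HEADER_BYTES) {
                        hits.add(p);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TruncatedFileScan.java <directory>");
            return;
        }
        // Print each candidate; nothing is modified or deleted.
        for (Path p : findTruncatedFiles(Paths.get(args[0]))) {
            System.out.println("too short to deserialize: " + p);
        }
    }
}
```

The scan is read-only on purpose: it only reports candidates, and removing the folder as described above remains the actual workaround.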
