FATAL master.HMaster: Unexpected state : .. Cannot transit it to OFFLINE


Problem description

I've got a serious HBase crash problem. I'm using HBase 0.94.7 with one master and two region servers. The HBase master crashes regularly, and I can't even get it restarted. The master log shows the following:

DEBUG master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=master,60020,1374506461230, region=46c2333f401964bf877254be19c2cc8c
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 6423df864603aa6e8c45c726ab3ae62f
DEBUG master.AssignmentManager: Forcing OFFLINE; was=LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF8\xB3\x8F\x17\xCE\xE2g\x84,1374498065657.6423df864603aa6e8c45c726ab3ae62f. state=CLOSED, ts=1374508769672, server=slave,60020,1374506460892
DEBUG zookeeper.ZKAssign: master:60000-0x14006f52f3f000e Creating (or updating) unassigned node for 6423df864603aa6e8c45c726ab3ae62f with OFFLINE state
FATAL master.HMaster: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
INFO master.HMaster: Aborting
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 0710b486dcb3d51465695b51db376255

....

DEBUG master.AssignmentManager: The znode of region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. has been deleted.
INFO master.AssignmentManager: The master has opened the region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. that was online on master,60020,1374506461230
DEBUG master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=master,60000,1374508461536, region=c9cfdd360c09b292412ba5ad88815e6f
DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@5c061cd2
INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x14006f52f3f000f
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000f closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.AssignmentManager$TimerUpdater: master,60000,1374508461536.timerUpdater exiting
INFO master.SplitLogManager$TimeoutMonitor: master,60000,1374508461536.splitLogManagerTimeoutMonitor exiting
INFO master.AssignmentManager$TimeoutMonitor: master,60000,1374508461536.timeoutMonitor exiting
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000e closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.HMaster: HMaster main thread exiting
ERROR master.HMasterCommandLine: Failed to start master

I also found something unusual in the ZK log:

INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140100dda0300e1 with negotiated timeout 180000 for client /master:37856
WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x140100dda0300e1, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:662)
INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /master:37856 which had sessionid 0x140100dda0300e1

Can anybody help me see what the problem is? Is it related to the unassigned region or something similar? I've tried bin/hbase hbck -repair and bin/hbase hbck -fix, but neither helped.

Thanks

Answer

After checking the log of my region server very carefully, I found the answer.

Cause

It turns out that the Snappy library, which is used to compress the HBase tables, was not properly installed on the region servers, and all my tables were created with this compression algorithm. When the master tried to balance regions onto the region servers, the operation failed, and eventually the master aborted.
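
A quick way to check whether a node can actually use the codec is HBase's bundled CompressionTest utility. The wrapper below is only a sketch: the function name `codec_works`, the `HBASE_BIN` variable and the default test path are assumptions made for illustration, and it is worth reading the command's output rather than relying on the exit status alone.

```shell
#!/bin/sh
# Run HBase's bundled CompressionTest for one codec on the local node.
# HBASE_BIN (path to the hbase launcher) and the default test path are
# assumptions for this sketch; adjust them to your installation.
codec_works() {
    codec=$1
    test_path=${2:-file:///tmp/hbase-codec-test}
    "${HBASE_BIN:-hbase}" org.apache.hadoop.hbase.util.CompressionTest \
        "$test_path" "$codec"
}

# Example: run this on every region server before creating SNAPPY tables.
# codec_works snappy
```

If the native library is missing, the utility should fail with an error instead of completing, which points at the problem without having to dig through the master log.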

Solution

Install and configure Snappy on EVERY NODE as follows:

# As root: install the Snappy shared library
apt-get install libsnappy1
# As the hbase user: link the library into HBase's native library directory
su hbase
mkdir -p /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64
ln -s /usr/lib/libsnappy.so.1.1.2 /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64/libsnappy.so
exit  # back to the root shell
# As root again: make the library visible under /usr/lib64 and /usr/lib
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1.1.2
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so
ln -s /usr/lib/libsnappy.so.1 /usr/lib/libsnappy.so

Now everything is OK! The regions are well balanced across the region servers.
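
Since the fix has to be repeated on every node, it may help to confirm that none of the links is dangling. The helper below is a minimal sketch, not part of the original answer; the function name `check_snappy_links` is hypothetical, and the paths in the example call are simply the ones used above.

```shell
#!/bin/sh
# Report which of the given paths do not resolve to an existing file.
# Prints one "MISSING <path>" line per absent or dangling path and
# returns non-zero if any path failed the check.
check_snappy_links() {
    rc=0
    for path in "$@"; do
        # -e follows symlinks, so a dangling link counts as missing.
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
            rc=1
        fi
    done
    return "$rc"
}

# Example call with the link targets created in the steps above:
# check_snappy_links \
#     /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64/libsnappy.so \
#     /usr/lib64/libsnappy.so.1 \
#     /usr/lib64/libsnappy.so \
#     /usr/lib/libsnappy.so
```

No output and a zero exit status mean every path resolves on that node.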
