Cassandra node is taking hours to join
Question
My two-node cluster got into an inconsistent state. On one node (call it node A), nodetool status correctly showed two nodes, while on the other (call it node B) it showed only one, namely itself. After several attempts I could not fix the issue, so I decommissioned node B. However, nodetool status on node A still showed node B, and in the UN state at that. I had to restart Cassandra on node A to make it forget node B.
But this has led to another problem. I am adding a new node (call it node C) to node A's cluster, but it is taking hours to join. It has already been six hours, and I am wondering whether it will ever join successfully.
The debug logs on node C suggest that node B (the decommissioned one) is causing trouble. The logs on node C constantly show:
DEBUG [GossipTasks:1] 2017-04-29 12:38:40,004 Gossiper.java:337 - Convicting /10.120.8.53 with status removed - alive false
nodetool status on node A shows node C in the joining state, as expected:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load         Tokens  Owns  Host ID                               Rack
UJ  10.120.8.113  1006.97 MiB  256     ?     f357d8d0-2379-43d8-8ae5-62224191fb6c  rack1
UN  10.120.8.23   5.29 GiB     256     ?     596260a0-785a-435c-a3f3-632f56c5c882  rack1
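For readers who want to check node states from a script rather than by eye, here is a minimal Python sketch that parses nodetool status text like the output above. The names parse_status, NODE_ROW, STATES and SAMPLE are made up for this illustration; the sketch assumes each node row starts with the two-letter status/state code followed by the address, as in the output shown, and it has not been checked against other Cassandra versions.

```python
import re

# A node row starts with a status letter (U/D), a state letter (N/L/J/M),
# whitespace, and then the node address. Header lines do not match.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")

STATES = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def parse_status(text):
    """Return a list of (address, is_up, state) tuples from `nodetool status` text."""
    nodes = []
    for line in text.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            status, state, address = match.groups()
            nodes.append((address, status == "U", STATES[state]))
    return nodes


# Sample copied from the question, so the sketch runs without a live cluster.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load         Tokens  Owns  Host ID                               Rack
UJ  10.120.8.113  1006.97 MiB  256     ?     f357d8d0-2379-43d8-8ae5-62224191fb6c  rack1
UN  10.120.8.23   5.29 GiB     256     ?     596260a0-785a-435c-a3f3-632f56c5c882  rack1
"""

if __name__ == "__main__":
    for address, is_up, state in parse_status(SAMPLE):
        print(address, "up" if is_up else "down", state)
```

Against a live node, the text passed to parse_status would be the captured output of nodetool status instead of SAMPLE.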
The load on node C increases by only a fraction after a couple of hours.
I checked whether system.peers contains node B, but the table contains zero rows.
I am using Cassandra 3.7.
What is going wrong? What can I do to avoid losing the data on node A and still scale the cluster?
Recommended answer
Run nodetool netstats on node C and see whether there is any progress. Also review nodetool compactionstats: check the number of pending compactions and whether it goes down over time.
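One way to tell whether the pending compactions are going down is to sample nodetool compactionstats a few times and compare the counts. The Python sketch below is only an illustration: pending_tasks and is_draining are hypothetical helper names, and the sketch assumes the output contains a line of the form "pending tasks: N", which may differ between Cassandra versions.

```python
import re

# Assumption: compactionstats output contains a line such as "pending tasks: 12".
PENDING = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def pending_tasks(output):
    """Extract the pending-task count from compactionstats text, or None if absent."""
    match = PENDING.search(output)
    return int(match.group(1)) if match else None


def is_draining(samples):
    """True if the counts never increase and the last is lower than the first."""
    if len(samples) < 2:
        return False
    never_increases = all(b <= a for a, b in zip(samples, samples[1:]))
    return never_increases and samples[-1] < samples[0]
```

In practice the samples would come from running nodetool compactionstats every few minutes and passing each captured output through pending_tasks; a list such as 12, 9, 9, 4 counts as draining, while a flat or growing list does not.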
If the bootstrapping failed, try restarting the node.
As an alternative, you can remove node C and add it again with the auto_bootstrap setting set to false. After the node is up, run nodetool rebuild, followed by nodetool repair once the rebuild finishes. This should be a faster alternative to a regular bootstrap.