Cassandra node is taking hours to join

Problem description

My cluster of size 2 got into a somewhat inconsistent state. On one node (call it node A), nodetool status correctly showed 2 nodes, while on the other (call it node B) it showed only one, i.e. itself. After several attempts I could not fix the issue, so I decommissioned node B. But nodetool status on node A still showed node B, and in UN state at that. I had to restart Cassandra on node A so that it would forget node B.

But this has led to another problem. I am adding a new node (call it node C) to node A's cluster, but the join is taking hours. It has already been six hours, and I am wondering whether it will ever finish joining.

The debug logs on node C suggest that node B (the decommissioned one) is causing trouble. They constantly show:

DEBUG [GossipTasks:1] 2017-04-29 12:38:40,004 Gossiper.java:337 - Convicting /10.120.8.53 with status removed - alive false

Nodetool status on node A is showing the node C in joining state as expected.

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UJ  10.120.8.113  1006.97 MiB  256          ?       f357d8d0-2379-43d8-8ae5-62224191fb6c  rack1
UN  10.120.8.23   5.29 GiB   256          ?       596260a0-785a-435c-a3f3-632f56c5c882  rack1

After a couple of hours, the load reported for node C has increased only by a fraction.

I checked whether system.peers contains node B, but the table has zero rows.

I am using Cassandra 3.7.

What's going wrong? What can I do to avoid losing the data on node A and still scale the cluster?

Recommended answer

Run nodetool netstats on node C and check whether any progress is being made. Also review nodetool compactionstats: look at the number of pending compactions and see whether it goes down over time.
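
Checking these by hand every so often works, but a small polling loop makes the trend easier to see. The following is a sketch only, assuming the Cassandra 3.x output formats where netstats contains a line like "Mode: JOINING" and compactionstats contains a line like "pending tasks: 12"; the helper names (`node_mode`, `pending_compactions`, `watch`) are hypothetical, and the patterns may need adjusting for other versions.

```python
import re
import subprocess
import time
from typing import Optional

# Assumed output formats (Cassandra 3.x):
#   nodetool netstats        -> a line such as "Mode: JOINING"
#   nodetool compactionstats -> a line such as "pending tasks: 12"
MODE = re.compile(r"^Mode:\s*(\w+)", re.MULTILINE)
PENDING = re.compile(r"pending tasks:\s*(\d+)", re.IGNORECASE)


def node_mode(netstats_output: str) -> Optional[str]:
    """Return the node's mode (e.g. JOINING, NORMAL), or None if not found."""
    m = MODE.search(netstats_output)
    return m.group(1) if m else None


def pending_compactions(compactionstats_output: str) -> Optional[int]:
    """Return the pending compaction count, or None if not found."""
    m = PENDING.search(compactionstats_output)
    return int(m.group(1)) if m else None


def nodetool(*args: str) -> str:
    """Run a nodetool subcommand on this node and return its stdout."""
    result = subprocess.run(["nodetool", *args], capture_output=True, text=True, check=True)
    return result.stdout


def watch(interval_s: int = 300) -> None:
    """Poll until the node reports NORMAL, printing progress each round."""
    while True:
        netstats = nodetool("netstats")
        print(netstats)  # active streaming sessions, if any, appear here
        print("pending compactions:", pending_compactions(nodetool("compactionstats")))
        if node_mode(netstats) == "NORMAL":
            break
        time.sleep(interval_s)
```

Running `watch()` on node C prints a snapshot every five minutes; if the received byte counts and the pending-compaction number stop moving between rounds, the bootstrap is likely stuck rather than merely slow.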

If the bootstrap failed, try restarting the node.

As an alternative, you can remove node C and add it again with the auto_bootstrap setting set to false. Once the node is up, run nodetool rebuild, and after that finishes, run nodetool repair. This should be faster than a regular bootstrap.
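
The order matters here: rebuild streams the existing data to the new node, and repair runs only after that has completed. A minimal sketch of that sequence, assuming node C has already been removed, re-added with `auto_bootstrap: false` in its cassandra.yaml, and started; `STEPS` and `run_steps` are hypothetical names, and `dry_run` just lists the commands without executing them.

```python
import subprocess
from typing import List

# Hypothetical helper: assumes auto_bootstrap: false is already set in
# node C's cassandra.yaml and Cassandra is running on node C.
STEPS: List[List[str]] = [
    ["nodetool", "rebuild"],  # stream existing data from the other node(s)
    ["nodetool", "repair"],   # then reconcile any remaining differences
]


def run_steps(dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run, only list) each step in order, stopping on failure."""
    done = []
    for cmd in STEPS:
        if not dry_run:
            # check=True raises CalledProcessError, so repair never starts
            # if rebuild fails.
            subprocess.run(cmd, check=True)
        done.append(" ".join(cmd))
    return done
```

Calling `run_steps()` first shows the plan; `run_steps(dry_run=False)` then executes it on node C.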