How do I know if nodetool repair is finished

Question

I have a 2-node Apache Cassandra (2.0.3) cluster with a replication factor of 1. I changed the replication factor to 2 using the following command in cqlsh:

ALTER KEYSPACE "mykeyspace" WITH REPLICATION =   { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

I then ran "nodetool repair", which is recommended after this kind of ALTER.

The problem is that this command sometimes finishes very quickly. When it finishes like that, it usually prints 'Lost notification...' and the exit code is non-zero.

So I just repeat 'nodetool repair' until it finishes without error. I also check that 'nodetool status' reports the expected disk space for each node. (With replication factor 1, each node holds about 7GB; after 'nodetool repair' I expect each node to hold about 14GB, assuming no cluster usage in the meantime.)
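
The retry approach described above can be sketched as a small Bash helper. This is only an illustration: the function name retry_until_ok and its arguments are hypothetical, and a zero exit status only tells you that the nodetool client finished without reporting an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it exits 0, up to MAX_ATTEMPTS times.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_until_ok() {
  local max_attempts=$1 delay=$2 attempt=1 rc
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    rc=$?
    echo "attempt $attempt/$max_attempts failed with exit code $rc" >&2
    # Only wait if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, retry_until_ok 5 60 nodetool repair mykeyspace would try at most five times, waiting 60 seconds between failed attempts.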

Is there a more correct way to determine that 'nodetool repair' is finished in this case?

Recommended answer

Generally speaking, you can monitor a nodetool repair operation with two nodetool commands:

  • nodetool compactionstats
  • nodetool netstats

The repair operation has two distinct phases. First it calculates the differences between the nodes (repair work to be done), and then it acts on those differences by streaming data to the appropriate nodes.

This checks the active Merkle tree calculations:

$ nodetool compactionstats
pending tasks: 0
Active compaction remaining time :        n/a
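
Repair's Merkle tree calculation runs as a validation compaction, which compactionstats typically lists with the compaction type "Validation". A minimal sketch for checking this from a script follows. The function name validation_in_progress is hypothetical, and because the column layout of compactionstats differs between Cassandra versions, it simply looks for any field equal to "Validation".

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 if the `nodetool compactionstats` output on
# stdin lists a "Validation" compaction (repair still building Merkle trees).
# Assumption: the compaction type is printed as the word "Validation"; every
# field is checked because the column order varies between versions.
validation_in_progress() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "Validation") found = 1 }
       END { exit !found }'
}

# Example: poll every 10 seconds until no validation compaction is listed.
# while nodetool compactionstats | validation_in_progress; do sleep 10; done
```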

The repair streams can be monitored by:

$ nodetool netstats
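
A similar check can be applied to netstats. This sketch assumes that each active stream session in the netstats output is introduced by a line beginning with the word "Repair" followed by the session's plan ID; the function names and the polling interval are hypothetical. Because streaming only begins after the Merkle tree phase, an empty result here does not by itself show that the whole repair has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exits 0 if the `nodetool netstats` output on stdin
# shows a stream session described as "Repair".
# Assumption: such sessions start with a line like "Repair <plan-id>".
repair_streams_active() {
  awk '$1 == "Repair" { found = 1 } END { exit !found }'
}

# Hypothetical polling loop for one host (default localhost, every 10 s).
# Note: if nodetool itself fails, awk sees no input and the loop also stops.
wait_for_repair_streams() {
  local host=${1:-localhost} interval=${2:-10}
  while nodetool -h "$host" netstats | repair_streams_active; do
    sleep "$interval"
  done
  echo "no repair streams listed on $host"
}
```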

In fact, TheLastPickle's Aaron Morton suggests using the following Bash script/command to monitor any active repair streams:

while true; do date; diff <(nodetool -h localhost netstats) <(sleep 5 && nodetool -h localhost netstats); done

DataStax has a post in their support forums about troubleshooting hanging repairs. If you have any hung repair streams, you should be able to see them with nodetool netstats. This can happen if one of your nodes becomes unavailable during the repair process. To monitor specific repair operations, you can check your log file for entries like this:

DEBUG [WRITE-/172.30.77.197] 2013-05-03 12:43:09,107 OutboundTcpConnection.java (line 165) error writing to /172.30.77.197 java.net.SocketException: Connection reset

Note that repair sessions should also be denoted in your system.log:

[repair #02fc68f0-210c-11e7-aa88-c35a9a02c19a] Starting...

[repair #02fc68f0-210c-11e7-aa88-c35a9a02c19a] Completed...
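
These session markers can be matched up by ID to find sessions that started but have not yet logged a completion. The helper below is a hypothetical sketch: it assumes every session line carries a [repair #<uuid>] tag as in the excerpt above, and that the completion line contains "Completed" or "completed"; pass a different regular expression as the second argument if your version words it differently. The log path is also an assumption (package installs often use /var/log/cassandra/system.log).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print repair session IDs that appear in the log
# but have no line matching the completion pattern.
# Usage: pending_repair_sessions LOG_FILE [COMPLETION_REGEX]
pending_repair_sessions() {
  local log_file=$1
  local done_re=${2:-[Cc]ompleted}
  awk -v done_re="$done_re" '
    match($0, /\[repair #[0-9a-f-]+\]/) {
      # The UUID sits between "[repair #" (9 chars) and the closing "]".
      id = substr($0, RSTART + 9, RLENGTH - 10)
      if (!(id in seen)) { seen[id] = 1; order[++n] = id }
      if ($0 ~ done_re) finished[id] = 1
    }
    END {
      # Report unfinished sessions in the order they first appeared.
      for (i = 1; i <= n; i++)
        if (!(order[i] in finished)) print order[i]
    }
  ' "$log_file"
}
```

For example, pending_repair_sessions /var/log/cassandra/system.log prints nothing once every session it found has a completion line. Sessions whose start line was rotated into an older log file will not be seen.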
