High load on Cassandra nodes

Problem description

For some reason I am experiencing high load on my Cassandra nodes. Here is some information to give you the picture.

  • When I create a brand-new cluster, the load stays low for a couple of days and then increases over time. After a week or so it shoots through the roof, causing what I have found to be instability across the whole cluster.

  • Every 4 hours I take a snapshot of one of my keyspaces, which contains around 300-400 MB of data, and delete snapshots older than 7 days, all configured in OpsCenter.

  • The cluster is running on striped disks in Microsoft Azure.

  • The nodes run on 2 cores with 3.5 GB of RAM. I'm well aware that this is below the recommended hardware, but it should not be the cause of the high load: I tried running on 4 cores with 7 GB of RAM and saw no difference.

I'm sure there's a whole box of things that could cause high load, but I guess some causes are more likely than others.

It appears that this high load is caused by the Repair Service in OpsCenter. There must be some settings to tweak how the service runs repairs.

Recommended answer

You can configure the repair service by adding a [repair_service] section to your opscenterd.conf.
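If you prefer to script the change rather than edit the file by hand, the sketch below uses Python's standard configparser to add or update the [repair_service] section in the text of an opscenterd.conf. The helper name set_repair_service_options, the sample [webserver] content and the option values are illustrative assumptions, not recommendations. Note that configparser drops comments when it rewrites a file, so review the output before replacing a real config.

```python
import configparser
import io


def set_repair_service_options(conf_text: str, **options: object) -> str:
    """Return conf_text with the given keys set in [repair_service] (hypothetical helper).

    interpolation=None keeps '%' characters in existing values from being
    interpreted. Comments in conf_text are NOT preserved by configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("repair_service"):
        parser.add_section("repair_service")
    for key, value in options.items():
        parser.set("repair_service", key, str(value))  # configparser stores strings
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative existing file content; your opscenterd.conf will differ.
existing = """[webserver]
port = 8888
interface = 0.0.0.0
"""

print(set_repair_service_options(existing, max_parallel_repairs=0, min_repair_time=5))
```

OpsCenter reads opscenterd.conf at startup, so expect to restart opscenterd after changing it.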

The main levers to tweak are:

max_parallel_repairs = 0  

You can increase this until your repairs complete fast enough to finish within the time period you require (< gc_grace_seconds).
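To reason about "fast enough", a back-of-the-envelope check helps: estimate how long a full repair cycle takes at a given parallelism and compare it with gc_grace_seconds. The sketch below assumes a deliberately simple model (subrange repairs run in waves of `parallel` jobs, each taking a measured average time) and a safety margin of 80%; it is not how OpsCenter schedules work, and `parallel` here is only a stand-in for whatever max_parallel_repairs actually controls. The only fixed fact used is Cassandra's default gc_grace_seconds of 864000 (10 days).

```python
import math
from typing import Optional

# Cassandra's default gc_grace_seconds (10 days); check your tables' actual value.
GC_GRACE_SECONDS_DEFAULT = 864_000


def estimated_cycle_seconds(num_subranges: int, avg_subrange_seconds: float, parallel: int) -> float:
    """Rough duration of one full repair cycle under a simplified model (assumption)."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    # Subrange repairs run in waves of `parallel`; each wave takes avg_subrange_seconds.
    return math.ceil(num_subranges / parallel) * avg_subrange_seconds


def smallest_parallelism(num_subranges: int, avg_subrange_seconds: float,
                         gc_grace_seconds: int = GC_GRACE_SECONDS_DEFAULT,
                         safety_margin: float = 0.8) -> Optional[int]:
    """Smallest parallelism whose estimated cycle fits in a fraction of gc_grace_seconds."""
    budget = gc_grace_seconds * safety_margin
    for parallel in range(1, num_subranges + 1):
        if estimated_cycle_seconds(num_subranges, avg_subrange_seconds, parallel) <= budget:
            return parallel
    return None  # too slow even when fully parallel, under this model


# Hypothetical numbers: 10,000 subranges at roughly 2 minutes each.
print(estimated_cycle_seconds(10_000, 120, 1))  # 1200000: serially, well over 10 days
print(smallest_parallelism(10_000, 120))        # 2
```

Raising parallelism only until the cycle fits, rather than as high as possible, keeps the per-moment repair load on the nodes as low as the deadline allows.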

min_repair_time = 5

If you don't have that much data, the repair service may be completing too quickly and restarting, causing unnecessary overhead. You can increase this value to ensure that you aren't running repairs too frequently.
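The effect of raising min_repair_time can be seen with simple arithmetic. The sketch assumes that min_repair_time acts as a floor on the length of a repair cycle, and it treats the value as hours purely for illustration; check the OpsCenter documentation for the real unit and semantics. The function name cycles_per_period is hypothetical.

```python
def cycles_per_period(period_hours: float, work_hours: float, min_repair_time_hours: float) -> float:
    """How many full repair cycles run in period_hours.

    Assumed model: a cycle lasts as long as the actual repair work, but never
    less than the floor set by min_repair_time (interpreted here as hours).
    """
    cycle_hours = max(work_hours, min_repair_time_hours)
    return period_hours / cycle_hours


WEEK_HOURS = 7 * 24

# Small keyspace: the actual repair work takes only 30 minutes.
print(cycles_per_period(WEEK_HOURS, 0.5, 0))  # 336.0 cycles per week with no floor
print(cycles_per_period(WEEK_HOURS, 0.5, 5))  # 33.6 cycles per week with a 5-hour floor
```

With little data, the work itself barely changes between the two cases; what changes is how often the service starts over, and each restart carries its own overhead.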

snapshot_override

Again, if you don't have much data and the repair service completes too quickly, you will generate too many snapshots (by default, the repair service takes a snapshot before every repair). If your snapshot directory is filling up extremely quickly, you may want to turn this off until you have tuned the service to run only once (raise min_repair_time and lower max_parallel_repairs).
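To see whether snapshots are what is filling the disk, you can tally snapshot directories per snapshot name. The sketch assumes the usual Cassandra on-disk layout <data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/ and the common package default data directory /var/lib/cassandra/data; adjust both for your installation. The function name snapshot_usage is hypothetical. Snapshot files are hard links to SSTables, so the apparent size reported here overstates the extra disk space a snapshot actually costs until compaction removes the live copies.

```python
import os
from collections import defaultdict
from typing import Dict, Tuple


def snapshot_usage(data_dir: str) -> Dict[str, Tuple[int, int]]:
    """Map snapshot name -> (file count, apparent bytes) across all keyspaces/tables.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<name>/...
    Sizes are apparent sizes; hard-linked files are counted in full every time.
    """
    usage = defaultdict(lambda: [0, 0])
    for root, _dirs, files in os.walk(data_dir):
        parts = os.path.relpath(root, data_dir).split(os.sep)
        # parts looks like [keyspace, table, "snapshots", snapshot_name, ...]
        if len(parts) >= 4 and parts[2] == "snapshots":
            entry = usage[parts[3]]
            for name in files:
                entry[0] += 1
                entry[1] += os.path.getsize(os.path.join(root, name))
    return {name: (count, size) for name, (count, size) in usage.items()}


if __name__ == "__main__":
    # Assumed default data directory; prints {} if it does not exist.
    for snap, (count, size) in sorted(snapshot_usage("/var/lib/cassandra/data").items()):
        print(f"{snap}: {count} files, {size / 2**20:.1f} MiB apparent")
```

Running this a few hours apart shows whether the number of snapshot names grows in step with the repair cycles rather than with your own 4-hourly schedule.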

Note: the point of the repair service is to split the expensive, resource-intensive repair process into smaller jobs. This means your overall CPU utilization may be 5% or 10% higher at all times, rather than spiking and affecting your workload during regular repair runs.
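The idea of splitting one large repair into many small jobs spread over time can be sketched as follows. This is an illustration of the technique, not OpsCenter's actual scheduler: it assumes the Murmur3Partitioner token range (Cassandra's default partitioner since 1.2), divides it into equal contiguous subranges, and spaces their start times evenly across a cycle. The printed commands use nodetool repair's real -st/-et (start/end token) options, while the keyspace name my_keyspace and the function name plan_subrange_repairs are hypothetical. A real plan would repair only the ranges each node owns.

```python
from typing import List, Tuple

MIN_TOKEN = -(2**63)  # Murmur3Partitioner token range
MAX_TOKEN = 2**63 - 1


def plan_subrange_repairs(num_subranges: int, cycle_seconds: float) -> List[Tuple[int, int, float]]:
    """Split the token ring into contiguous subranges with evenly spaced start offsets.

    Returns (start_token, end_token, start_offset_seconds) tuples; each
    subrange's end token is the next subrange's start token.
    """
    span = MAX_TOKEN - MIN_TOKEN
    plan = []
    for i in range(num_subranges):
        start = MIN_TOKEN + span * i // num_subranges
        end = MIN_TOKEN + span * (i + 1) // num_subranges
        offset = cycle_seconds * i / num_subranges  # spread jobs across the cycle
        plan.append((start, end, offset))
    return plan


# Four small jobs spread over one day instead of one big repair at once.
for start, end, offset in plan_subrange_repairs(4, 24 * 3600):
    print(f"t+{offset / 3600:4.1f}h  nodetool repair -st {start} -et {end} my_keyspace")
```

Each job touches only a slice of the data, so the cost shows up as a steady background load rather than a single spike.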

For advanced configuration
