Large writes cause instability in Cassandra ring


Problem Description



I'm attempting to load a large amount of data into a 10-node Cassandra ring.

The script doing the inserts gets ~4000 inserts / s, blocked presumably on network I/O. I launch 8 of these on a single machine, and the throughput scales almost linearly. (The individual throughput goes down slightly, but is more than compensated for by the additional processes.)

This works decently; however, I'm still not getting enough throughput, so I launched the same setup on 3 more VMs. (Thus, 8 processes * 4 VMs.) After the first additional VM, and with increasing frequency and severity as further VMs are added, the following occurs:

  • The clients start receiving Timeout errors. They can re-try their writes, but because they do so in batches, their forward progress is almost entirely eliminated.
  • The ring becomes unstable, and nodes start labelling themselves as "down". Further, different nodes tend to have different ideas of who is down. The ring doesn't recover when the scripts are aborted. (I've not even been able to fix this by just restarting individual nodes: I've had to restart the entire ring.)

"Down" varies. In my last run:

  • 4 nodes died completely. (Cassandra wasn't running at all.) Checking the logs, there didn't appear to be anything logged as to why they died.
  • On the fifth, Cassandra was running. nodetool status on that node hangs. Two threads appear to be in an infinite loop of some sort. (They're using 100% CPU solidly.) There is a java.lang.OutOfMemoryError: Java heap space in the logs.

The code is essentially:

def prepped_batch_insert(session, items, insert_query, silent=False):

    # A mapping of number of inserts -> a prepared query for that number of
    # inserts.
    prepped_statements = {}

    def get_prepped_statement(inserts):
        if inserts in prepped_statements:
            # We already created a prepared query for this many inserts, use
            # it:
            return prepped_statements[inserts]
        else:
            # We haven't yet created a prepared query for this many inserts, so
            # do so now:
            query = ['BEGIN UNLOGGED BATCH']
            for idx in xrange(inserts):
                query.append(insert_query.query)
            query.append('APPLY BATCH;')
            query = '\n'.join(query)
            ps = session.prepare(query)
            prepped_statements[inserts] = ps
            return ps

    def do_prepped_batch_insert(batch):
        ps = get_prepped_statement(len(batch))

        # Generate the list of params to the prepared query:
        params = []
        for idx, item in enumerate(batch):
            for k in insert_query.keyorder:
                params.append(item[k])
        # Do it.
        session.execute(ps, params)

    return inserter.insert_and_time(
        items,  # data generator
        do_prepped_batch_insert,  # The above function
        _WHAT_APPEARS_TO_BE_THE_OPTIMAL_CASSANDRA_BATCH_SIZE,  # = 200
        silent=silent,
    )

The function insert_and_time splits items up into batches of size 200, calls the above function, and times the whole kit and kaboodle. (This code is toxic to the ring.)
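
For clarity, here is a minimal sketch of what roughly the same write path could look like using the Python driver's own batching and concurrency helpers instead of hand-assembled "BEGIN UNLOGGED BATCH" strings. The keyspace, table, column names, and concurrency value below are illustrative placeholders, not the actual schema or settings used above.

    # Hypothetical sketch only: keyspace/table/columns are placeholders.
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')

    # One prepared single-row INSERT, reused for every row.
    insert_ps = session.prepare(
        'INSERT INTO items (id, payload) VALUES (?, ?)')

    def write_batch(rows):
        # rows is a list of (id, payload) tuples.
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        for row in rows:
            batch.add(insert_ps, row)
        session.execute(batch)

    def write_concurrently(rows, concurrency=50):
        # Alternative: let the driver pipeline individual inserts
        # instead of grouping them into one large batch statement.
        execute_concurrent_with_args(
            session, insert_ps, rows, concurrency=concurrency)

Whether grouping rows into unlogged batches or issuing per-row concurrent inserts is faster depends on the schema and partitioning, so this is only a starting point rather than a drop-in replacement.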

We attempted more load because (I was told that) 20k inserts / second was slow (it will take a while to insert the data I'd like to insert at that rate…) and that Cassandra is capable of handling much higher volume.

My questions:

  1. Is there anything unusual about what I'm doing? Anything wrong?
  2. Am I simply DDoS-ing my ring?
  3. How can I go about debugging what's wrong?
  4. An errant client, IMHO, should never be able to kill the server. (And the above isn't terribly errant.) Anything I can do to prevent this?

¹The client appears to also slowly leak file descriptors. I don't think this is related. (I'm calling .shutdown on both the cluster and the connection.) Looking at the driver source, there appear to be plenty of pathways where an exception would cause a leak.

Solution

Your situation sounds unusual, but without any details about the hardware you are running on, here is some speculation. The issue is most likely heap size, followed by IO bottlenecking. Unless you are running on SSDs, CPU usage should not be an issue.

1) If you are looking for a one-time load of data followed by a smaller, consistent stream of data, consider using the bulk loading tool.
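
For reference, the kind of invocation meant here would look roughly like the following (a sketch only: the node addresses and SSTable directory are placeholders, and the SSTables have to exist already, e.g. written out ahead of time with Cassandra's CQLSSTableWriter):

  sstableloader -d 10.0.0.1,10.0.0.2 /path/to/my_keyspace/my_table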

2) Possibly, in the sense that you are attempting to load data at a rate faster than the hardware you are using can handle.

3) You should take a look into the Cassandra system logs for messages like "trying to flush memtable to recover space", which are symptomatic of running out of heap; the logs will also have information about GC activity and other ongoing tasks. For real-time monitoring you can also connect to your Cassandra instances via JMX using jconsole or visualvm. When looking at these it should be obvious if the heap begins to fill up and the system begins to back up. Most production Cassandra instances use a heap size of 8 GB; amounts larger than that give diminishing returns as stop-the-world GC events become more prevalent.
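
As a quick first pass over the logs (a sketch; /var/log/cassandra/system.log is the default location for package installs, but the path varies by installation):

  grep -iE "flush|GCInspector|OutOfMemory" /var/log/cassandra/system.log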

The other thing you should be watching is pending compactions, which is one of the key IO bottlenecks in Cassandra. If this number grows without bound, it means your system is limited by hard drive speed, and you can ease the stress with more machines or by upgrading to SSDs. Check this with

 nodetool compactionstats  

A tool you can use to monitor all of this is Datastax Opscenter. This tool will allow you to easily monitor your entire cluster from one place and the community edition is completely free.

I do wonder if something else is amiss, though, because I regularly benchmark Amazon m1.large instances and find that about 10 of them can support traffic in the 40~50k writes / s range without any sort of system instability.

4) As noted by Eric, it is very difficult for a distributed, performance-oriented system like Cassandra to remain available and performant while also upholding guardrails against client behavior. The tradeoff was increased speed in exchange for minimal checking of the system state when writes occur. This allows for extremely fast writes, but puts the onus on the maintainer to properly provision and monitor their system.
