Explanation required for a statement in Cassandra documentation


Problem description





I was going through the DataStax documentation and found an interesting statement.

It claimed "Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound".

Can someone explain how this claim is made, and what might be causing this behavior in Cassandra?

Thanks.

Solution

For different workloads, Cassandra clusters can be CPU, memory, I/O or (occasionally) network bound. The claim in the documentation is that if you start a new cluster and do lots of inserts, the cluster will initially be CPU-bound, but after a while it becomes bottlenecked on memory.

To process an insert, Cassandra needs to deserialize the messages from the clients, work out which nodes should store the data and send messages to those nodes. Those nodes then store the data in an in-memory data structure called a memtable.
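To make that client-to-memtable path concrete, here is a minimal sketch of an insert-heavy client using the DataStax Python driver. The contact point, keyspace and events table are made-up examples for illustration only:

```python
# Minimal insert-heavy client sketch using the DataStax Python driver
# (pip install cassandra-driver). The contact point, keyspace and table
# are hypothetical; point them at your own cluster and schema.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])              # hypothetical contact point
session = cluster.connect("demo_ks")          # hypothetical keyspace

# Prepare once so each insert skips CQL parsing and only exercises the
# routing + memtable write path described above.
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

futures = []
for i in range(10_000):
    futures.append(session.execute_async(insert, (i, "x" * 100)))
    if len(futures) >= 500:                   # keep a bounded number of requests in flight
        for f in futures:
            f.result()
        futures = []

for f in futures:                             # drain whatever is still outstanding
    f.result()
cluster.shutdown()
```

With a workload like this, the nodes spend their time deserializing requests, routing them and appending to memtables, which is why the cluster starts out CPU-bound.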

This is almost always CPU-bound initially. However, as more data is inserted, the memtables grow large, are flushed to disk, and new (empty) memtables are created. The flushed memtables are stored in files known as SSTables. There is an ongoing background process called compaction that merges SSTables into progressively larger files.
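If it helps to picture that lifecycle, here is a toy, in-process model of memtables flushing into SSTables and compaction merging them. It is purely illustrative; real memtables and SSTables are sorted, indexed, per-partition structures with tombstones and much more:

```python
# Toy model of the memtable -> SSTable -> compaction lifecycle (illustration only).

MEMTABLE_LIMIT = 4          # flush after this many rows (stand-in for a size threshold)

memtable = {}               # current in-memory writes: row key -> value
sstables = []               # "flushed" immutable snapshots, oldest first

def write(key, value):
    """Apply a write to the memtable, flushing to a new 'SSTable' when it is full."""
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(memtable))   # flush: freeze the memtable "to disk"
        memtable.clear()                  # start a new, empty memtable

def compact():
    """Merge all SSTables into one; the newest value for each key wins."""
    merged = {}
    for table in sstables:                # oldest to newest
        merged.update(table)              # newer tables overwrite older values
    sstables[:] = [merged]

for i in range(10):
    write(f"row-{i % 6}", f"v{i}")        # some rows are overwritten across flushes

print(len(sstables), "sstables before compaction")
compact()
print(len(sstables), "sstable after compaction:", sstables[0])
```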

There are a few reasons why more memory will help at this stage:

  • If Cassandra is low on heap space, it will flush memtables when they are smaller. This creates smaller SSTables, so more work to compact them (see the sketch after this list).
  • If the workload involves overwrites or inserts to the same row at different times, it is much cheaper to do this if the row is still in the current memtable. If not, the overwrite and new column are stored in a new memtable, then flushed and merged during compaction. So again, less memory means more compaction work.
  • Your OS uses memory to buffer reads and writes during compaction. If it can't, there will be extra I/O, slowing down memtable flushing and compaction.
  • Inserts into Cassandra create lots of Java objects and so make work for the garbage collector. If the heap is too small, inserts may be paused while GC runs to free some heap. (On the other hand, if the heap is too large, inserts may be paused for a few seconds during stop-the-world GC.)
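Here is the rough sketch referred to in the first bullet. It is not Cassandra code, just a back-of-the-envelope model where flushing produces equal-sized SSTables and compaction repeatedly merges them pairwise, rewriting the merged rows each round. Smaller flushes mean more SSTables, more merge rounds, and therefore more data rewritten:

```python
# Back-of-the-envelope model: how flush size affects total compaction work.

def compaction_rewrite(total_rows, flush_size):
    """Rows rewritten while repeatedly merging equal-sized flushes pairwise."""
    sizes = [flush_size] * (total_rows // flush_size)   # SSTables produced by flushing
    rewritten = 0
    while len(sizes) > 1:
        nxt = []
        for i in range(0, len(sizes) - 1, 2):
            merged = sizes[i] + sizes[i + 1]             # read both inputs, write one output
            rewritten += merged
            nxt.append(merged)
        if len(sizes) % 2:                               # the odd one out waits for the next round
            nxt.append(sizes[-1])
        sizes = nxt
    return rewritten

total = 1_000_000
for flush in (250_000, 50_000, 10_000):
    print(f"flush size {flush:>7}: ~{compaction_rewrite(total, flush):>10,} rows rewritten")
```

Running it shows the total amount of rewritten data growing as the flush size shrinks, which is the extra compaction work the bullet describes.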

So inserts may become memory-bound, but they could also become I/O-bound. If there isn't enough I/O capacity to flush memtables, inserts will block once the memtable flush queue is full (there is a toy sketch of this back-pressure after the restated claim below). So I think the claim could be a bit more accurate:

Insert-heavy workloads are CPU-bound in Cassandra before becoming memory- or I/O-bound.
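To illustrate the I/O side, here is a small threading sketch (not Cassandra internals) of the back-pressure mentioned above: once the bounded flush queue is full because the "disk" cannot keep up, the writer blocks until a flush completes:

```python
# Toy back-pressure model: a bounded flush queue and a slow "disk".
import queue
import threading
import time

flush_queue = queue.Queue(maxsize=2)       # stand-in for Cassandra's bounded flush queue

def flusher():
    while True:
        memtable = flush_queue.get()
        time.sleep(0.5)                    # pretend the disk is slow
        print(f"flushed memtable of {len(memtable)} rows")
        flush_queue.task_done()

threading.Thread(target=flusher, daemon=True).start()

for n in range(5):
    memtable = {f"row-{n}-{i}": "v" for i in range(1000)}
    start = time.time()
    flush_queue.put(memtable)              # blocks once the queue is full
    print(f"enqueued memtable {n}, blocked for {time.time() - start:.2f}s")

flush_queue.join()                         # wait for the remaining flushes to finish
```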
