Cassandra partition size and performance?

Question

I was playing around with the cassandra-stress tool on my own laptop (8 cores, 16 GB) with Cassandra 2.2.3 installed out of the box with its stock configuration. I was doing exactly what is described here:

http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

And measuring its insert performance.

My observations were:

  • Using the code from https://gist.github.com/tjake/fb166a659e8fe4c8d4a3 without any modifications, I had ~7000 inserts/sec.
  • When modifying line 35 in the code above from cluster: fixed(1000) to "cluster: fixed(100)", i.e. configuring my test data distribution to have 100 clustering keys per partition instead of 1000 (see the profile sketch after this list), the performance jumped up to ~11000 inserts/sec.
  • When configuring it to have 5000 clustering keys per partition, the performance dropped to just 700 inserts/sec.
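
For reference, the relevant piece of that stress profile looks roughly like the fragment below. This is a sketch from memory rather than the exact gist contents (the column names and the other distributions are assumptions), but cluster: fixed(N) on a clustering column is the knob being changed:

    # columnspec fragment of a cassandra-stress user profile
    # (illustrative sketch; names and distributions are assumed,
    # only the cluster: fixed(N) line matters here)
    columnspec:
      - name: domain
        size: gaussian(5..100)        # size of the generated value
        population: uniform(1..10M)   # distinct partition keys
      - name: published_date
        cluster: fixed(1000)          # rows (clustering keys) per partition

A profile like this is then driven with something along the lines of cassandra-stress user profile=blogpost.yaml ops(insert=1), as in the linked blog post.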

The documentation says, however, that Cassandra can support up to 2 billion rows per partition. I don't need that many, but I still don't get how just 5000 records per partition can slow the writes down 10 times, or am I missing something?

Solution

Supporting is a little different from performing best. You can have very wide partitions, but the rule of thumb is to try to keep them under 100 MB, for miscellaneous performance reasons. Some operations can be performed more efficiently when the entirety of a partition can be stored in memory.

As an example (this is an old example, and it is a complete non-issue post 2.0, where everything is single pass): in some versions, when the partition size exceeded 64 MB, compaction used a two-pass process, which halved compaction throughput. It still worked with huge partitions; I've seen many multi-GB ones that worked just fine. But systems with huge partitions were difficult to work with operationally (managing compactions/repairs/GCs).
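
For context, that two-pass behaviour was tied to a setting in the old cassandra.yaml; the fragment below is a sketch from memory of the pre-2.0 option (removed in later versions), so treat the exact name and default as an assumption:

    # pre-2.0 cassandra.yaml (from memory; removed in later versions)
    # Partitions larger than this limit were compacted in two passes
    # on disk instead of one pass in memory, halving compaction throughput.
    in_memory_compaction_limit_in_mb: 64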

I would say: target the 100 MB rule of thumb initially and test from there to find your own optimum. Things will always behave differently based on the use case; to get the most out of a node, the best you can do is run benchmarks as close as possible to what you're actually going to do (true of all systems). That seems like something you're already doing, so you're definitely on the right path.
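
As a practical follow-up, actual partition sizes on a node can be checked with nodetool. These are the 2.2-era command names (later renamed tablestats/tablehistograms), and keyspace1.blogposts is a placeholder for your own keyspace and table:

    # compacted partition min/mean/max bytes for a table
    nodetool cfstats keyspace1.blogposts
    # partition size and cell count percentiles
    nodetool cfhistograms keyspace1 blogposts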
