Why is my Cassandra throughput not improving when I add nodes?


Problem description

This is a newbie question. I have tried to do my homework, but I am stuck trying to learn how Cassandra scales linearly as advertised. When I run against a single Cassandra node, I get reasonable insert rates. Here are some relevant bits of information:


  • CentOS 6.5
  • java 1.7.0_71
  • cassandra 2.1.4 binary download
  • data and commitlog on different drives
  • compaction_throughput_mb_per_sec: 0
  • 10,000,000 inserts
  • Insertion rate: ~110K inserts/s
  • Have not implemented these settings yet, since I am less interested in making things blazing fast than in observing linear scaling.

My keyspace is defined as follows:

create keyspace nms WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
use nms;
CREATE TABLE RN(tableId int, sampleTime timestamp, sampleValue bigint, sampleStdev bigint, sampleRate bigint, tz_offset int,
       PRIMARY KEY (tableId, sampleTime));

My relevant Java code looks like this (roughly):

cluster = Cluster.builder().addContactPoint("138.42.229.240")
                .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ANY))
                .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
session = cluster.connect("nms");
batch = new BatchStatement();
statement = session.prepare("INSERT INTO RN" +
            "(tableId, sampleTime, sampleValue, sampleStdev, sampleRate, tz_offset)" +
            "VALUES (?, ?, ?, ?, ?, ?);");

I am inserting 32 tableIds (the partition key), each "owned" by a single thread, with unique sampleTimes. The other data is filler junk.

I found the sweet spot to be ~10 inserts per batch and 10 executeAsync() call groups.
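One possible reading of that pattern is sketched below, using only the Java standard library: rows are grouped into batches of 10, and the writer waits whenever 10 batches are outstanding. `BatchedWriter` and its `send` function are hypothetical stand-ins introduced for illustration; in a real client, `send` would wrap a call such as `session.executeAsync(batch)`. Whether this matches the original test code exactly is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Illustrative only: BatchedWriter and `send` are hypothetical names.
// `send` stands in for an asynchronous driver call and receives the
// sample times that make up one batch.
final class BatchedWriter {
    private final int rowsPerBatch;
    private final int batchesInFlight;
    private final Function<List<Long>, CompletableFuture<Void>> send;

    BatchedWriter(int rowsPerBatch, int batchesInFlight,
                  Function<List<Long>, CompletableFuture<Void>> send) {
        if (rowsPerBatch < 1 || batchesInFlight < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.rowsPerBatch = rowsPerBatch;
        this.batchesInFlight = batchesInFlight;
        this.send = send;
    }

    /** Writes sample times 0..totalRows-1 and returns the number of batches sent. */
    int write(long totalRows) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        List<Long> batch = new ArrayList<>(rowsPerBatch);
        int sent = 0;
        for (long t = 0; t < totalRows; t++) {
            batch.add(t);
            if (batch.size() == rowsPerBatch) {
                pending.add(send.apply(batch));
                sent++;
                batch = new ArrayList<>(rowsPerBatch);
                // Cap the number of outstanding batches before sending more.
                if (pending.size() == batchesInFlight) {
                    awaitAll(pending);
                }
            }
        }
        // Flush a final, partially filled batch if there is one.
        if (!batch.isEmpty()) {
            pending.add(send.apply(batch));
            sent++;
        }
        awaitAll(pending);
        return sent;
    }

    private static void awaitAll(List<CompletableFuture<Void>> pending) {
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pending.clear();
    }
}
```

With `new BatchedWriter(10, 10, send)`, each thread would have at most 100 rows outstanding at any time, which is one way the client itself can limit the insert rate regardless of cluster size.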

So far so good. Next, I added 4 nodes, scrounging hardware and 3 VMs running on an SSD SAN (not ideal, I know). I used a similar configuration for each node to the one described above and ran my simple test expecting some improvement. The insertion rate was unchanged, which I cannot explain. Moreover, the rate remains largely unchanged with 2, 3, 4 and 5 nodes. I realize that odd numbers probably make no sense, but I was desperate.

I then tried setting up the keyspace with a replication factor of zero. My data rates went down to 1K inserts/s. I cannot explain this. I must be missing something really obvious, but I cannot see it.

Recommended answer

Maybe the inserting client app is maxed out, not the cluster? You could try running the Java code on a second machine at the same time and see whether the throughput of each client halves or stays the same.
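The reasoning behind that two-client test can be written down as a small check. This is a minimal sketch, not part of any Cassandra driver: `TwoClientCheck`, `diagnose`, and the tolerance parameter are hypothetical names chosen for illustration. If two clients together reach about twice the single-client rate, the single client was the limit; if their combined rate stays near the single-client rate (each roughly halves), the cluster is the limit.

```java
// Hypothetical helper for interpreting the two-client experiment.
final class TwoClientCheck {
    enum Verdict { CLIENT_BOUND, CLUSTER_BOUND, INCONCLUSIVE }

    /**
     * @param singleRate inserts/s measured with one client running alone
     * @param rateA      inserts/s of client A while both clients run
     * @param rateB      inserts/s of client B while both clients run
     * @param tolerance  allowed deviation of the combined/single ratio;
     *                   keep it below 0.5 so the two cases cannot overlap
     */
    static Verdict diagnose(double singleRate, double rateA, double rateB,
                            double tolerance) {
        if (singleRate <= 0 || tolerance < 0 || tolerance >= 0.5) {
            throw new IllegalArgumentException("invalid rate or tolerance");
        }
        double ratio = (rateA + rateB) / singleRate;
        if (Math.abs(ratio - 2.0) <= tolerance) {
            return Verdict.CLIENT_BOUND;   // each client kept its full rate
        }
        if (Math.abs(ratio - 1.0) <= tolerance) {
            return Verdict.CLUSTER_BOUND;  // the clients split a fixed total
        }
        return Verdict.INCONCLUSIVE;
    }
}
```

For example, with the ~110K inserts/s single-client rate from the question and made-up two-client measurements of 108K and 111K inserts/s, `diagnose(110_000, 108_000, 111_000, 0.2)` reports `CLIENT_BOUND`.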
